FewerJobs.
All jobs

Software Engineer, Site Reliability

Fal - San Francisco

Posted Feb 23, 2026

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified not verified - source not recorded; timestamp not recorded
401(k) match
Not verified

Was this benefit information wrong? Tell us.

Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

Software Engineer, Site Reliability San Francisco fal is the generative media ecosystem powering the next generation of AI products. We build the infrastructure, tools, and model access that teams need to move from idea to production, and do it at scale without compromise. For developers and enterprises, fal is the foundation that makes generative media not just possible, but practical: a unified platform where high-performance inference, orchestration, and observability come together to unlock new categories of AI-native products. As generative media reshapes industries across a market projected to grow by hundreds of billions over the next decade, fal is becoming the ecosystem that ambitious teams build on. About this role You are a seasoned SRE who keeps production infrastructure running at scale. You own the reliability and availability of customer-facing systems - from Kubernetes clusters to deployment pipelines to the networking layer that connects it all. You think in SLOs, automate ruthlessly, and treat every incident as a chance to make the system better. Key Responsibilities - Own and operate our Kubernetes infrastructure: cluster lifecycle, upgrades, networking, and multi-tenant isolation for customer workloads - Build and maintain CI/CD pipelines and deployment infrastructure - Leverage AI to an extreme level to automate analysis and resolution of production issues, and improve software development speed, reliability and maintainability - Build dashboards, alerting, and anomaly detection across our systems - Define and enforce SLOs and build out incident response processes - Manage and improve our networking, load balancing, and service mesh configurations - Drive

Read the full description at job-boards.greenhouse.io. FewerJobs shows a source-linked preview and links to the original posting.

Apply at job-boards.greenhouse.io

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs