FewerJobs.

Senior Site Reliability Engineer - Observability

Lambda - San Francisco Office (Fremont St), San Francisco, California, United States, San Jose Office (First St), San Jose Office (Zanker), Bellevue, WA

Posted May 8, 2026

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified
401(k) match
Not verified (source URL not recorded; timestamp not recorded)


Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

Lambda, The Superintelligence Cloud, is a leader in AI cloud infrastructure serving tens of thousands of customers. Our customers range from AI researchers to enterprises and hyperscalers. Lambda's mission is to make compute as ubiquitous as electricity and give everyone the power of superintelligence. One person, one GPU. If you'd like to build the world's best AI cloud, join us.

Note: This position requires presence in our San Francisco, San Jose, or Bellevue, WA office location 4 days per week; Lambda's designated work from home day is currently Tuesday.

Engineering at Lambda is responsible for building and scaling our cloud offering. Our scope includes the Lambda website, cloud APIs and systems, as well as internal tooling for system deployment, management, and maintenance.

What You'll Do
  • Deploy and operate observability platforms for logging, metrics, and distributed tracing.
  • Automate the deployment and operation of these observability systems.
  • Set up monitoring for modern AI/HPC cluster infrastructure.
  • Develop platform software to make observability adoptable and improve product reliability.
  • Lead members of other engineering teams in development of solutions for their monitoring challenges.

You
  • Have 8+ years of experience in software engineering, with 3+ years in Go
  • Have 5+ years of experience in Site Reliability Engineering practices
  • Possess proven understanding of Observability tools and practices
  • Have experience with application

Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
