FewerJobs.
All jobs

AI Hardware Systems Engineer, Annapurna Labs, Trainium Machine Learning Fleet Operations

Amazon - Austin, Texas, USA

Posted Mar 19, 2026

Benefits

Parental leave
Not verified not verified - source not recorded; timestamp not recorded
Non-birth-parent leave
Not verified not verified - source not recorded; timestamp not recorded
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified not verified - source not recorded; timestamp not recorded
401(k) match
Not verified

Was this benefit information wrong? Tell us.

Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

AI Hardware Systems Engineer, Annapurna Labs, Trainium Machine Learning Fleet Operations Austin, Texas, USA Annapurna Labs designs silicon and software that accelerates innovation. Customers choose us to create cloud solutions that solve challenges that were unimaginable a short time ago-even yesterday. Our custom chips, accelerators, and software stacks enable us to take on technical challenges that have never been seen before, and deliver results that help our customers change the world. In Annapurna Labs we are at the forefront of hardware/software co-design not just in Amazon Web Services (AWS) but across the industry. The Machine Learning Acceleration Fleet Operations Team is looking for candidates interested in diving deep into our fleet of ML servers deployed around the world. We are seeking an engineer who is comfortable debugging emergent problems in GPU and server hardware, writing scripts in languages such as Python or Bash, running large scale experiments on a fleet of complex hardware, developing data infrastructure and analyzing trends, and developing automation software to scale operations. Our team has end to end ownership of some of the most advanced server hardware in the world. We drive technical debug efforts and write truly massive scale autonomous software to monitor, optimize, and remediate machine learning hardware. Come join us! Key job responsibilities - Member of a team responsible for system remediation, operational excellence, and customer experience on bleeding edge ML products - Utilize data to root cause hardware failures and identify live trends on the most complex systems in AWS - Implement

Read the full description at www.amazon.jobs. FewerJobs shows a source-linked preview and links to the original posting.

Apply at amazon.jobs

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs