Software Engineer I - AI/ML, AWS Neuron Distributed Training
Amazon - Cupertino, California, USA
Posted Jun 2, 2026
Benefits
- Parental leave
- Not verified not verified - source not recorded; timestamp not recorded
- Non-birth-parent leave
- Not verified not verified - source not recorded; timestamp not recorded
- Family-building benefits
-
- Fertility benefits: Not verified
- Adoption assistance: Not verified
- Surrogacy assistance: Not verified
- Mental health support
- Not verified
- Relocation assistance
- Not verified
- Childcare support
- Not verified
- Learning budget
- Not verified
- Verification
- Not verified
- Salary
- $127K-$185K not verified - source not recorded; timestamp not recorded
- 401(k) match
- Not verified
Was this benefit information wrong? Tell us.
Market context
- U.S. role benchmark (BLS OEWS)
- $111,944 U.S. median for this role
- Projected growth (BLS Employment Projections)
- +13.7% - Much faster than average
39% above the BLS role benchmark for data and ml aggregate.
Matched to SOC 15-1252 - Data and ML aggregate by role bucket.
Source: U.S. Bureau of Labor Statistics, OEWS, May 2024 and Employment Projections, 2024-2034.
Schedule
- Shift type
- Not verified
- Weekend work
- Not verified
Application
- Cover letter
- Not verified
- Assessment
- Not verified
- Deadline
- Not stated
Where they hire
State eligibility is not yet verified.
About this role
Software Engineer I - AI/ML, AWS Neuron Distributed Training Cupertino, California, USA Annapurna Labs designs silicon and software that accelerates innovation. Our custom chips, accelerators, and software stacks enable us to tackle unprecedented technical challenges and deliver solutions that help customers change the world. AWS Neuron is the complete software stack powering AWS Trainium (Trn2/Trn3), our cloud scale Machine Learning accelerators and we are seeking a Senior Software Engineer to join our ML Distributed Training team. In this role, you will be responsible for the development, enablement, and performance optimization of large scale ML model training across diverse model families. This includes massive scale pre-training and post-training of LLMs with Dense and Mixture-of-Experts architectures, Multimodal models that are transformer and diffusion based, and Reinforcement Learning workloads. You will work at the intersection of ML research and high performance systems, collaborating closely with chip architects, compiler engineers, runtime engineers and AWS solution architects to deliver cost-effective, performant machine learning solutions on AWS Trainium based systems. Key job responsibilities You will contribute to the design and implementation of distributed training solutions for large-scale ML models running on Trainium instances. A significant part of your work will involve extending and optimizing popular distributed training frameworks including FSDP, torchtitan, and Hugging Face libraries for the Neuron ecosystem. A core focus of this role involves developing and optimizing mixed-precision and low-precision training techniques. You will work with BF16, FP8, and emerging numerical formats to improve training throughput while maintaining model accuracy and convergence quality. This
Read the full description at www.amazon.jobs. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
Related jobs
-
Systems Engineer - (Execution) - Level 3/4
Northrop Grumman - United States-Alabama-Huntsville
-
Business Analyst (Top Secret cleared)
ICF International INC - Washington, DC
-
Engineering Project Specialist II (Full Time) - United State
Cisco - San Jose, California, US
-
Electrical Engineer 2
Northrop Grumman - United States-Alabama-Marshall Space Flight Center