Software Development Engineer AI/ML, Inference Serving, AWS Neuron
Amazon - Cupertino, California, USA
Posted Sep 19, 2025
Benefits
- Parental leave
- Not verified - source not recorded; timestamp not recorded
- Non-birth-parent leave
- Not verified - source not recorded; timestamp not recorded
- Family-building benefits
- Fertility benefits: Not verified
- Adoption assistance: Not verified
- Surrogacy assistance: Not verified
- Mental health support
- Not verified
- Relocation assistance
- Not verified
- Childcare support
- Not verified
- Learning budget
- Not verified
- Verification
- Not verified
- Salary
- Not verified - source not recorded; timestamp not recorded
- 401(k) match
- Not verified
Schedule
- Shift type
- Not verified
- Weekend work
- Not verified
Application
- Cover letter
- Not verified
- Assessment
- Not verified
- Deadline
- Not stated
Where they hire
State eligibility is not yet verified.
About this role
AWS Neuron is the software stack powering AWS Inferentia and Trainium machine learning accelerators, designed to deliver high-performance, low-cost inference at scale. The Neuron Serving team develops infrastructure to serve modern machine learning models, including large language models (LLMs) and multimodal workloads, reliably and efficiently on AWS silicon. We are seeking a Software Development Engineer to lead and architect our next-generation model serving infrastructure, with a particular focus on large-scale generative AI applications.

Key job responsibilities
- Architect and lead the design of distributed ML serving systems optimized for generative AI workloads
- Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
- Design and implement scalable solutions for both offline and online inference workloads
- Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and Triton
- Develop and optimize system components for tensor/data parallelism and disaggregated serving
- Implement and optimize custom PyTorch operators and NKI kernels
- Mentor team members and provide technical leadership across multiple work streams
- Drive architectural decisions that impact the entire Neuron serving stack
- Collaborate with customers, product owners, and engineering teams to define technical strategy
- Author technical documentation, design proposals, and architectural guidelines

A day in the life
You'll lead critical technical initiatives while mentoring team members. You'll collaborate with cross-functional teams of applied scientists, system engineers, and product managers to architect and deliver state-of-the-art inference capabilities. Your day might involve:
- Leading
Read the full description at www.amazon.jobs. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
Related jobs
- Systems Engineer - (Execution) - Level 3/4
  Northrop Grumman - United States-Alabama-Huntsville
- Business Analyst (Top Secret cleared)
  ICF International INC - Washington, DC
- Engineering Project Specialist II (Full Time) - United State
  Cisco - San Jose, California, US
- Automation AI Ops Engineer
  Cisco - 2 Locations