Software Development Engineer AI/ML, Inference Serving, AWS Neuron

Amazon - Cupertino, California, USA

Posted Sep 19, 2025

Benefits

Parental leave: Not verified (source not recorded; timestamp not recorded)
Non-birth-parent leave: Not verified (source not recorded; timestamp not recorded)
Family-building benefits:
Fertility benefits: Not verified
Adoption assistance: Not verified
Surrogacy assistance: Not verified
Mental health support: Not verified
Relocation assistance: Not verified
Childcare support: Not verified
Learning budget: Not verified
Verification: Not verified
Salary: Not verified (source not recorded; timestamp not recorded)
401(k) match: Not verified


Schedule

Shift type: Not verified
Weekend work: Not verified

Application

Cover letter: Not verified
Assessment: Not verified
Deadline: Not stated

Where they hire

State eligibility is not yet verified.

About this role

AWS Neuron is the software stack powering AWS Inferentia and Trainium machine learning accelerators, designed to deliver high-performance, low-cost inference at scale. The Neuron Serving team develops infrastructure to serve modern machine learning models, including large language models (LLMs) and multimodal workloads, reliably and efficiently on AWS silicon. We are seeking a Software Development Engineer to lead and architect our next-generation model serving infrastructure, with a particular focus on large-scale generative AI applications.

Key job responsibilities

* Architect and lead the design of distributed ML serving systems optimized for generative AI workloads
* Drive technical excellence in performance optimization and system reliability across the Neuron ecosystem
* Design and implement scalable solutions for both offline and online inference workloads
* Lead integration efforts with frameworks such as vLLM, SGLang, Torch XLA, TensorRT, and Triton
* Develop and optimize system components for tensor/data parallelism and disaggregated serving
* Implement and optimize custom PyTorch operators and NKI kernels
* Mentor team members and provide technical leadership across multiple work streams
* Drive architectural decisions that impact the entire Neuron serving stack
* Collaborate with customers, product owners, and engineering teams to define technical strategy
* Author technical documentation, design proposals, and architectural guidelines

A day in the life

You'll lead critical technical initiatives while mentoring team members. You'll collaborate with cross-functional teams of applied scientists, system engineers, and product managers to architect and deliver state-of-the-art inference capabilities. Your day might involve:

* Leading

Read the full description at www.amazon.jobs. FewerJobs shows a source-linked preview and links to the original posting.

Apply at amazon.jobs

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs

Systems Engineer - (Execution) - Level 3/4
Northrop Grumman - United States-Alabama-Huntsville

Business Analyst (Top Secret cleared)
ICF International INC - Washington, DC

Engineering Project Specialist II (Full Time) - United State
Cisco - San Jose, California, US

Automation AI Ops Engineer
Cisco - 2 Locations
