Senior AI Inference Engineer - Model Optimization & Deployment
Zoox - Foster City, CA
Posted Apr 11, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits:
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Model Optimization & Deployment Engineer, you will focus on bringing highly efficient, production-ready large-scale models to our on-vehicle stack. We are looking for experts with hands-on experience in compressing, accelerating, and deploying complex models (LLMs, VLMs, or FMs) for power- and thermal-constrained vehicle SOCs. You will optimize the ML models, write custom CUDA kernels, and build highly concurrent inference code to ensure real-time, deterministic execution on edge devices.
In this role, you will:
- Optimize large-scale models (Multi-Modal Sensor Fusion models, LLMs, VLMs) using advanced quantization (PTQ, QAT), pruning, mixed-precision inference frameworks, and parameter-efficient fine-tuning (LoRA, QLoRA).
- Architect and implement model conversion and compilation pipelines using TensorRT for edge deployment.
- Perform rigorous parity checking, accuracy recovery, and latency benchmarking between PyTorch frameworks
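The responsibilities above mention post-training quantization (PTQ) and parity checking. As a rough illustration only (not taken from the posting and not a description of Zoox's tooling), a symmetric per-tensor INT8 PTQ step and a tolerance-based parity check might be sketched in plain Python as below. The function names `quantize_int8`, `dequantize`, and `parity_check`, and the choice of `atol = scale / 2`, are hypothetical; production pipelines such as TensorRT typically rely on calibration data and per-channel scales instead.

```python
# Hypothetical sketch, not Zoox's pipeline: symmetric per-tensor INT8
# post-training quantization, plus a simple parity check between a
# reference output and a candidate output.

def quantize_int8(weights):
    """Map floats to int8 codes using a single symmetric per-tensor scale."""
    max_abs = max(abs(w) for w in weights)  # raises ValueError on an empty list
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to nearest, then clamp to the symmetric int8 range [-127, 127].
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]


def parity_check(reference, candidate, atol):
    """Return (passed, max_abs_error) for two equal-length outputs."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    max_err = max(abs(r - c) for r, c in zip(reference, candidate))
    return max_err <= atol, max_err


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0, -0.25]
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    # Round-to-nearest keeps each element within half a quantization step.
    ok, err = parity_check(weights, restored, atol=scale / 2)
    print(codes, ok, err)
```

The half-step tolerance mirrors the worst-case rounding error of this scheme. A real parity check would compare full model outputs against a task-specific accuracy budget rather than individual weights.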
Read the full description at jobs.lever.co. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
Related jobs
- Mechanical Engineering Manager 2 - 16282
  Northrop Grumman - United States-Utah-Roy
- Senior Software Engineer, Simulation and Integration
  Axcelis Technologies INC - Beverly, MA
- Payload AI&T Lead Staff Systems Engineer
  Northrop Grumman - United States-Maryland-Linthicum
- Senior Software Engineer, Equipment Control
  Axcelis Technologies INC - Beverly, MA