Machine Learning Infrastructure Engineer
Mind Robotics - Palo Alto, California, United States
Posted Jan 26, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
The Role

At Mind Robotics, we're building generalized physical AI: robotic systems capable of dexterous, adaptive, and reasoning-intensive work in real-world industrial environments. Our ability to iterate quickly on large-scale models depends on world-class ML infrastructure.

We're looking for a Machine Learning Infrastructure Engineer to build the core systems that enable fast, reliable, and scalable model training, powering everything from experimentation to production deployment.

Responsibilities
- Design and implement scalable systems for training large ML models
- Enable efficient workflows for data ingestion, training, and iteration
- Develop and optimize distributed training systems across hundreds of GPUs
- Implement strategies for parallelization, sharding, and efficient compute utilization
- Improve training efficiency through techniques such as attention optimizations, kernel fusion, and memory management
- Partner closely with modeling teams to accelerate iteration speed and reduce training costs
- Build internal tools for experiment tracking, monitoring, and debugging
- Implement systems for tracking training performance, failures, and resource utilization
- Debug and resolve bottlenecks across the training stack
- Provide lightweight infrastructure support for deploying and running models in production environments
- Optimize inference performance and reliability where needed
- Support core cloud infrastructure needs for training workloads (without heavy DevOps overhead)
- Manage compute resources efficiently across training jobs

Qualifications
- Strong experience building infrastructure for large-scale ML training
- Deep understanding of how modern LLM/VLM systems are trained and scaled
- Proven experience setting up and scaling distributed training across hundreds of
Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.