Machine Learning Infrastructure Engineer

Mind Robotics - Palo Alto, California, United States

Posted Jan 26, 2026

Benefits

Parental leave: Not verified
Non-birth-parent leave: Not verified
Family-building benefits:
Fertility benefits: Not verified
Adoption assistance: Not verified
Surrogacy assistance: Not verified
Mental health support: Not verified
Relocation assistance: Not verified
Childcare support: Not verified
Learning budget: Not verified
Verification: Not verified
Salary: Not verified
401(k) match: Not verified

Schedule

Shift type: Not verified
Weekend work: Not verified

Application

Cover letter: Not verified
Assessment: Not verified
Deadline: Not stated

Where they hire

State eligibility is not yet verified.

About this role

The Role

At Mind Robotics, we're building generalized physical AI: robotic systems capable of dexterous, adaptive, and reasoning-intensive work in real-world industrial environments. Our ability to iterate quickly on large-scale models depends on world-class ML infrastructure.

We're looking for a Machine Learning Infrastructure Engineer to build the core systems that enable fast, reliable, and scalable model training, powering everything from experimentation to production deployment.

Responsibilities

- Design and implement scalable systems for training large ML models
- Enable efficient workflows for data ingestion, training, and iteration
- Develop and optimize distributed training systems across hundreds of GPUs
- Implement strategies for parallelization, sharding, and efficient compute utilization
- Improve training efficiency through techniques such as attention optimizations, kernel fusion, and memory management
- Partner closely with modeling teams to accelerate iteration speed and reduce training costs
- Build internal tools for experiment tracking, monitoring, and debugging
- Implement systems for tracking training performance, failures, and resource utilization
- Debug and resolve bottlenecks across the training stack
- Provide lightweight infrastructure support for deploying and running models in production environments
- Optimize inference performance and reliability where needed
- Support core cloud infrastructure needs for training workloads (without heavy DevOps overhead)
- Manage compute resources efficiently across training jobs

Qualifications

- Strong experience building infrastructure for large-scale ML training
- Deep understanding of how modern LLM/VLM systems are trained and scaled
- Proven experience setting up and scaling distributed training across hundreds of

Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs

Systems Engineer - (Execution) - Level 3/4
Northrop Grumman - United States-Alabama-Huntsville

Business Analyst (Top Secret cleared)
ICF International INC - Washington, DC

Engineering Project Specialist II (Full Time) - United State
Cisco - San Jose, California, US

Automation AI Ops Engineer
Cisco - 2 Locations
