
Machine Learning Infrastructure Engineer

Mind Robotics - Palo Alto, California, United States

Posted Jan 26, 2026

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified
401(k) match
Not verified


Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

The Role

At Mind Robotics, we're building generalized physical AI: robotic systems capable of dexterous, adaptive, and reasoning-intensive work in real-world industrial environments. Our ability to iterate quickly on large-scale models depends on world-class ML infrastructure. We're looking for a Machine Learning Infrastructure Engineer to build the core systems that enable fast, reliable, and scalable model training, powering everything from experimentation to production deployment.

Responsibilities

  • Design and implement scalable systems for training large ML models
  • Enable efficient workflows for data ingestion, training, and iteration
  • Develop and optimize distributed training systems across hundreds of GPUs
  • Implement strategies for parallelization, sharding, and efficient compute utilization
  • Improve training efficiency through techniques such as attention optimizations, kernel fusion, and memory management
  • Partner closely with modeling teams to accelerate iteration speed and reduce training costs
  • Build internal tools for experiment tracking, monitoring, and debugging
  • Implement systems for tracking training performance, failures, and resource utilization
  • Debug and resolve bottlenecks across the training stack
  • Provide lightweight infrastructure support for deploying and running models in production environments
  • Optimize inference performance and reliability where needed
  • Support core cloud infrastructure needs for training workloads (without heavy DevOps overhead)
  • Manage compute resources efficiently across training jobs

Qualifications

  • Strong experience building infrastructure for large-scale ML training
  • Deep understanding of how modern LLM/VLM systems are trained and scaled
  • Proven experience setting up and scaling distributed training across hundreds of

Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
