Multimodal Generative AI Researcher
Stability AI - Remote
Posted Jan 30, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
Location: Remote
We're looking for a Research Scientist with deep expertise in training and fine-tuning large Vision-Language and Language Models (VLMs / LLMs) for downstream multimodal tasks. You'll help push the next frontier of models that reason across vision, language, and 3D, bridging research breakthroughs with scalable engineering.
What You'll Do
- Design and fine-tune large-scale VLMs / LLMs - and hybrid architectures - for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
- Build robust, efficient training and evaluation pipelines (data curation, distributed training, mixed precision, scalable fine-tuning).
- Conduct in-depth analysis of model performance: ablations, bias / robustness checks, and generalisation studies.
- Collaborate across research, engineering, and 3D / graphics teams to bring models from prototype to production.
- Publish impactful research and help establish best practices for multimodal model adaptation.
What You Bring
- PhD (or equivalent experience) in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs / LLMs for real-world downstream tasks.
- Strong engineering mindset - you can design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (vision-language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Awareness of 3D-aware multimodal models - using NeRFs, Gaussian splatting, or differentiable renderers for
Read the full description at stability.ai. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.