Engineer, Supercomputing & Distributed Systems
Krea - San Francisco, California, United States
Posted Apr 3, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
About Krea
At Krea, we are building next-generation AI creative tools. We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it. We believe AI is a new medium that allows us to express ourselves through various formats: text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.
Supercomputing / AI Infra at Krea
We build and operate the infrastructure for Krea's research and inference: distributed training, 1000+ K8s GPU clusters, petabyte-scale data pipelines, etc. We build a lot of this from scratch, including custom distributed datastores, job orchestration systems, and streaming pipelines that replace tools like Kafka and Ray for modern AI workloads at scale.
Example projects:
Distributed data systems
- Design multi-stage pipelines that turn petabytes of raw data into clean, annotated datasets
- Run classification models on billions of images
- Deploy and combine LLMs to caption massive multimedia data
GPU infrastructure
- Manage distributed training and inference on 1000+ GPU Kubernetes clusters
- Solve orchestration and scaling for large-scale GPU job processing
- Scale workloads and research between clusters in multiple datacenters
Distributed training
- Profile and optimize dataloaders streaming thousands of images per second
- Profile and debug InfiniBand networking on huge training runs
- Build fault tolerance systems for large-scale pretraining
- Collaborate with researchers on evolving RL infrastructure
Applied ML pipelines
Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.