Engineer, Supercomputing & Distributed Systems

Krea - San Francisco, California, United States

Posted Apr 3, 2026

Benefits

Parental leave: Not verified
Non-birth-parent leave: Not verified
Family-building benefits:
Fertility benefits: Not verified
Adoption assistance: Not verified
Surrogacy assistance: Not verified
Mental health support: Not verified
Relocation assistance: Not verified
Childcare support: Not verified
Learning budget: Not verified
Verification: Not verified
Salary: Not verified
401(k) match: Not verified


Schedule

Shift type: Not verified
Weekend work: Not verified

Application

Cover letter: Not verified
Assessment: Not verified
Deadline: Not stated

Where they hire

State eligibility is not yet verified.

About this role

About Krea

At Krea, we are building next-generation AI creative tools. We are dedicated to making AI intuitive and controllable for creatives. Our mission is to build tools that empower human creativity, not replace it. We believe AI is a new medium that allows us to express ourselves through various formats: text, images, video, sound, and even 3D. We're building better, smarter, and more controllable tools to harness this medium.

Supercomputing / AI Infra at Krea

We build and operate the infrastructure for Krea's research and inference. Distributed training, 1000+ K8s GPU clusters, petabyte scale data pipelines, etc. We build a lot of this from scratch: custom distributed datastores, job orchestration systems, and streaming pipelines that replace tools like Kafka and Ray for modern AI workloads at scale.

Example projects:

Distributed data systems
- Design multi-stage pipelines that turn petabytes of raw data into clean, annotated datasets
- Run classification models on billions of images
- Deploy and combine LLMs to caption massive multimedia data

GPU infrastructure
- Manage distributed training and inference on 1000+ GPU Kubernetes clusters
- Solve orchestration and scaling for large-scale GPU job processing
- Scale workloads and research between clusters in multiple datacenters

Distributed training
- Profile and optimize dataloaders streaming thousands of images per second
- Profile and debug InfiniBand networking on huge training runs
- Build fault tolerance systems for large-scale pretraining
- Collaborate with researchers on evolving RL infrastructure

Applied ML pipelines

Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs

Systems Engineer - (Execution) - Level 3/4
Northrop Grumman - United States-Alabama-Huntsville

Business Analyst (Top Secret cleared)
ICF International INC - Washington, DC

Engineering Project Specialist II (Full Time) - United State
Cisco - San Jose, California, US

Automation AI Ops Engineer
Cisco - 2 Locations
