FewerJobs.

Member of Technical Staff - GPU Performance Engineer

Liquid AI - San Francisco, United States, Boston, Remote

Posted Jul 29, 2025

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified
401(k) match
Not verified - source URL not recorded; timestamp not recorded


Market context

U.S. role benchmark (BLS OEWS)
$111,944 U.S. median for this role
Projected growth (BLS Employment Projections)
+13.7% - Much faster than average

Matched to SOC 15-1252 - Data and ML aggregate by role bucket.

Source: U.S. Bureau of Labor Statistics, OEWS, May 2024 and Employment Projections, 2024-2034.

Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

About Liquid AI

Spun out of MIT CSAIL, we build general-purpose AI systems that run efficiently across deployment targets, from data center accelerators to on-device hardware, ensuring low latency, minimal memory usage, privacy, and reliability. We partner with enterprises across consumer electronics, automotive, life sciences, and financial services. We are scaling rapidly and need exceptional people to help us get there.

The Opportunity

Our models and workflows require performance work that generic frameworks don't solve. You'll design and ship custom CUDA kernels, profile at the hardware level, and integrate research ideas into production code that delivers measurable speedups in real pipelines (training, post-training, and inference). Our team is small, fast-moving, and high-ownership. We're looking for someone who finds joy in memory hierarchies, tensor cores, and profiler output.

While San Francisco and Boston are preferred, we are open to other locations.

What We're Looking For

We need someone who:
  • Works profiler-first: You use tools like Nsight Systems / Nsight Compute to find bottlenecks, validate hypotheses, and iterate until improvements show up in end-to-end benchmarks.
  • Bridges theory and practice: You can translate ideas from papers into implementations that are robust, testable, and performant.
  • Executes independently: Given an ambiguous bottleneck, you can drive from profiling to kernel/integration changes to benchmarked results to maintained ownership.
  • Cares about the details: Memory hierarchy, occupancy, launch configs, tensor core utilization, bandwidth vs compute limits.

The Work
  • Write high-performance

Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
