Member of Technical Staff, Kernel Engineering
Inferact - San Francisco, California, United States
Posted Jan 22, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified (source not recorded; timestamp not recorded)
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.
About the Role
We're looking for a performance engineer to squeeze every FLOP out of modern accelerators. You'll write the kernels and low-level optimizations that make vLLM the fastest inference engine in the world. Your code will run on hundreds of accelerator types, from NVIDIA GPUs to emerging silicon.
When hardware vendors develop new chips, they integrate with vLLM. You'll work directly with these teams to ensure we're extracting maximum performance from every generation of hardware.
Skills and Qualifications
Minimum qualifications:
- Bachelor's degree or equivalent experience in computer science, engineering, or similar.
- Deep experience writing CUDA kernels or equivalent (CuTeDSL, Triton, TileLang, Pallas).
- Strong understanding of GPU architecture: memory hierarchy, warp scheduling, tiling, tensor cores.
- Proficiency in C++ and Python with demonstrated ability to write high-performance code.
- Experience with profiling tools (Nsight, rocprof) and performance optimization methodologies.
- Obsession with benchmarks and squeezing every percentage point of speedup.
Preferred qualifications:
- Experience with ML-specific kernel optimization (FlashAttention, fused kernels).
- Knowledge of quantization techniques (INT8, FP8, mixed-precision).
- Familiarity with multiple accelerator platforms (NVIDIA, AMD, TPU, Intel).
- Experience with compiler technologies (LLVM, MLIR, XLA).
Bonus points if
Read the full description at jobs.ashbyhq.com. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.
Related jobs
- Systems Engineer - (Execution) - Level 3/4
  Northrop Grumman - United States-Alabama-Huntsville
- Business Analyst (Top Secret cleared)
  ICF International INC - Washington, DC
- Engineering Project Specialist II (Full Time) - United State
  Cisco - San Jose, California, US
- Automation AI Ops Engineer
  Cisco - 2 Locations