Staff Software Engineer - GenAI Performance and Kernel
Databricks - San Francisco, California
Posted Oct 8, 2025
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified (source not recorded; timestamp not recorded)
- 401(k) match: Not verified
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
Staff Software Engineer - GenAI Performance and Kernel
San Francisco, California
P-1285

As a staff software engineer for GenAI Performance and Kernel, you will own the design, implementation, optimization, and correctness of the high-performance GPU kernels powering our GenAI inference stack. You will lead development of highly-tuned, low-level compute paths, manage trade-offs between hardware efficiency and generality, and mentor others in kernel-level performance engineering. You will work closely with ML researchers, systems engineers, and product teams to push the state-of-the-art in inference performance at scale.

What You Will Do
- Lead the design, implementation, benchmarking, and maintenance of core compute kernels (e.g. attention, MLP, softmax, layernorm, memory management) optimized for various hardware backends (GPU, accelerators)
- Drive the performance roadmap for kernel-level improvements: vectorization, tensorization, tiling, fusion, mixed precision, sparsity, quantization, memory reuse, scheduling, auto-tuning, etc.
- Integrate kernel optimizations with higher-level ML systems
- Build and maintain profiling, instrumentation, and verification tooling to detect correctness, performance regressions, numerical issues, and hardware utilization gaps
- Lead performance investigations and root-cause analysis on inference bottlenecks, e.g. memory bandwidth, cache contention, kernel launch overhead, tensor fragmentation
- Establish coding patterns, abstractions, and frameworks to modularize kernels for reuse, cross-backend portability, and maintainability
- Influence system architecture decisions to make kernel improvements more effective (e.g. memory layout, dataflow scheduling, kernel fusion boundaries)
- Mentor and guide other engineers working on lower-level performance, provide code reviews, help set best practices
- Collaborate with infrastructure, tooling, and ML teams to roll out
Read the full description at databricks.com. FewerJobs shows a source-linked preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.