Member of Technical Staff, Exceptional Generalist (Remote)
Inferact - Remote, US
Posted Jan 22, 2026
Benefits
- Parental leave: Not verified
- Non-birth-parent leave: Not verified
- Family-building benefits
  - Fertility benefits: Not verified
  - Adoption assistance: Not verified
  - Surrogacy assistance: Not verified
- Mental health support: Not verified
- Relocation assistance: Not verified
- Childcare support: Not verified
- Learning budget: Not verified
- Verification: Not verified
- Salary: Not verified
- 401(k) match: Not verified
Market context
- U.S. role benchmark (BLS OEWS): $116,543 U.S. median for this role
- Projected growth (BLS Employment Projections): +9.8%, much faster than average
Matched to SOC 15-1252 - Software Engineering aggregate by role bucket.
Source: U.S. Bureau of Labor Statistics, OEWS, May 2024 and Employment Projections, 2024-2034.
Role
Schedule
- Shift type: Not verified
- Weekend work: Not verified
Company
- Equity: Offered (from the posting source, checked Jun 20, 2026)
Application
- Cover letter: Not verified
- Assessment: Not verified
- Deadline: Not stated
Where they hire
State eligibility is not yet verified.
About this role
Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.

About the Role

This is a globally remote opportunity. We're seeking exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU kernels to high-level distributed systems. This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance.

You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure. You might be optimizing CUDA kernels one week, designing distributed orchestration systems the next, and implementing new model architectures the week after. The work you do will directly impact how the world runs AI inference.

Potential focus areas include:
- Inference Runtime: Push the boundaries of LLM and diffusion model serving. Work at the core of vLLM to optimize how models execute across diverse hardware and architectures.
- Kernel Engineering: Write the low-level kernels and optimizations that make vLLM the fastest inference engine in the world, running on hundreds of accelerator types.
- Performance & Scale: Build the distributed systems that power inference at global scale. Design foundational layers enabling vLLM to serve models across thousands of accelerators with minimal latency.
- Cloud Orchestration: Build the operational backbone for cluster
Read the full description at jobs.ashbyhq.com. FewerJobs shows a preview and links to the original posting.
Apply link not verified; last-live date unavailable.
What verified means
Verified means a displayed claim has field-level provenance to a source FewerJobs pulled: a government or employer source, or the original job posting. Posting-sourced facts are employer-stated and are labeled separately from government records.
Related jobs
- Staff System Architect
  Northrop Grumman - United States-Illinois-Rolling Meadows
- Sr. Staff System Architect
  Northrop Grumman - United States-Illinois-Rolling Meadows
- Staff Engineer NDT
  Northrop Grumman - United States-California-Sunnyvale
- Payload AI&T Lead Sr. Principal Systems Engineer
  Northrop Grumman - United States-Maryland-Linthicum