Member of Technical Staff, Exceptional Generalist (Remote)

Inferact - Remote, US

Posted Jan 22, 2026

Benefits

Parental leave: Not verified
Non-birth-parent leave: Not verified
Family-building benefits:
  Fertility benefits: Not verified
  Adoption assistance: Not verified
  Surrogacy assistance: Not verified
Mental health support: Not verified
Relocation assistance: Not verified
Childcare support: Not verified
Learning budget: Not verified
Verification: Not verified
Salary: Not verified
401(k) match: Not verified

Market context

U.S. role benchmark (BLS OEWS): $116,543 U.S. median for this role
Projected growth (BLS Employment Projections): +9.8% - Much faster than average

Matched to SOC 15-1252 - Software Engineering aggregate by role bucket.

Source: U.S. Bureau of Labor Statistics, OEWS, May 2024 and Employment Projections, 2024-2034.

Role

Role function: Engineering (from the posting source, checked Jun 20, 2026)
Seniority: Staff Plus (from the posting source, checked Jun 20, 2026)
Work mode: Remote (from the posting source, checked Jun 20, 2026)
In-office days: 0 days (from the posting source, checked Jun 20, 2026)

Schedule

Shift type: Not verified
Weekend work: Not verified

Company

Equity: Offered (from the posting source, checked Jun 20, 2026)

Application

Cover letter: Not verified
Assessment: Not verified
Deadline: Not stated

Where they hire

State eligibility is not yet verified.

About this role

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.

About the Role

This is a globally remote opportunity. We're seeking exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU kernels to high-level distributed systems. This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance. You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure.

You might be optimizing CUDA kernels one week, designing distributed orchestration systems the next, and implementing new model architectures the week after. The work you do will directly impact how the world runs AI inference.

Potential focus areas include:

- Inference Runtime: Push the boundaries of LLM and diffusion model serving. Work at the core of vLLM to optimize how models execute across diverse hardware and architectures.
- Kernel Engineering: Write the low-level kernels and optimizations that make vLLM the fastest inference engine in the world, running on hundreds of accelerator types.
- Performance & Scale: Build the distributed systems that power inference at global scale: design foundational layers enabling vLLM to serve models across thousands of accelerators with minimal latency.
- Cloud Orchestration: Build the operational backbone for cluster

Read the full description at jobs.ashbyhq.com. FewerJobs shows a preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has field-level provenance to a source FewerJobs pulled: a government or employer source, or the original job posting. Posting-sourced facts are employer-stated and are labeled separately from government records.

Related jobs

Staff System Architect
Northrop Grumman - United States-Illinois-Rolling Meadows

Sr. Staff System Architect
Northrop Grumman - United States-Illinois-Rolling Meadows

Staff Engineer NDT
Northrop Grumman - United States-California-Sunnyvale

Payload AI&T Lead Sr. Principal Systems Engineer
Northrop Grumman - United States-Maryland-Linthicum