FewerJobs.

Member of Technical Staff, Exceptional Generalist (Remote)

Inferact - Remote, US

Posted Jan 22, 2026

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified
Salary
Not verified
401(k) match
Not verified


Market context

U.S. role benchmark (BLS OEWS)
$116,543 U.S. median for this role
Projected growth (BLS Employment Projections)
+9.8% - Much faster than average

Matched to SOC 15-1252 - Software Engineering aggregate by role bucket.

Source: U.S. Bureau of Labor Statistics, OEWS, May 2024 and Employment Projections, 2024-2034.

Role

Role function
Engineering (from the posting source, checked Jun 20, 2026)
Seniority
Staff Plus (from the posting source, checked Jun 20, 2026)
Work mode
Remote (from the posting source, checked Jun 20, 2026)
In-office days
0 days (from the posting source, checked Jun 20, 2026)

Schedule

Shift type
Not verified
Weekend work
Not verified

Company

Equity
Offered (from the posting source, checked Jun 20, 2026)

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

Inferact's mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. Founded by the creators and core maintainers of vLLM, we sit at the intersection of models and hardware, a position that took years to build.

About the Role

This is a globally remote opportunity. We're seeking exceptional generalist engineers who can work across the entire vLLM stack: from low-level GPU kernels to high-level distributed systems. This role is designed for self-directed, autonomous individuals who can identify the highest-leverage problems and solve them end-to-end without constant guidance.

You'll work asynchronously with our San Francisco headquarters while maintaining full ownership of critical infrastructure. You might be optimizing CUDA kernels one week, designing distributed orchestration systems the next, and implementing new model architectures the week after. The work you do will directly impact how the world runs AI inference.

Potential focus areas include:
  • Inference Runtime: Push the boundaries of LLM and diffusion model serving. Work at the core of vLLM to optimize how models execute across diverse hardware and architectures.
  • Kernel Engineering: Write the low-level kernels and optimizations that make vLLM the fastest inference engine in the world, running on hundreds of accelerator types.
  • Performance & Scale: Build the distributed systems that power inference at global scale. Design foundational layers enabling vLLM to serve models across thousands of accelerators with minimal latency.
  • Cloud Orchestration: Build the operational backbone for cluster

Read the full description at jobs.ashbyhq.com. FewerJobs shows a preview and links to the original posting.

Apply at jobs.ashbyhq.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has field-level provenance to a source FewerJobs pulled: a government or employer source, or the original job posting. Posting-sourced facts are employer-stated and are labeled separately from government records.
