FewerJobs.
All jobs

Annotation Data Scientist, Evaluation Integrity (Siri)

Apple - Cambridge, United States of America

Posted May 19, 2026

Benefits

Parental leave
Not verified
Non-birth-parent leave
Not verified
Family-building benefits
  • Fertility benefits: Not verified
  • Adoption assistance: Not verified
  • Surrogacy assistance: Not verified
Mental health support
Not verified
Relocation assistance
Not verified
Childcare support
Not verified
Learning budget
Not verified
Verification
Not verified last checked Jun 13, 2026
Salary
Not verified not verified - source not recorded; timestamp not recorded
401(k) match
Listed Source: EMPLR_CONTRIB_INCOME_AMT. source Last checked Jun 13, 2026.

Was this benefit information wrong? Tell us.

Schedule

Shift type
Not verified
Weekend work
Not verified

Application

Cover letter
Not verified
Assessment
Not verified
Deadline
Not stated

Where they hire

State eligibility is not yet verified.

About this role

Annotation Data Scientist, Evaluation Integrity (Siri) Cambridge, United States of America Play a part in the ongoing revolution in human-computer interaction. Siri is evolving - and the way we evaluate it has to evolve with it. Join the Evaluation Integrity team to help build the trusted quality signal behind every Siri release. Within the Siri evaluation organization, the Human Evaluation sub-team is responsible for answering the question: can we trust our evals? We do that by designing human-in-the-loop (HITL) annotation tasks that scrutinize every moving part of an agentic evaluation - the simulated user agent, the conversation it has with Siri, and the automated evaluators that grade the exchange. This role sits at the intersection of data science, human annotation engineering, and evaluation methodology, and is instrumental in turning human judgment into a rigorous, reproducible signal that directly informs pre-ship model and product decisions. As an Annotation Data Scientist on the Evaluation Integrity team, you will design and run HITL annotation projects that evaluate the quality and authenticity of agentic user personae, the validity of agent-to-agent conversations, and the reliability of LLM-as-judge and rule-based evaluators against Siri's product specifications. You will own annotation initiatives end-to-end; from rubric design and tooling, through annotator calibration, to data science analysis that turns annotator judgments into actionable signal for modeling, planning, and product teams. Design HITL annotation tasks for agentic evaluation. Advise on rubrics and design workflows that ask annotators to assess (a) the quality and authenticity of user agent personae, (b) the validity

Read the full description at jobs.apple.com. FewerJobs shows a source-linked preview and links to the original posting.

Apply at jobs.apple.com

Apply link not verified; last-live date unavailable.

What verified means

Verified means a displayed claim has a recorded source field, a source URL when available, and a timestamp showing when FewerJobs checked or enriched the evidence.

Related jobs