CoAdvantage is adding a dedicated Data Scientist to conceive, prototype, and validate new ML models from existing internal data assets. This role sits at the front end of the model lifecycle: identifying high-value prediction problems, proposing statistical approaches, running experiments to validate feasibility, and producing handoff-ready artifacts that the Staff MLOps Engineer can carry to production.

The role is not operations-only. The expectation is that this Data Scientist will surface net-new model ideas — not only execute a predetermined roadmap — and will have meaningful latitude to define the problem framing, feature strategy, and experimental design for each candidate model. This role reports to the AI Experimentation Lead and coordinates closely with the Staff MLOps Engineer, the Principal AI Architect, and the data team.

Core responsibilities:

Model ideation and problem framing:
Identify prediction and estimation problems that are solvable from CoAdvantage's existing data assets. This includes reviewing operational data for signal, scoping the problem with business stakeholders, writing a pre-registration that specifies the target variable, success criterion, and identification strategy, and getting sign-off from the AI Experimentation Lead before investing in prototyping. The expectation is at least two novel model proposals per quarter, grounded in a data feasibility check before they enter the backlog.
Statistical design and experimental validation:
Design and run the statistical work that validates whether a candidate model is viable: exploratory analysis, feature relevance tests, baseline benchmarks, and where relevant, controlled or quasi-experimental designs. The Data Scientist is the methodological author of record for each candidate — identification strategy, assumptions, and known failure modes are documented in writing before a model proceeds to MLOps handoff. Relevant methods include supervised learning baselines, time-series decomposition, causal inference (difference-in-differences, propensity matching, synthetic control, interrupted time-series), and power analysis for experiment sizing. The role is expected to span methods rather than optimize a single toolkit.
Prototype development and handoff packaging:
Build prototype model code to the standard required for MLOps handoff: versioned repository, documented training pipeline, reproducible validation results, model card, and a handoff brief that specifies the serving contract, retrain frequency, monitoring schema, and known data dependencies. The MLOps Engineer should be able to carry the handoff to production without a significant rediscovery phase. Prototype code is expected to be production-oriented even at the prototype stage — not notebook-only. The Data Scientist is responsible for the code quality up to handoff; the MLOps Engineer owns it from handoff forward.
Collaboration with the data team on feasibility:
Before committing a candidate model to the backlog, validate data feasibility with the data team: source system availability, refresh cadence, data quality, tenant isolation, and governance constraints. Push back on infeasible proposals early rather than after significant prototype investment.
Documentation and methodological transparency:
Every model that reaches the handoff stage carries a complete model card: problem statement, training data window, feature definitions, identification strategy, baseline benchmarks, known failure modes, and the reconciliation plan. The bar is reproducibility from underlying data.

How we work:

AI-first coding - Claude Code, Copilot, or successor tools are the default development surface. Exploratory analysis, feature engineering scripts, training pipelines, and evaluation code are expected to be authored with agentic coding tools in the loop. Hand-coding without AI assistance is the exception, not the norm.
Pre-registered targets - No prototype proceeds to development without a written success criterion and a data feasibility sign-off. No model reaches MLOps handoff without a signed model card and handoff brief.
Methodological transparency - Identification strategies and validation choices are documented in writing and defended in review. "It performed well in cross-validation" is not sufficient to clear the handoff gate.
You propose - This role is expected to surface model ideas proactively, not wait for a roadmap. Proposals should arrive with a data feasibility check, a rough timeline, and the smallest experiment that could validate or invalidate the core assumption.
You estimate - Every workstream comes back with a timeline, a confidence interval on that timeline, and the smallest version that could be validated in two weeks.

Required qualifications:

Three or more years of experience as a Data Scientist building and validating ML models, including at least one model that reached a production or near-production state.
Strong Python and SQL. Comfortable authoring exploratory analysis, feature engineering, and model training code without an engineering intermediary.
Working fluency with core statistics: probability distributions, hypothesis testing, experimental design, power analysis, and model calibration. Causal inference fluency (difference-in-differences, propensity matching, synthetic control) is required — not optional.
Demonstrated breadth across ML problem types: at least two of supervised classification, regression, time-series forecasting, anomaly detection, or propensity modeling.
Hands-on experience with AI-assisted coding tools (Claude Code, Copilot, Cursor, or equivalent) as a daily driver, with code commits or repositories to demonstrate the practice.
Ability to produce clear written documentation: model cards, pre-registrations, and handoff briefs that a technical peer can act on without follow-up.
Comfort proposing novel problem framings from raw data rather than only executing predefined specifications.

Preferred qualifications:

Prior experience in a PEO, HR outsourcing, insurance brokerage, BPO, or other labor-intensive services organization.
Exposure to pricing, underwriting, churn, contact volume, or workforce planning use cases.
Familiarity with cloud ML platforms (Azure ML, Databricks, Vertex AI) and feature store concepts.
Experience working under data governance constraints typical of regulated multi-tenant environments (HIPAA, PII, tenant isolation).
Demonstrated track record of self-directed model ideation — examples of models you proposed, not only models you were assigned.

What success looks like at 12 months:

At least four candidate model proposals submitted, each with a pre-registration and data feasibility sign-off before prototype work begins.
At least two prototypes completed and handed off to the Staff MLOps Engineer with full model card and handoff brief.
At least one handed-off model in production monitoring at the 12-month mark.
At least one causal analysis or quasi-experimental readout co-authored with the AI Experimentation Lead that informed a business decision.
Established working pattern with the data team — feasibility reviews routine rather than ad hoc.

EEO

CoAdvantage is committed to providing equal employment opportunities to all employees and applicants without regard to race, color, religion, national origin, ancestry, citizenship status, age, sex (including pregnancy, childbirth, breastfeeding, and pregnancy-related medical conditions), gender, gender identity or expression, sexual orientation, marital status, uniformed service member and veteran status, disability, genetic information, or any other characteristic protected by applicable federal, state, or local laws and ordinances.

Data Scientist – Pilot Machine Learning
