We are building a greenfield MVP for a healthcare data platform focused on unifying and mastering data across the US healthcare payer ecosystem, including payers, PBMs, plan sponsors, health plans, formularies, and benefit designs. The platform will serve as a canonical source of truth for healthcare coverage and formulary data, enabling accurate, real-time insights and downstream analytics at the point of service.
Data from multiple external vendors is received through Snowflake Share and other managed interfaces and processed through an ETL and Master Data Management (MDM) pipeline to create a consolidated, hierarchical dataset representing the relationships between coverage entities. This normalized and linked output will be loaded into a PostgreSQL (or comparable) database and served via a low-latency REST API.
Key pipeline stages include:
- Standardization (cleansing, mapping, enrichment across payer and plan identifiers)
- Record linkage and entity resolution to unify related entities across disparate data sources
- Survivorship logic to derive and maintain golden records for core entities (payer, plan, formulary, etc.)
- Relationship modeling between payers, PBMs, sponsors, and plans
- Summarization into a canonical master data layer consumable by downstream services
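To give a feel for how the first three stages fit together, here is a minimal, plain-Python sketch of standardization, deterministic record linkage, and a simple survivorship rule. Everything in it is an assumption made for illustration: the field names, the sample records, the 6-digit zero-padded BIN, the choice of (BIN, PCN) as the match key, and the "most recent non-empty value wins" rule. The real pipeline is expected to run on AWS Glue and PySpark with far richer rules.

```python
from collections import defaultdict

# Hypothetical vendor records; field names and values are illustrative only.
RAW_RECORDS = [
    {"source": "vendor_a", "payer_name": " Acme Health, Inc. ", "bin": "12345",
     "pcn": "acme", "updated": "2024-01-10"},
    {"source": "vendor_b", "payer_name": "ACME HEALTH INC", "bin": "012345",
     "pcn": "ACME", "updated": "2024-03-02"},
    {"source": "vendor_b", "payer_name": "Birch Benefit Co", "bin": "987654",
     "pcn": "", "updated": "2024-02-15"},
]


def standardize(record):
    """Cleanse one record: uppercase, strip punctuation, pad identifiers."""
    name = "".join(
        ch for ch in record["payer_name"].upper() if ch.isalnum() or ch == " "
    )
    return {
        **record,
        "payer_name": " ".join(name.split()),     # collapse repeated whitespace
        "bin": record["bin"].strip().zfill(6),    # assumption: 6-digit BIN
        "pcn": record["pcn"].strip().upper(),
    }


def link(records):
    """Deterministic linkage: records sharing (BIN, PCN) form one cluster."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[(rec["bin"], rec["pcn"])].append(rec)
    return list(clusters.values())


def survive(cluster):
    """Survivorship: for each field, keep the most recent non-empty value."""
    # ISO-8601 date strings sort chronologically, so a string sort is enough.
    newest_first = sorted(cluster, key=lambda r: r["updated"], reverse=True)
    golden = {"sources": sorted({r["source"] for r in cluster})}
    for field in ("payer_name", "bin", "pcn"):
        golden[field] = next((r[field] for r in newest_first if r[field]), "")
    return golden


def build_golden_records(raw):
    """Run standardize -> link -> survive over a batch of raw records."""
    return [survive(c) for c in link([standardize(r) for r in raw])]


golden_records = build_golden_records(RAW_RECORDS)
```

In this sketch the two Acme rows only link after standardization, because the raw BIN and PCN values differ in padding and case. That ordering (cleanse first, match second) is the main point of the example.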
The data is updated weekly in a batch process. The system will initially support API endpoints that return coverage, formulary, and benefit design information given a set of identifiers or attributes.
You will be part of a lean, senior-level engineering team and will be expected to own key parts of the ETL, MDM, and data modeling effort.
Team composition: Tech Lead, Software Engineer, DevOps Engineer, Delivery Manager
Scope of tasks and ownership:
- Design and build scalable ETL and MDM pipelines using AWS Entity Resolution, AWS Glue, and PySpark, transforming multi-source payer ecosystem data into unified, hierarchical datasets
- Implement record-matching and entity resolution workflows (deterministic, probabilistic, or ML-assisted) to establish and maintain golden records across payers, PBMs, plans, and related entities
- Define and apply survivorship rules and relationship logic for master data consolidation
- Apply data cleansing, mapping, and enrichment logic (e.g., crosswalking payer, formulary, and plan identifiers)
- Load processed and mastered outputs into downstream relational databases powering low-latency REST APIs
- Optimize Glue jobs and Spark transformations for scalability, cost-efficiency, and performance
- Collaborate with downstream API engineers to ensure schema alignment and master data availability
- Ensure data privacy, compliance, and governance (HIPAA-aware where applicable)
- Contribute to architectural decisions and the MDM implementation roadmap in a fast-moving MVP cycle
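Where exact-key matching is not enough, the record-matching work listed above may rely on fuzzy comparison. The sketch below shows one simple form of it: a weighted fuzzy similarity score built on the standard library's `difflib.SequenceMatcher`. It is a simplified stand-in, not a full probabilistic model and not how AWS Entity Resolution works internally. The field names, weights, and threshold are hypothetical and would need tuning against labeled data.

```python
from difflib import SequenceMatcher

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"payer_name": 0.6, "bin": 0.3, "pcn": 0.1}
MATCH_THRESHOLD = 0.85


def field_similarity(a, b):
    """Similarity in [0, 1]; a missing value contributes no evidence."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def match_score(left, right):
    """Weighted sum of per-field similarities between two records."""
    return sum(
        weight * field_similarity(left[field], right[field])
        for field, weight in WEIGHTS.items()
    )


def is_match(left, right):
    """Treat a pair as the same entity when the score clears the threshold."""
    return match_score(left, right) >= MATCH_THRESHOLD


# Example records (illustrative values only).
acme_short = {"payer_name": "ACME HEALTH INC", "bin": "012345", "pcn": "ACME"}
acme_long = {"payer_name": "ACME HEALTH INCORPORATED", "bin": "012345", "pcn": "ACME"}
birch = {"payer_name": "BIRCH BENEFIT CO", "bin": "987654", "pcn": ""}
```

Calling `match_score(acme_short, acme_long)` gives a higher score than `match_score(acme_short, birch)`, since the first pair shares identifiers and most of the name. In practice, pairs scoring near the threshold are often routed to manual review rather than merged automatically.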
What you’ll need:
- 5+ years in data engineering, data platform, or MDM-focused development roles
- Strong proficiency in Python, PySpark, and AWS Glue for large-scale data processing and orchestration
- Advanced SQL skills and experience with relational and analytical databases
- Familiarity with both deterministic and probabilistic matching approaches
- Experience with hierarchical or graph-like data models and complex data relationships
- Experience working with large-scale, batch-oriented data pipelines (100M+ records)
- Solid understanding of ETL and data modeling design patterns
- Comfort working in lean teams and greenfield, fast-paced environments
- Awareness of healthcare data compliance standards (HIPAA, de-identification workflows)
Preferred Qualifications:
- Proven experience implementing Master Data Management or Entity Resolution systems (record matching, survivorship, golden record creation)
- Hands-on experience designing or maintaining entity resolution matching rules and survivorship rule engines
- Familiarity with payer, PBM, or formulary data and related industry identifiers (BIN, PCN, Group ID, etc.)
- Experience with AWS Glue Studio, Glue Workflows, or Glue Data Catalog
- Exposure to Snowflake, Redshift, or BigQuery for downstream analytic workloads
- Experience with PostgreSQL and API integration for analytic or operational workloads
- Prior experience integrating MDM or entity resolution pipelines with downstream applications
Our benefits:
- No micromanagement
- Freedom to engage in decision-making and implementation
- Opportunity to work in a team of professionals (80% of our specialists are middle level or above)
- Participation in the development of high-quality products
- Direct communication with clients on a partnership level
- Professional development opportunities ($600 education budget, well-managed processes, communities, internal library)
- Health insurance
- $600 extra for health care, sports, or mental health
- Accounting services
- 20 working days of paid time off and 10 days of sick leave
- Opportunity to work remotely
- Warm, friendly team-building activities and corporate events
Join us and be among those who care!