Currently at DocNexus · Seattle, WA

Data Engineer
in healthcare.

5+ years building data infrastructure across US healthcare and finance. I work at the intersection of clinical intelligence, pharma, and engineering — making sense of messy, high-stakes data.

📍 Seattle, WA
⏱ 5+ years
🔬 BigQuery · FastAPI · AWS
Skills

Tech Stack

Data Warehousing
BigQuery · ClickHouse · CloudSQL · Redshift
Backend & APIs
FastAPI · Python · REST · SQLAlchemy
Cloud & Infra
AWS EC2 · S3 · GCP · Docker
Pipelines
Airflow · dbt · Batch Python
Governance
OpenMetadata · Neo4j · HIPAA · GDPR
Healthcare Data
NPI Registry · ClinicalTrials.gov · PubMed · ORCID
Work History

Experience

Current
DocNexus
Mar 2025 → Present
Data Engineer · Healthcare Commercial Intelligence · Seattle, WA
  • Built a multi-tier HCP-to-trial matching engine on BigQuery — scoring across NPI, institution, location, and ML embedding similarity against ClinicalTrials.gov.
  • Designed a tiered ClinicalTrials.gov v2 API integration to replace imprecise keyword search, improving trial coverage across physician cohorts.
  • Architected an HCP publication-matching pipeline with multi-dimensional scoring (affiliation, MeSH, co-author overlap, journal affinity) across PubMed datasets.
  • Built FastAPI services for real-time clinical data lookup; manage EC2 batch pipelines for ophthalmology and oncology HCP datasets.
  • Deliver data products for pharma accounts including GSK/ViiV, Novartis, Takeda, and Chugai.
BigQuery · ClickHouse · FastAPI · AWS EC2 · ClinicalTrials v2 API · PubMed · NPI
Cogneo Technologies
2022 → 2025
Data Engineer · US Finance & Compliance · Remote
  • Led data governance and quality frameworks for US financial domain clients — metadata catalog, lineage graph, and automated quality checks.
  • Built compliance-focused pipelines with GDPR/CCPA controls using Airflow and graph-based data lineage with Neo4j.
  • Designed and maintained data dictionaries and audit trails for regulatory reporting across critical financial datasets.
OpenMetadata · Airflow · Neo4j · Data Governance · GDPR / CCPA · Python
Work Samples

Selected Projects

🧬
Clinical Trials Matching Engine
Multi-tier HCP-to-trial scoring on BigQuery. Matches physicians using name, institution, location, and ML.DISTANCE embedding similarity. Runs in batch on EC2 across NPI cohorts.
BigQuery · ML.DISTANCE · EC2
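A minimal sketch of how a tiered match like this could be decided, written in plain Python rather than BigQuery SQL. The tier names, the 0.25 distance cutoff, and the `Candidate` fields are assumptions made for illustration, not the production rules; `cosine_distance` mirrors what `ML.DISTANCE` computes for the `'COSINE'` distance type.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0  # assumption: a zero vector counts as fully dissimilar
    return 1.0 - dot / norm


@dataclass
class Candidate:
    # Hypothetical flags produced by earlier comparison steps.
    name_match: bool
    institution_match: bool
    location_match: bool
    embedding_distance: float


def match_tier(c: Candidate, max_distance: float = 0.25) -> Optional[str]:
    """Return the strictest tier the candidate satisfies, or None."""
    if not c.name_match:
        return None  # in this sketch every tier requires the name to agree
    if c.institution_match and c.location_match:
        return "tier_1_exact"
    if c.institution_match or c.location_match:
        return "tier_2_partial"
    if c.embedding_distance <= max_distance:
        return "tier_3_embedding"
    return None
```

Checking the strict tiers first means the embedding comparison only decides cases where the structured fields are inconclusive.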
📄
HCP Publication Matching
Multi-region BigQuery scoring (max 110 pts) linking physicians to PubMed publications via affiliation, MeSH, ORCID mapping, and co-author graph.
PubMed · BigQuery · ORCID API
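A rough illustration of additive scoring with a fixed point ceiling. Only the 110-point maximum comes from the description above; the per-signal weights, the use of Jaccard overlap, and the function names are hypothetical and chosen so the budget sums to 110.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical point budget; only the 110-point total is taken from the text.
WEIGHTS = {"orcid": 40, "affiliation": 30, "mesh": 25, "coauthor": 15}


def publication_score(
    orcid_match: bool,
    affiliation_match: bool,
    hcp_mesh: Set[str],
    paper_mesh: Set[str],
    hcp_coauthors: Set[str],
    paper_authors: Set[str],
) -> float:
    """Sum binary signals at full weight and overlap signals proportionally."""
    score = 0.0
    if orcid_match:
        score += WEIGHTS["orcid"]
    if affiliation_match:
        score += WEIGHTS["affiliation"]
    score += WEIGHTS["mesh"] * jaccard(hcp_mesh, paper_mesh)
    score += WEIGHTS["coauthor"] * jaccard(hcp_coauthors, paper_authors)
    return round(score, 2)
```

Keeping the weights in one table makes the ceiling easy to audit: the maximum score is simply the sum of the values.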
Clinical Data API Service
FastAPI service for real-time provider lookup and tiered trial search using Essie query syntax. Specialty-specific keyword mappings. Async BigQuery clients per GCP region.
FastAPI · Async Python · BigQuery
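One way a tiered search could be structured: build Essie expressions from most to least specific and stop at the first tier that returns anything. This is a sketch under assumptions. The field names inside `AREA[...]` should be checked against the ClinicalTrials.gov field list, and `search` is a hypothetical callable standing in for the actual HTTP call to the v2 API.

```python
from typing import Callable, List, Optional, Tuple


def essie_tiers(
    name: str, facility: Optional[str] = None, condition: Optional[str] = None
) -> List[str]:
    """Build Essie expressions ordered from most to least specific."""
    by_name = f'AREA[OverallOfficialName]"{name}"'
    tiers: List[str] = []
    if facility and condition:
        tiers.append(
            f'{by_name} AND AREA[LocationFacility]"{facility}" '
            f'AND AREA[Condition]"{condition}"'
        )
    if facility:
        tiers.append(f'{by_name} AND AREA[LocationFacility]"{facility}"')
    tiers.append(by_name)  # broadest fallback: investigator name only
    return tiers


def tiered_search(
    tiers: List[str], search: Callable[[str], list]
) -> Tuple[Optional[int], list]:
    """Run tiers in order; return (tier number, results) for the first hit."""
    for level, expression in enumerate(tiers, start=1):
        results = search(expression)
        if results:
            return level, results
    return None, []
```

Returning the tier number alongside the results lets downstream scoring treat a name-only hit with less confidence than a name, facility, and condition hit.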
🏛️
Data Governance Platform
End-to-end governance stack for US financial clients — metadata catalog, lineage via Neo4j, automated quality checks, GDPR/CCPA compliance reporting.
OpenMetadata · Neo4j · Airflow
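To illustrate the kind of question a lineage graph answers ("what is affected if this dataset changes?"), here is a small in-memory sketch. The dataset names and edges are invented for the example, and a plain dictionary stands in for Neo4j, where the same lookup would typically be a variable-length path query.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage edges: each dataset maps to the datasets derived from it.
LINEAGE: Dict[str, List[str]] = {
    "raw_transactions": ["clean_transactions"],
    "clean_transactions": ["daily_positions", "audit_log"],
    "daily_positions": ["regulatory_report"],
}


def downstream(graph: Dict[str, List[str]], start: str) -> List[str]:
    """Return every dataset reachable from `start`, in breadth-first order."""
    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # guard against cycles and repeated paths
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order
```

Calling `downstream(LINEAGE, "clean_transactions")` lists the datasets that would need re-validation after a change to that table.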
Motivation

What drives me

US Healthcare Complexity
Fragmented systems, high stakes, real patient impact. Exactly the kind of problem worth dedicating years to.
Clinical Intelligence
Connecting HCPs, trials, publications, and pharma signals into something actionable for the people who need it.
Infrastructure that lasts
Fast, reliable, and maintainable. I care about the long-term health of systems, not just shipping.
Governance & Trust
Data you can trace and defend — especially when the stakes are regulatory and quality matters.