Experience Level: 5-8 Years
Location: Chennai
About the Role: We are seeking a highly skilled and experienced Data Engineer to join our team and focus on building and maintaining robust data pipelines and infrastructure for our U.S. healthcare payer operations. You will be responsible for the ingestion, transformation, and availability of large, complex healthcare datasets, enabling our data scientists and ML engineers to develop impactful solutions. This role requires a strong foundation in data warehousing, ETL/ELT processes, and cloud-based data platforms within a highly regulated environment.
Responsibilities:
- Design, develop, and optimize scalable data pipelines (ETL/ELT) to ingest, transform, and load diverse healthcare data from various sources into our data warehouse/lake (a minimal pipeline sketch follows this list).
- Build and maintain robust data models (e.g., star schema, snowflake schema) to support analytics, reporting, and machine learning initiatives (see the schema sketch after this list).
- Write highly optimized SQL queries and Python scripts for data manipulation, cleansing, validation, and transformation.
- Ensure data quality, integrity, and reliability across all data pipelines.
- Collaborate with data scientists, ML engineers, and business stakeholders to understand data requirements and translate them into efficient data solutions.
- Implement and manage data governance policies, security measures, and access controls for sensitive healthcare data (e.g., HIPAA compliance).
- Monitor data pipeline performance, troubleshoot issues, and drive continuous improvement.
- Automate data workflows using orchestration tools and integrate them with CI/CD pipelines (see the orchestration sketch after this list).
- Contribute to the selection, evaluation, and implementation of new data technologies and tools.
- Document data pipelines, data models, and data flow processes thoroughly.
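As an illustration of the pipeline and data-quality bullets above, the following minimal sketch shows an extract-transform-load flow in Python with basic validation rules. It is illustrative only: the file path, column names, validation rules, and connection string are hypothetical placeholders, and a production pipeline would add logging, error handling, and incremental loading.

    # Minimal ETL sketch with inline data-quality checks.
    # File path, columns, and DSN below are hypothetical placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    def extract(path: str) -> pd.DataFrame:
        # Extract: read a raw claims file (column names are assumptions).
        return pd.read_csv(path, dtype={"claim_id": str, "member_id": str})

    def transform(df: pd.DataFrame) -> pd.DataFrame:
        # Cleanse: normalize status text and parse service dates.
        df["claim_status"] = df["claim_status"].str.strip().str.upper()
        df["service_date"] = pd.to_datetime(df["service_date"], errors="coerce")
        # Validate: enforce basic integrity rules before loading.
        if df["claim_id"].isna().any() or df["claim_id"].duplicated().any():
            raise ValueError("claim_id must be unique and non-null")
        if (df["billed_amount"] < 0).any():
            raise ValueError("billed_amount must be non-negative")
        return df.dropna(subset=["service_date"])

    def load(df: pd.DataFrame, engine) -> None:
        # Load: append to a staging table; modeling happens downstream.
        df.to_sql("stg_claims", engine, if_exists="append", index=False)

    if __name__ == "__main__":
        engine = create_engine("postgresql://user:pass@host/db")  # placeholder DSN
        load(transform(extract("claims_extract.csv")), engine)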
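The star-schema bullet could be realized, for example, with one fact table and two dimensions, as in the DDL sketch below. The table and column names are invented for illustration and do not represent an actual payer data model.

    # Illustrative star schema: a claims fact table keyed to member and
    # provider dimensions. All names here are hypothetical.
    STAR_SCHEMA_DDL = """
    CREATE TABLE dim_member (
        member_key BIGINT PRIMARY KEY,
        member_id  VARCHAR(32),
        plan_type  VARCHAR(16)
    );

    CREATE TABLE dim_provider (
        provider_key BIGINT PRIMARY KEY,
        npi          VARCHAR(10),
        specialty    VARCHAR(64)
    );

    CREATE TABLE fact_claim (
        claim_key     BIGINT PRIMARY KEY,
        member_key    BIGINT REFERENCES dim_member (member_key),
        provider_key  BIGINT REFERENCES dim_provider (provider_key),
        service_date  DATE,
        billed_amount NUMERIC(12, 2),
        paid_amount   NUMERIC(12, 2)
    );
    """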
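The orchestration bullet could be met with any scheduler; the sketch below assumes Apache Airflow (2.4+) purely as one example. The DAG name and the wrapped function are hypothetical placeholders standing in for the pipeline sketched above.

    # Orchestration sketch assuming Apache Airflow 2.4+ as one possible tool.
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def run_etl():
        # Placeholder for the extract/transform/load steps sketched above.
        pass

    with DAG(
        dag_id="claims_daily_etl",   # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",           # run once per day
        catchup=False,
    ):
        PythonOperator(task_id="run_etl", python_callable=run_etl)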
Required Technical Skills:
- Programming Language: Strong proficiency in Python for data processing and automation.
- Databases: Expert-level SQL skills for complex querying, data manipulation, and optimization. Experience with relational and NoSQL databases.
- Data Pipelines: Proven experience designing and implementing ETL/ELT processes.
- Cloud Platforms: Extensive hands-on experience with data services on AWS or Azure (e.g., AWS S3, Redshift, Glue, EMR, Athena; Azure Data Lake, Azure Synapse Analytics, Azure Data Factory).
- Containerization: Familiarity with Docker for deploying data-related applications.
- Version Control: Proficient with Git and GitHub.
- Testing: Experience with Python's unittest framework and pytest for data pipeline testing (see the test sketch at the end of this section).
- CI/CD: Understanding of CI/CD concepts and experience integrating data pipeline deployments with AWS CodePipeline or alternatives.
- Data Modeling: Strong knowledge of data warehousing concepts and data modeling techniques.
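For the testing requirement, a pytest-based unit test for a pipeline transformation might look like the sketch below. It targets the hypothetical transform function from the pipeline sketch in the Responsibilities section; the module name and column contract are assumptions.

    # Pytest sketch for a data pipeline transformation. The imported
    # module and its column contract are hypothetical.
    import pandas as pd
    import pytest

    from pipeline import transform  # hypothetical module

    def sample_frame(**overrides):
        # Build a small claims frame; callers override columns per test.
        base = {
            "claim_id": ["C1", "C2"],
            "claim_status": [" paid ", "denied"],
            "service_date": ["2024-01-05", "2024-01-06"],
            "billed_amount": [120.0, 80.0],
        }
        base.update(overrides)
        return pd.DataFrame(base)

    def test_transform_normalizes_claim_status():
        out = transform(sample_frame())
        assert list(out["claim_status"]) == ["PAID", "DENIED"]

    def test_transform_rejects_duplicate_claim_ids():
        with pytest.raises(ValueError):
            transform(sample_frame(claim_id=["C1", "C1"]))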