This is a remote position.
Job Summary
As a Senior Data Engineer, you will be responsible for designing, building, and optimizing data pipelines and lakehouse architectures on AWS. You will ensure data availability, quality, lineage, and governance across analytical and operational platforms. Your expertise will enable scalable, secure, and cost-effective data solutions that power advanced analytics and business intelligence.
Responsibilities
- Implement and manage S3 (raw, staging, curated zones), Glue Catalog, Lake Formation, and Iceberg/Hudi/Delta Lake for schema evolution and versioning.
- Develop PySpark jobs on Glue/EMR that enforce schema validation and deliver partitioned, scalable transformations (a sketch of this pattern follows this list).
- Build workflows using Step Functions, EventBridge, or Airflow (MWAA), with CI/CD deployments via CodePipeline & CodeBuild.
- Apply schema contracts, validations (Glue Schema Registry, Deequ, Great Expectations), and maintain lineage/metadata using Glue Catalog or third-party tools (Atlan, OpenMetadata, Collibra).
- Enable Athena and Redshift Spectrum queries, manage operational stores (DynamoDB/Aurora), and integrate with OpenSearch for observability.
- Design efficient partitioning/bucketing strategies, adopt columnar formats (Parquet/ORC), and use Spot Instances and Glue job bookmarks to control cost and avoid reprocessing.
- Enforce IAM-based access policies and apply KMS encryption, private (VPC) endpoints, and GDPR-compliant PII data masking.
- Prepare Gold-layer KPIs for dashboards, forecasting, and customer insights with QuickSight, Superset, or Metabase.
- Partner with analysts, data scientists, and DevOps to enable seamless data consumption and delivery.
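For illustration, the kind of job described above often follows the shape of this minimal PySpark sketch. It is not a LeewayHertz implementation: the bucket paths, schema, and column names are hypothetical placeholders, and the same pattern runs on Glue or EMR with minor adjustments.

```python
# Minimal PySpark sketch: read raw data, enforce a schema, write curated Parquet.
# All bucket names, paths, and the schema below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("orders-curation").getOrCreate()

# An explicit schema makes malformed records fail fast instead of drifting silently.
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=False),
    StructField("amount", DoubleType(), nullable=True),
    StructField("event_ts", TimestampType(), nullable=True),
])

raw = (
    spark.read
    .schema(schema)
    .option("mode", "FAILFAST")   # reject records that do not match the schema
    .json("s3://example-raw-zone/orders/")
)

curated = (
    raw
    .filter(F.col("amount").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))  # partition key
)

# Columnar format + date partitioning keeps Athena/Spectrum scans cheap.
(
    curated.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-zone/orders/")
)
```

On Glue specifically, the same logic would typically be wrapped in a GlueContext job with job bookmarks enabled for incremental processing.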
Requirements
Essential Skills
Job
- Hands-on expertise with AWS data stack (S3, Glue, Lake Formation, Athena, Redshift, EMR, Lambda).
- Strong programming skills in PySpark & Python for ETL, scripting, and automation.
- Proficiency in SQL (CTEs, window functions, complex aggregations); a short example follows this list.
- Experience with data governance and quality frameworks (Deequ, Great Expectations).
- Knowledge of data modeling, partitioning strategies, and schema enforcement.
- Familiarity with BI integration (QuickSight, Superset, Metabase).
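To make the SQL expectations concrete, here is an illustrative CTE-plus-window-function query, shown through PySpark's SQL interface to match the stack above. The tiny in-memory dataset and all table and column names are invented for the example.

```python
# Illustrative only: a CTE and a window function of the kind listed above.
# The in-memory dataset and the names used are invented for this example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-example").getOrCreate()

spark.createDataFrame(
    [("c1", "2024-01-05", 120.0),
     ("c1", "2024-01-20", 80.0),
     ("c2", "2024-01-11", 200.0)],
    ["customer_id", "order_date", "amount"],
).createOrReplaceTempView("orders")

spark.sql("""
    WITH customer_orders AS (                    -- CTE
        SELECT customer_id,
               order_date,
               amount,
               SUM(amount) OVER (                -- window function
                   PARTITION BY customer_id
                   ORDER BY order_date
               ) AS running_total
        FROM orders
    )
    SELECT * FROM customer_orders
    ORDER BY customer_id, order_date
""").show()
```

The same query would run essentially unchanged in Athena or Redshift against curated-zone tables.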
Personal
- Strong problem-solving ability in complex data environments.
- Ability to communicate technical insights to non-technical stakeholders.
- Commitment to best practices in data governance, compliance, and security.
- Collaborative mindset with cross-functional teams.
Preferred Skills
Job
- Real-time ingestion experience (Kinesis, MSK, Kafka on AWS).
- Exposure to ML feature store integration with SageMaker.
- Infrastructure as Code (Terraform, CloudFormation, or CDK); a minimal sketch follows this list.
- Experience with Data Mesh or domain-driven data architecture.
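As a minimal sketch of the IaC expectation, the following AWS CDK (v2, Python) stack defines one encrypted, versioned data-lake bucket. The stack and construct names are placeholders invented for this example, not an actual LeewayHertz stack.

```python
# Illustrative AWS CDK (v2, Python) sketch: one encrypted S3 data-lake bucket.
# Stack, construct, and app names are hypothetical placeholders.
from aws_cdk import App, Stack, RemovalPolicy, aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # KMS-managed encryption, versioning, and blocked public access,
        # matching the security responsibilities listed earlier in the posting.
        s3.Bucket(
            self, "CuratedZone",
            encryption=s3.BucketEncryption.KMS_MANAGED,
            versioned=True,
            block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "data-lake-example")
app.synth()
```

Equivalent definitions in Terraform or CloudFormation express the same controls; the choice usually follows the team's existing tooling.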
Personal
- Experience mentoring junior data engineers.
- Ability to lead data projects from design to production.
- Proactive in learning new AWS and data ecosystem technologies.
Other Relevant Information
- Bachelor’s/Master’s degree in Computer Science, Information Technology, or a related field.
- Minimum 4 years of proven experience in data engineering with AWS.
Benefits
- This role offers the flexibility of working remotely in India.
LeewayHertz is an equal opportunity employer and does not discriminate based on race, color, religion, sex, age, disability, national origin, sexual orientation, gender identity, or any other protected status. We encourage a diverse range of applicants.