Location: Bengaluru (Hybrid – 3 days office / 2 days WFH)
Experience: 7–8 Years
Employment Type: Full-Time
Shift: Day shift with partial overlap with US stakeholders
Notice Period: Immediate to 30 days preferred
We are looking for a Senior / Lead Data Engineer to drive the design and modernization of a large-scale enterprise analytics platform. This role combines hands-on engineering with data architecture leadership, focusing on building a modern Databricks + Spark Lakehouse ecosystem on AWS.
You will play a key role in shaping data platform strategy, building scalable pipelines, and ensuring strong governance, reliability, and performance across analytics systems.
Key Responsibilities:
Design and implement scalable data pipelines for batch and streaming workloads
Build and optimize ETL/ELT pipelines using Python, Spark, and SQL
Develop data solutions using Databricks Lakehouse architecture
Define standards for data modeling, storage formats, and performance optimization
Work extensively with AWS services such as S3, Lambda, and EMR
Build reliable and high-performance data processing frameworks
Enable near real-time processing using streaming technologies
Implement workflow orchestration using Apache Airflow
Build CI/CD pipelines and automate deployments
Use Docker, Kubernetes, and Infrastructure as Code (Terraform/CloudFormation)
Implement data lineage, cataloging, and access control frameworks
Maintain enterprise metric definitions and ensure consistency across reporting
Partner with analytics and business teams to deliver trusted, high-quality data
Implement monitoring, alerting, and observability for data pipelines
Define SLAs/SLOs and ensure platform reliability
Mentor and guide junior and mid-level engineers
Required Skills & Experience:
Minimum 6 years of experience in data engineering and distributed data systems
Strong hands-on experience with:
Databricks
Apache Spark
Python for large-scale data processing
Advanced SQL (complex queries, performance tuning, data modeling)
Solid experience with AWS (S3, Lambda, EMR) in production environments
Experience building and managing ETL/ELT pipelines at scale
Hands-on experience with Apache Airflow for orchestration
Familiarity with CI/CD pipelines (Jenkins or similar) and version control (Git)
Experience with Docker and infrastructure automation (Terraform or CloudFormation)
Knowledge of data governance, lineage, and cataloging practices
Experience working in modern Lakehouse architectures
Good to Have:
Databricks Certified Data Engineer – Professional
Experience with Kafka, Kinesis, or Spark Streaming
Exposure to Kubernetes for container orchestration
Experience in large-scale data platform migrations or modernization projects
Knowledge of enterprise KPI frameworks and semantic data layers
AWS certification (Solutions Architect – Associate or Professional)