Job Description:
Responsibilities:
- Develop & Optimize Data Pipelines
  - Build, test, and maintain ETL/ELT data pipelines using Azure Databricks & Apache Spark (PySpark).
  - Optimize the performance and cost-efficiency of Spark jobs.
  - Ensure data quality through validation, monitoring, and alerting mechanisms.
  - Understand cluster types, configuration options, and the use cases for serverless compute.
- Implement Unity Catalog for Data Governance
  - Design and enforce access control policies using Unity Catalog.
  - Manage data lineage, auditing, and metadata governance.
  - Enable secure data sharing across teams and with external stakeholders.
- Integrate with Cloud Data Platforms
  - Work with Azure Data Lake Storage, Azure Blob Storage, and Azure Event Hubs to integrate Databricks with cloud-based data lakes, data warehouses, and event streams.
  - Implement Delta Lake for scalable, ACID-compliant storage.
- Automate & Orchestrate Workflows
  - Develop CI/CD pipelines for data workflows using Azure Databricks Workflows or Azure Data Factory.
  - Monitor and troubleshoot failures in job execution and cluster performance.
- Collaborate with Stakeholders
  - Work with Data Analysts, Data Scientists, and Business Teams to understand requirements.
  - Translate business needs into scalable data engineering solutions.
- API Expertise
  - Pull data from a wide variety of APIs using different strategies and methods.
Required Skills & Experience:
- Azure Databricks & Apache Spark (PySpark) – Strong experience in building distributed data pipelines.
- Python – Proficiency in writing optimized and maintainable Python code for data engineering.
- Unity Catalog – Hands-on experience implementing data governance, access controls, and lineage tracking.
- SQL – Strong knowledge of SQL for data transformations and optimizations.
- Delta Lake – Understanding of time travel, schema evolution, and performance tuning.
- Workflow Orchestration – Experience with Azure Databricks Jobs or Azure Data Factory.
- CI/CD & Infrastructure as Code (IaC) – Familiarity with the Databricks CLI, Databricks Asset Bundles (DABs), and DevOps principles.
- Security & Compliance – Knowledge of IAM, role-based access control (RBAC), and encryption.
Preferred Qualifications:
- Experience with MLflow for model tracking & deployment in Databricks.
- Familiarity with streaming technologies (Kafka, Delta Live Tables, Azure Event Hubs, Azure Event Grid).
- Hands-on experience with dbt (Data Build Tool) for modular ETL development.
- Databricks or Azure certification is a plus.
- Experience with Azure Databricks Lakehouse connectors for Salesforce and SQL Server.
- Experience with Azure Synapse Link for Dynamics 365 and Dataverse.
- Familiarity with other data pipeline approaches, such as Azure Functions, Microsoft Fabric, and Azure Data Factory (ADF).
Soft Skills:
- Strong problem-solving and debugging skills.
- Ability to work independently and in teams.
- Excellent communication and documentation skills.