- Participate in the customer’s system design meetings and gather functional and technical requirements.
- Build data pipelines for consumption by the data science team.
- Skilled in ETL processes and tools.
- Clear understanding of, and experience with, Python/PySpark or Spark/Scala, along with Hive, Airflow, Impala, Hadoop, and RDBMS architecture.
- Experience in writing Python programs and SQL queries.
- Experience in SQL query tuning.
- Experience in shell scripting (Unix/Linux).
- Build and maintain data pipelines in Spark/PySpark using SQL with Python or Scala.
- Knowledge of cloud technologies (Azure, AWS, GCP, etc.) is a plus.
- Knowledge of Kubernetes, CI/CD concepts, and Apache Kafka is good to have.
- Suggest and implement best practices in data integration.
- Guide the QA team in defining system integration tests as needed.
- Split the planned deliverables into tasks and assign them to the team.
- Maintain and deploy the ETL code, following Agile methodology.
- Work on optimization wherever applicable.
- Good oral, written, and presentation skills.
Requirements
- Degree in Computer Science, IT, or similar field; a Master’s is a plus.
- Hands-on experience with Python and PySpark.
- Hands-on experience with Spark and Scala.
- Strong numerical and analytical skills.
- Working knowledge of cloud platforms such as Microsoft Azure and AWS.
- Technical expertise with data models, data mining, and segmentation techniques.