
Senior Data Engineer

cloudandthings.io
Full-time
On-site
Cape Town, South Africa

Introduction

We are seeking a highly skilled and experienced AWS Data Engineer to join our team.

This role involves designing, building, and optimising data architecture and data pipelines on AWS. The ideal candidate will have a strong background in data processing technologies and experience in cloud environments.

The list of skills below is long; an ideal candidate will have working knowledge of, and experience with, many of these tools and services. Requirements differ from project to project and over time; these skills give an overview of what is typically required of a Data Engineer.

We aim to build one of the strongest Engineering capabilities in Africa, and our Data Engineers are key to achieving this.

AWS Data Engineers are responsible for building and operating data pipelines, primarily using the AWS Cloud.

 

Duties & Responsibilities

Key Responsibilities

The following outlines the primary responsibilities for an AWS Data Engineer, but the role will require flexibility and adaptation as the business expands.

Software Engineering - Fundamentals

A fundamental software engineering skill set underpins all engineering work at cloudandthings.io.

  • Experience with modern operating systems, particularly Linux.
  • Experience working with terminals / the CLI.
  • Experience with version control software, particularly Git.
  • Software fundamentals, such as problem-solving, data structures and algorithms, software development methodologies, common design patterns, and best practices.
  • Experience with the following programming languages, and preferably more: Python and SQL.

Cloud - Core

  • Ability to identify serverless, managed, and roll-your-own options, and to weigh their strengths and weaknesses.
  • Development experience with Terraform or CloudFormation (IaC) to provision and maintain data infrastructure (see the sketch after this list).
  • Familiarity with the AWS Well-Architected Framework principles and experience implementing them.
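
To illustrate the IaC bullet above, here is a minimal sketch using the AWS CDK for Python, which synthesises a CloudFormation template; an equivalent could be written in Terraform. The stack and bucket names are hypothetical placeholders.

```python
# Minimal AWS CDK (Python) sketch: a versioned, encrypted data-lake
# bucket. Names are hypothetical; `cdk deploy` would provision it.
from aws_cdk import App, RemovalPolicy, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DataLakeStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Raw-zone bucket: versioned, server-side encrypted,
        # and retained if the stack is ever deleted.
        s3.Bucket(
            self,
            "RawZoneBucket",
            versioned=True,
            encryption=s3.BucketEncryption.S3_MANAGED,
            removal_policy=RemovalPolicy.RETAIN,
        )

app = App()
DataLakeStack(app, "DataLakeStack")
app.synth()
```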

Data Engineering - General

  • Working knowledge of Big Data: Volume, Variety, Velocity, etc.

Data Engineering - Collection

  • Good experience collecting data in hybrid environments: on-premises to cloud, and cloud to on-premises.
  • Real-time: AWS Kinesis Data Streams (KDS), Kafka / MSK (see the sketch after this list).
  • Near Real-time: AWS Kinesis Data Firehose (KDF).
  • Batch: AWS DataSync, Storage Gateway, Transfer Family (FTP / SFTP / MFT), Snowball.
  • Databases: ODBC / JDBC, database replicas and replication tools, and migration tools such as Database Migration Service (DMS) and the Schema Conversion Tool (SCT).
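
As a concrete example of the real-time collection bullet, here is a minimal boto3 sketch of a Kinesis Data Streams producer. The stream name and event shape are hypothetical, and AWS credentials and region are assumed to be configured in the environment.

```python
# Minimal Kinesis Data Streams producer sketch using boto3.
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_event(event: dict, stream_name: str = "example-events") -> None:
    """Put a single JSON-encoded record onto the stream."""
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(event).encode("utf-8"),
        # Records with the same partition key land on the same shard,
        # preserving per-key ordering.
        PartitionKey=str(event.get("device_id", "unknown")),
    )

publish_event({"device_id": "sensor-1", "temperature": 21.4})
```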

Data Engineering - Storage

  • Basic experience working with on-premises storage solutions: NFS / SMB, NAS / DAS, etc.
  • Cloud Storage: Amazon S3.
  • Data Formats: Parquet, CSV, Avro, JSON, etc.; compression and partitioning (see the sketch after this list).
  • NoSQL Databases: AWS DynamoDB, MongoDB, etc.
  • Relational Databases: AWS RDS or similar, MySQL / PostgreSQL, Aurora.
  • Massively Parallel Processing: Redshift.
  • Search Databases: AWS Elasticsearch / OpenSearch.
  • Caching: Redis / Memcached.
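
To make the data-formats bullet concrete, here is a minimal sketch of writing compressed, Hive-partitioned Parquet to S3, assuming pandas with the pyarrow engine (and s3fs installed for s3:// paths). The bucket and columns are hypothetical.

```python
# Minimal sketch: partitioned, snappy-compressed Parquet on S3.
import pandas as pd

df = pd.DataFrame(
    {
        "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
        "user_id": [1, 2, 3],
        "amount": [9.99, 4.50, 12.00],
    }
)

# Hive-style partitioning (event_date=.../) keeps scans cheap for
# engines such as Athena, Glue, and Spark.
df.to_parquet(
    "s3://example-data-lake/events/",
    engine="pyarrow",
    compression="snappy",
    partition_cols=["event_date"],
)
```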

Data Engineering - Processing

  • Strong experience developing ETL processes, and integrating with source and destination systems.
  • Strong experience developing using Python, Spark (e.g. PySpark), and SQL to work with data (see the sketch after this list).
  • Basic experience with Lakehouse technologies such as Apache Hudi, Apache Iceberg, or Databricks Delta Lake.
  • AWS Lambda for file / stream / event processing, ETL, and triggers.
  • General ETL, data cataloguing, and access control: AWS Glue ETL, Glue Data Catalog, Lake Formation.
  • Hadoop-style processing: mainly Spark and Hive; instance types, cluster and job sizing; AWS Elastic MapReduce (EMR).
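
A minimal PySpark sketch of the kind of ETL described above: read raw CSV, clean and aggregate, and write partitioned Parquet. Paths and column names are hypothetical placeholders.

```python
# Minimal PySpark ETL sketch: CSV in, partitioned Parquet out.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV and coerce types, dropping bad rows.
orders = (
    spark.read.option("header", True)
    .csv("s3://example-raw/orders/")
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount").isNotNull())
)

# Transform: aggregate to daily totals.
daily_totals = orders.groupBy("order_date").agg(
    F.sum("amount").alias("total_amount"),
    F.count("*").alias("order_count"),
)

# Load: write partitioned Parquet to the curated zone.
(
    daily_totals.write.mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated/daily_totals/")
)
```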

Data Engineering - Analysis

  • Basic understanding of cloud data warehouse architecture and data integration: AWS Redshift and Redshift Spectrum.
  • Data modelling skills: normalisation, facts and dimensions.
  • Experience with data quality.
  • Querying data in place on object storage: AWS Athena, Glue Crawlers (see the sketch after this list).
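
To illustrate in-place querying, here is a minimal boto3 sketch that runs an Athena query over a Glue-catalogued table. The database, table, and results location are hypothetical.

```python
# Minimal Athena query sketch using boto3: start, poll, fetch.
import time
import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="SELECT order_date, SUM(amount) FROM orders GROUP BY order_date",
    QueryExecutionContext={"Database": "example_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query reaches a terminal state.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"
    ]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```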

Data Engineering - Security

  • Basic experience with authentication and identity federation, authorisation and RBAC pertaining to data.
  • Basic knowledge of cloud network security: AWS VPC, VPC endpoints, subnets, Direct Connect.
  • Identity and Access Management: AWS IAM, STS, and cross-account access.
  • Encryption for data at rest and data in motion, for all services used: AWS KMS / SSE, TLS, etc. (see the sketch after this list).
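
A minimal sketch of the encryption bullet, assuming boto3: write an object to S3 with SSE-KMS under a customer-managed key. The bucket name and key alias are hypothetical.

```python
# Minimal SSE-KMS sketch: S3 encrypts the object at rest with the
# named KMS key; TLS protects it in transit to the S3 endpoint.
import boto3

s3 = boto3.client("s3")

s3.put_object(
    Bucket="example-secure-bucket",
    Key="exports/report.csv",
    Body=b"id,amount\n1,9.99\n",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/example-data-key",
)
```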

Data Engineering - Operations

  • Orchestration of data pipelines: AWS Step Functions, Amazon Managed Workflows for Apache Airflow (MWAA), Glue, etc. (see the sketch after this list).
  • Basic knowledge of the AWS Well-Architected pillars and how to apply them:
    • Operational excellence, Security, and Reliability.
    • Performance Efficiency, Cost Optimisation, and Sustainability.
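
To illustrate pipeline orchestration, here is a minimal Apache Airflow DAG sketch with a simple extract-then-transform dependency, assuming Airflow 2.4+ (older versions use `schedule_interval`). Task names and logic are hypothetical placeholders.

```python
# Minimal Airflow DAG sketch: daily extract -> transform pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract() -> None:
    ...  # e.g. pull a day's data from a source system

def transform() -> None:
    ...  # e.g. clean and load into the curated zone

with DAG(
    dag_id="daily_orders",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    # transform only runs once extract has succeeded.
    extract_task >> transform_task
```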

Advantageous technical skills

  • Any other data-related experience, e.g. working with Hadoop, databases, analytics software, etc.
  • Experience working with a second cloud vendor, e.g. both AWS and Azure.
  • Experience working with Docker / containers / CI/CD pipelines for data.
  • Experience working with and contributing to open-source projects.

Desired Experience & Qualification

Requirements

  • Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.
  • 5+ years of experience in a data engineering role with a focus on cloud technologies, specifically Amazon Web Services.
  • Knowledge of data-related Amazon Web Services offerings and an understanding of best practices in Data Architecture and Data Modelling.
  • Strong proficiency in SQL and experience with scripting languages such as Python.
  • Experience with big data technologies (e.g., Hadoop, Spark) and familiarity with machine learning algorithms.
  • Experience in real-time data processing and analytics.
  • Knowledge of data integration and enterprise integration patterns.
  • Familiarity with DevOps practices and tools related to data management (CI/CD for data pipelines, version control, etc.).
  • Excellent problem-solving and analytical skills, and the ability to work collaboratively in a team environment.
  • Strong communication and interpersonal skills to interact with team members and stakeholders.
  • Proven track record of designing and implementing data solutions.
  • Knowledge of and experience with Cloud infrastructure and services.
  • Understanding of Cloud Security best practices.
  • Willingness to learn and expand knowledge related to Cloud and Data Technologies.
  • Self-organising with the ability to prioritise and manage multiple tasks simultaneously.
  • Willingness to travel to clients as and when required.
  • Any of the following certifications:
    • AWS Certified Data Engineer - Associate
    • AWS Certified Machine Learning – Specialty
    • AWS Solutions Architect - Associate
    • AWS Solutions Architect - Professional
    • Databricks Certified Data Analyst Associate
    • Databricks Certified Data Engineer Associate
    • Databricks Certified Data Engineer Professional
    • Databricks Certified Machine Learning Associate
    • Databricks Certified Machine Learning Professional
    • Microsoft Certified: Azure Data Engineer Associate (DP-203)
    • Microsoft Certified: Azure AI Engineer Associate (AI-102)
    • Microsoft Certified: Fabric Analytics Engineer Associate (DP-600)
    • Microsoft Certified: Fabric Data Engineer Associate (DP-700)
    • Microsoft Certified: Azure Data Scientist Associate (DP-100)
    • Microsoft Certified: Azure Solutions Architect Expert (AZ-305)

 

Package & Remuneration

What We Offer

  • A culture of engineering and an environment where ideas are heard and builders can build.
  • Competitive salary, bonus, and incentive structure.
  • A flexible and supportive work environment that values diversity, work-life balance, and personal growth.
  • Opportunities for career advancement and ongoing professional development.
  • The chance to work on cutting-edge products and technologies that make a real impact on people's lives.
  • Mentoring and knowledge-sharing opportunities that foster personal and professional growth, with access to experienced leaders and a supportive peer network.