
Senior Data Engineer

Liminal
Full-time
On-site
Lisbon, Portugal

About the role

This role focuses on building and maintaining robust data architectures, pipelines, and systems that support the effective collection, storage, and processing of data across multiple departments. The Senior Data Engineer will play a pivotal role in ensuring the scalability, reliability, and performance of our data systems and, drawing on a strong background in data engineering, cloud infrastructure, and data pipeline automation, will take projects from initial design through deployment, supporting the seamless integration of data workflows into product and operational teams.

What you'll do

Cross-Department Data Solutions:

  • Collaborate with various departments to understand data needs, assess technical feasibility, and design efficient data engineering solutions to support organizational initiatives.
  • Implement scalable data workflows that optimize data availability, quality, and accessibility for AI, business analytics, and other internal teams.
  • Support product teams as mature data pipelines and systems are handed over to them, ensuring alignment with product goals and technical requirements.

Data Pipeline Development & Optimization:

  • Design, implement, and maintain data pipelines that ingest, process, and transform large-scale datasets for internal applications, including AI and machine learning models.
  • Build efficient ETL (Extract, Transform, Load) processes that streamline the movement of data between systems, databases, and analytics platforms; a minimal sketch follows this list.
  • Optimize data flows to ensure high performance, low latency, and scalability, adapting pipelines to handle both batch and real-time processing.
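
By way of illustration, the sketch below shows the shape of such a batch ETL job in Python. It is a minimal, self-contained example, not a description of Liminal's actual stack: the tables, columns, and in-memory SQLite backend are hypothetical stand-ins for a production source and warehouse.

```python
import sqlite3
from datetime import datetime, timezone

def extract(conn):
    """Pull raw event rows from the source table."""
    return conn.execute(
        "SELECT id, user_id, event_type, ts FROM raw_events").fetchall()

def transform(rows):
    """Normalize event types and convert epoch seconds to UTC ISO-8601."""
    return [
        (id_, user_id, event_type.strip().lower(),
         datetime.fromtimestamp(ts, tz=timezone.utc).isoformat())
        for id_, user_id, event_type, ts in rows
    ]

def load(conn, rows):
    """Upsert cleaned rows into the analytics table."""
    conn.executemany(
        "INSERT OR REPLACE INTO events_clean VALUES (?, ?, ?, ?)", rows)
    conn.commit()

if __name__ == "__main__":
    # In-memory database so the sketch runs end to end on its own.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (id INTEGER, user_id INTEGER, event_type TEXT, ts REAL)")
    conn.execute("CREATE TABLE events_clean (id INTEGER PRIMARY KEY, user_id INTEGER, event_type TEXT, ts_utc TEXT)")
    conn.execute("INSERT INTO raw_events VALUES (1, 42, ' PageView ', 1700000000)")
    load(conn, transform(extract(conn)))
    print(conn.execute("SELECT * FROM events_clean").fetchall())
```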

Cloud Infrastructure & System Integration:

  • Develop and maintain cloud-based data infrastructure on a major cloud platform (e.g., AWS, Azure, GCP, or similar), ensuring data systems are robust, cost-effective, and performant.
  • Implement data storage solutions and distributed databases that integrate seamlessly with other internal systems.
  • Leverage cloud services for scalable data processing and storage, ensuring that infrastructure can support growing datasets and organizational demands; a brief example follows this list.
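
As a small illustration of the storage side, a batch file might be published to object storage for downstream warehouse jobs to pick up. The sketch below uses AWS S3 via boto3 purely as an example; the bucket, key, and file names are placeholders, and AWS credentials are assumed to be configured in the environment.

```python
import boto3

def publish_batch(local_path: str, bucket: str, key: str) -> None:
    """Upload one batch file to S3 with server-side encryption at rest."""
    s3 = boto3.client("s3")
    s3.upload_file(
        local_path, bucket, key,
        ExtraArgs={"ServerSideEncryption": "AES256"},
    )

# Hypothetical usage; paths and names are placeholders.
publish_batch(
    "events_2024-01-01.parquet",
    "example-data-lake",
    "raw/events/dt=2024-01-01/part-0.parquet",
)
```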

Data Quality & Governance:

  • Establish data validation processes to ensure data quality, consistency, and integrity across all pipelines and systems; an illustrative check follows this list.
  • Collaborate with data scientists and analysts to ensure data is structured and formatted for optimal use in analytics and AI applications.
  • Ensure compliance with data governance policies and best practices for data privacy, security, and auditability.
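
An illustrative validation gate of this kind might look like the following; the required fields and null-rate tolerance are hypothetical, standing in for whatever a real schema and data contract would specify.

```python
from typing import Mapping, Sequence

REQUIRED_FIELDS = ("id", "user_id", "event_type", "ts_utc")  # hypothetical schema
MAX_NULL_RATE = 0.01  # illustrative tolerance

def validate_batch(rows: Sequence[Mapping]) -> None:
    """Raise if the batch is empty or any required field is too often null."""
    if not rows:
        raise ValueError("empty batch")
    for field in REQUIRED_FIELDS:
        null_rate = sum(1 for r in rows if r.get(field) is None) / len(rows)
        if null_rate > MAX_NULL_RATE:
            raise ValueError(
                f"{field}: null rate {null_rate:.1%} exceeds {MAX_NULL_RATE:.0%}")

validate_batch([{"id": 1, "user_id": 42, "event_type": "pageview",
                 "ts_utc": "2024-01-01T00:00:00+00:00"}])
```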

Automation & Monitoring:

  • Implement automation for data processing workflows, reducing manual intervention and ensuring consistent delivery of high-quality data.
  • Set up monitoring and alerting systems for pipeline health, performance metrics, and data anomalies to address issues proactively; a minimal example follows this list.
  • Continuously optimize existing data systems and pipelines to improve performance, reduce errors, and enhance reliability.
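
A minimal sketch of such a monitoring hook is shown below: it times each step and flags a row-count anomaly against a trailing baseline. The step name, baseline, and tolerance are illustrative; a real deployment would emit these metrics to CloudWatch, Azure Monitor, or similar rather than a local logger.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

def run_step(name, fn, baseline_rows=None, tolerance=0.5):
    """Run one pipeline step, log timing and volume, and flag row-count anomalies."""
    start = time.monotonic()
    rows = fn()
    log.info("step=%s rows=%d seconds=%.2f", name, rows, time.monotonic() - start)
    if baseline_rows and abs(rows - baseline_rows) / baseline_rows > tolerance:
        log.warning("step=%s row count %d deviates >%.0f%% from baseline %d",
                    name, rows, tolerance * 100, baseline_rows)
    return rows

# 120 rows against a baseline of 1000 trips the anomaly warning.
run_step("load_events", lambda: 120, baseline_rows=1000)
```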

Documentation & Collaboration:

  • Maintain comprehensive documentation of data architectures, data pipeline designs, and system integrations to facilitate clear communication and collaboration.
  • Document technical workflows, processes, and system configurations to ensure smooth handoffs and enable other teams to leverage data assets effectively.
  • Collaborate with cross-functional teams, including data scientists, product developers, and business stakeholders, to ensure data solutions align with organizational goals.

Qualifications

  • 5+ years of experience in data engineering, data architecture, and system design, with extensive experience building and optimizing large-scale data systems.
  • Proficiency in Python, including object-oriented programming (OOP) and knowledge of software development best practices such as design patterns.
  • Strong understanding of SQL and experience with database management systems, such as PostgreSQL, MySQL, MongoDB, or other NoSQL solutions.
  • Experience with cloud-based platforms (e.g., GCP, AWS, Azure), particularly services for data storage, processing, and orchestration (e.g., BigQuery, Redshift, Synapse, S3, Cloud Storage).
  • Solid experience in data pipeline development, including stream and batch processing, ETL frameworks, and workflow orchestration tools like Airflow; a minimal DAG sketch follows this list.
  • Experience with containerization technologies, including Docker, and orchestration tools like Kubernetes, ECS, AKS, or GKE.
  • Familiarity with CI/CD pipelines and version control systems (e.g., Git), and the ability to integrate cloud services into these workflows for automated deployments.
  • Proven ability to implement data security and privacy best practices, including encryption, access controls, and governance.
  • Strong problem-solving skills, with demonstrated ability to debug and optimize data pipelines and cloud-based architectures in production environments.
  • Excellent communication and collaboration skills, with the ability to work across cross-functional teams and engage with both technical and non-technical stakeholders.
  • Familiarity with monitoring tools (e.g., CloudWatch, Stackdriver, Azure Monitor) for tracking pipeline health, performance, and error reporting.
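
For orientation, a minimal Airflow DAG of the kind referenced above might look like the following, using the TaskFlow API of Airflow 2.x; the schedule, task names, and placeholder logic are illustrative only.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_events():
    @task
    def extract() -> list:
        return [{"id": 1, "event_type": " PageView "}]  # stand-in for a real source

    @task
    def transform(rows: list) -> list:
        return [{**r, "event_type": r["event_type"].strip().lower()} for r in rows]

    @task
    def load(rows: list) -> None:
        print(f"loading {len(rows)} rows")  # stand-in for a warehouse write

    load(transform(extract()))

daily_events()
```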

Nice to Have

  • Experience with Cloud Run / Cloud Run Functions.
  • Familiarity with Apache Spark, which could be valuable as data volumes increase; a brief example follows this list.
  • Understanding of REST APIs and their role in data integration.
  • Exposure to data modeling for AI and machine learning pipelines.
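
As a closing illustration of the Spark point above, a small PySpark batch job might aggregate daily event counts as below; the lake paths and column names are placeholders, reusing the hypothetical event schema from the earlier sketches.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_event_counts").getOrCreate()

# Placeholder paths and columns; swap in real lake locations as needed.
events = spark.read.parquet("s3a://example-data-lake/raw/events/")
daily = (events
         .withColumn("day", F.to_date("ts_utc"))
         .groupBy("day", "event_type")
         .count())
daily.write.mode("overwrite").parquet("s3a://example-data-lake/marts/daily_event_counts/")
```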