Senior Data Engineer 
Opkey | Series B Funded | Noida, India (In-Office) | Full-Time 

The Opportunity 
Opkey, a Series B funded enterprise application lifecycle management platform, is looking 
for a Senior Data Engineer to join our team in Noida. We need someone who can build and 
scale the data infrastructure—pipelines, storage systems, and processing engines—that 
powers our platform. 
We're not pitching a vision—we're scaling a reality. Our systems already process hundreds 
of gigabytes of enterprise data. Now we need an engineer who can make that infrastructure 
handle 10x more, 10x faster, with bulletproof reliability. This is your chance to be part of 
building something that will define a category. 

About Us 
Opkey is redefining how enterprises manage the lifecycle of their most critical applications. 
We've built the platform that takes organizations from Design to Configure to Test to Train, 
powered by agentic AI. 
Our customers already include Fortune 500 companies and top global system 
integrators. They trust us with hundreds of gigabytes of their most sensitive enterprise 
data—payroll files, configuration exports, test results—because we've proven we can handle 
it. 
We're already doing what others are only talking about. Our pipelines already process 
massive payroll files in real-time. Our systems already normalize chaotic enterprise data 
formats into clean, queryable structures. Our infrastructure already powers AI and analytics 
that enterprises depend on. 
Now we're scaling. And we need exceptional people to help us go from category creator to 
category leader. 
This is founder mode, not corporate mode. We move fast, we solve hard problems, and 
we ship things that matter. 

Why This Role Matters 
Data scientists can't build models on broken pipelines. Analysts can't find insights in dirty 
data. The entire intelligence layer of our platform depends on rock-solid data infrastructure. 
You'll build the foundation everything else depends on. 
You'll design the pipelines that ingest data from dozens of enterprise formats. You'll build the 
systems that diff millions of records in seconds. You'll create the infrastructure that lets our 
data scientists focus on algorithms instead of wrestling with data quality. 
When a Fortune 500 company validates their payroll migration, your infrastructure makes 
that possible. When our ML models predict configuration failures, they're running on 
pipelines you built. 
This is already happening at Opkey. You'll help us scale it to the world. 

What You'll Do 
You'll join a team that's already built production data infrastructure handling enterprise-scale 
workloads. Your job is to make it faster, more reliable, and ready for the next order of 
magnitude: 
• Build & Optimize Data Pipelines: Design and implement ETL/ELT pipelines that 
ingest data from diverse enterprise sources—Excel files, CSVs, API exports, 
database extracts, proprietary formats—and transform it into clean, queryable 
structures. 
• Design High-Performance Comparison Engines: Build systems that diff massive 
datasets—payroll files with millions of records, configuration exports with thousands 
of parameters—and surface differences in real-time. 
• Architect Scalable Data Storage: Design and manage data warehouses, data 
lakes, and databases that handle terabytes of enterprise data. Make decisions about 
partitioning, indexing, and storage formats. 
• Ensure Data Quality & Reliability: Implement validation, monitoring, and alerting 
systems that catch data issues before they affect downstream consumers. Build 
self-healing, observable pipelines. 
• Enable Analytics & ML Teams: Partner with data scientists to build the 
infrastructure they need—feature stores, training data pipelines, model serving 
infrastructure. 
• Scale for Growth: Design systems that can handle 10x the data without 10x the cost 
or complexity. Think ahead about bottlenecks and architect around them. 
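To make the comparison-engine work above concrete, here is a minimal, hedged sketch of diffing two record extracts keyed by an ID. All field names are hypothetical; a production system would stream records from files or a warehouse rather than hold lists in memory:

```python
# Hypothetical sketch: diff two payroll extracts keyed by employee_id.
# Records are plain dicts here; real pipelines would stream from storage.

def diff_records(old, new, key="employee_id"):
    """Return (added, removed, changed) records between two extracts."""
    old_by_key = {r[key]: r for r in old}
    new_by_key = {r[key]: r for r in new}

    added = [r for k, r in new_by_key.items() if k not in old_by_key]
    removed = [r for k, r in old_by_key.items() if k not in new_by_key]
    changed = [
        (old_by_key[k], new_by_key[k])
        for k in old_by_key.keys() & new_by_key.keys()
        if old_by_key[k] != new_by_key[k]
    ]
    return added, removed, changed

old = [{"employee_id": 1, "salary": 50000}, {"employee_id": 2, "salary": 60000}]
new = [{"employee_id": 2, "salary": 65000}, {"employee_id": 3, "salary": 55000}]
added, removed, changed = diff_records(old, new)
print(len(added), len(removed), len(changed))  # 1 1 1
```

Scaling this pattern to millions of records is where the real engineering lives: key-based partitioning so each shard diffs independently, and columnar comparison so unchanged fields are skipped cheaply.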

Skills & Qualifications 
Required Technical Skills 
• Python for Data Engineering: 4+ years of production experience writing clean, 
maintainable, performant Python code for data processing and pipeline development 
• SQL Mastery: Expert-level SQL—complex queries, query optimization, 
understanding execution plans. You can look at a slow query and know how to fix it. 
• Data Pipeline Development: Hands-on experience building ETL/ELT pipelines that 
run reliably in production. You've designed pipelines that process millions of records 
without failing. 
• Distributed Computing: Deep knowledge of frameworks like Apache Spark for 
large-scale data processing. You understand partitioning strategies, shuffle 
optimization, and memory management. 
• Data Modeling & Warehousing: Strong foundation in data modeling—star 
schemas, slowly changing dimensions, normalization vs. denormalization tradeoffs. 
• Database Technologies: Experience with relational databases (PostgreSQL, 
MySQL) and data warehouses (Redshift, Snowflake, BigQuery). You know when to 
use each. 
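As a small, hedged illustration of the production mindset behind the skills above — catching bad data at ingestion instead of letting it fail downstream — here is a minimal validation step (field names and rules are hypothetical):

```python
# Hypothetical sketch: partition incoming rows into valid vs. rejected
# so bad data never reaches the warehouse or downstream consumers.

def validate_row(row):
    """Return a list of problems with one payroll row (empty list = valid)."""
    problems = []
    if not row.get("employee_id"):
        problems.append("missing employee_id")
    salary = row.get("salary")
    if not isinstance(salary, (int, float)) or salary < 0:
        problems.append("salary must be a non-negative number")
    return problems

def partition_rows(rows):
    """Split rows into (valid, rejected); rejected rows carry their problems."""
    valid, rejected = [], []
    for row in rows:
        problems = validate_row(row)
        if problems:
            rejected.append({"row": row, "problems": problems})
        else:
            valid.append(row)
    return valid, rejected
```

In a real pipeline the rejected rows would feed the monitoring and alerting systems described above, rather than silently disappearing.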

Nice to Have 
• Experience with streaming data systems (Kafka, Kinesis) 
• Cloud platform expertise (AWS, GCP, Azure) 
• Knowledge of orchestration tools (Airflow, Dagster, Prefect) 
• Background in data comparison/diffing algorithms 
• Experience with containerization (Docker, Kubernetes) 
• Exposure to enterprise data formats and systems 

Mindset & Approach 
• Reliability-Obsessed: You've been paged at 2am, and you've built systems that 
don't page you at 2am. You understand what it takes to run production infrastructure. 
• Systems Thinker: You see how individual components fit into the larger 
architecture. You make tradeoffs that optimize for the whole system. 
• Ownership Mentality: You don't treat data quality as someone else's problem. You 
own the pipeline end-to-end—from ingestion to the data scientist's query. 
• Pragmatic Engineer: You know when to build for flexibility and when to optimize for 
performance. You don't chase shiny tools when proven ones work better. 
• Founder Mentality: You thrive in ambiguity, make architectural decisions with 
incomplete information, and care about outcomes over perfect documentation. 

What We're NOT Looking For 
• Engineers who only want to work with cutting-edge tools regardless of fit 
• People who treat data quality as someone else's problem 
• Those who need a detailed roadmap handed to them 
• Candidates who've never owned production systems end-to-end 

What We Offer 
• Competitive salary + meaningful equity in a company that's already winning 
• The chance to architect data infrastructure that Fortune 500 companies depend on 
• A team that values speed, ownership, and results over politics 
• Direct impact—your pipelines will process enterprise data at a scale most engineers 
never see 
• The opportunity to be part of history—building the data foundation that powers 
how enterprises manage their most critical applications 

We've proven our infrastructure works. Now we need someone to scale it to 
the world. 
Apply with your resume and a brief note about the most challenging data pipeline you've 
built. 
Opkey is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive 
environment for all employees. 
