MUST BE BASED IN NYC - No Relocation
We are unable to sponsor work visas.
At Hatch, we’re building AI that doesn’t just assist behind the scenes; it converses with customers out in the wild. Backed by Y Combinator and top-tier investors like Bessemer and NextView, we’re scaling fast, doubling revenue year over year, and looking for A-players to help us cement our place as the category leader in AI for customer engagement.
What you’ll do:
Design and build scalable batch and real-time data pipelines using Kinesis, Pub/Sub, Flink, Spark, Airflow, and dbt.
Architect and implement multi-tier data lake architectures with raw/staging/curated layers, defining promotion criteria, data quality gates, and consumption patterns.
Develop and maintain production-quality APIs, SDKs, and backend services that integrate with data infrastructure.
Apply software engineering best practices—modular design, design patterns, testing, CI/CD, observability, and code reviews—to all data platform work.
Model and optimize datasets in BigQuery and Aurora PostgreSQL with attention to performance, cost, and governance.
Collaborate with backend teams to define data contracts, streaming interfaces, and service boundaries.
Implement infrastructure-as-code (Terraform, Docker, Kubernetes/EKS) for deployment automation.
Establish and monitor SLOs for data quality, latency, and availability; troubleshoot production issues across distributed systems.
Must-have software engineering experience:
3+ years building production APIs, SDKs, or backend services in Python, Go, or similar languages.
Demonstrated expertise with software design patterns (repository, factory, dependency injection, etc.) applied in real production systems—not theoretical knowledge.
Proven ability to write clean, tested, maintainable code with proper abstractions and error handling.
Experience with code reviews, CI/CD pipelines, and production deployments.
Strong computer science fundamentals: data structures, algorithms, concurrency, distributed systems.
Must-have data engineering experience:
5+ years total engineering experience, with 2+ years focused on data engineering.
Hands-on expertise with distributed data technologies: Kafka/Kinesis/Pub/Sub, Spark/Flink, Airflow, dbt, and BigQuery.
Experience with modern data lake table formats like Apache Iceberg, Delta Lake, or Apache Hudi for advanced schema management and data lake optimization.
Experience designing and implementing layered data architectures (raw/landing → refined/standardized → curated/consumption) with appropriate transformations and quality checks at each stage.
Strong SQL skills and experience with data modeling (dimensional, event-driven, domain patterns) and query optimization.
Production experience building both batch and streaming data pipelines.
Working knowledge of AWS and GCP, including monitoring/troubleshooting (CloudWatch, Prometheus/Grafana).
Familiarity with containerization, Kubernetes/EKS, and infrastructure-as-code (Terraform).
Exposure to event-driven microservices and schema governance (Parquet/Protobuf/Avro).
Excellent communication skills—can explain complex systems clearly and collaborate effectively with engineering teams.
Nice to have:
Experience with ML/LLM pipelines in production (vector databases, feature stores, prompt orchestration).
Open-source contributions or work in fast-moving startup environments.
What we offer:
Competitive salary and equity
Remote (Eastern or Central Time Zone required) OR Hybrid work environment (3 days/week in our NYC office)
Medical, dental, and vision benefits
401(k) plan
Flexible PTO
Opportunity to get in on the ground floor of a high-growth, mission-driven company
Why join:
Shape the future of AI-driven customer service
Build alongside founders and leaders who value speed, ownership, and ambition
Solve hard problems that impact real businesses and customers
Join a team of builders who care about great engineering, fast execution, and each other