
Senior Data Engineer

Tavus
Full-time
On-site
San Francisco, California, United States
$150,000 - $200,000 USD yearly

About Us

Tavus is a research lab pioneering human computing. We’re building AI Humans: a new interface that closes the gap between people and machines, free from the friction of today’s systems. Our real-time human simulation models let machines see, hear, respond, and even look real—enabling meaningful, face-to-face conversations. AI Humans combine the emotional intelligence of humans with the reach and reliability of machines, making them capable, trusted agents available 24/7, in every language, on our terms.

Imagine a therapist anyone can afford. A personal trainer that adapts to your schedule. A fleet of medical assistants that can give every patient the attention they need. With Tavus, individuals, enterprises, and developers can all build AI Humans to connect, understand, and act with empathy at scale.

We’re a Series A company backed by world-class investors including Sequoia Capital, Y Combinator, and Scale Venture Partners.

Be part of shaping a future where humans and machines truly understand each other.

The Role

Data is the foundation of everything we build. We’re looking for a Senior Data Engineer who goes beyond pipelines and cleaning datasets. You’ll own our entire data strategy, from sourcing and curating to structuring and optimizing, ensuring our models and products are powered by the highest-quality data possible. You’re a true master of your craft, including data sourcing, formatting, labeling, cleaning, and making use of our internal data.

Your Mission 🚀

  • Be a data guru – You anticipate data needs not just for today but for the future. You know how to curate diverse, high-quality datasets that help AI models reach their full potential.

  • Influence AI model training – Your data work will directly impact AI model performance, efficiency, and inference accuracy. You will collaborate closely with ML engineers to optimize datasets for maximum AI effectiveness.

  • Own, build, and scale the data pipeline – You will be deeply involved in data sourcing, and you will expand and own the curation, filtering, and preprocessing pipelines across a variety of data modalities.

  • Be a data hunter – Through web scraping, third-party deals, and unconventional sources, you’ll find, collect, and curate the best multimodal data (text, video, images) to power our models. You’ll manage large-scale data procurement to ensure our models train on the highest-quality information.

  • Be a video data craftsman – We’re building something truly unique from a blend of video and audio data. Throwing more data at the problem won’t solve it here, and you should be up for the challenge of making it work! You will own this challenge and ensure that our video and audio datasets are structured for AI success, helping us fully realize the capabilities of our state-of-the-art (SOTA) models.

  • Optimize labeling & automation – You will own the data labeling process and build automated workflows to make cleaning, labeling, and structuring data as efficient as possible. Work closely with our data annotation teams to ensure high-quality labeled data for ML models.

  • Turn internal data into gold – Our own platform is a goldmine of insights—help us unlock and use it to drive smarter decisions and supercharge growth.

  • Speed + precision – Move fast, but don’t break data. Every pipeline, dataset, and workflow should be tight, efficient, and built to last.

What We’re Looking For 🔥

  • You don’t just maintain – you build. From zero to fully running pipelines, you make things happen. You can take charge of how we use internal data to make smarter decisions.

  • Extreme ownership – You own data strategy end-to-end, proactively working out what data we need, where to get it, and how to structure it for AI impact.

  • Strategic mindset – You think beyond pipelines—you anticipate data needs before they arise and help shape AI development at Tavus.

  • Automation expert – You know how to automate data cleaning, structuring, and labeling workflows for efficiency and scale.

  • ML-first mindset – You understand that better data = better models and structure datasets to maximize AI model accuracy.

  • Fast, but flawless. Speed matters, but so does accuracy. You balance both.

  • You don’t follow best practices – you create them. Much of what we’re doing is new, so you’ll set the standard for how data should be done.

  • Technical expertise – You have strong experience with Python, SQL, and large-scale data processing tools.

Bonus Points if:

  • You have previous experience with LLMs and multimodal data. You know how to source, structure, and optimize data for real AI impact.

  • You have experience with in-house video data collection and relevant studio setups. You know best practices for multimodal video and audio data collection.


Benefits & Culture

When you join Tavus, you’re joining a diverse and supportive team. Our work is driven by our people, and our success is shared by all. This position has a flexible work schedule, unlimited PTO, competitive healthcare, and gear stipends, as well as plenty of fun. At the end of the day, we want Tavus to be a place for you to learn, directly drive impact, and work with a team you love.

To learn more about our team culture and benefits, check out our hiring page.

Tavus is growing fast, and we’d like you to grow with us. If you’re excited to get your hands dirty and help make machines more human, drop your resume and we’ll be in touch.

We are not looking for culture fits; we are looking for culture creators. Diversity is what drives our success – it’s at the core of how we hire, communicate, and work. We welcome everyone and combine our diverse backgrounds, skill sets, and perspectives to build the best experiences for our clients.