Senior Data Engineer (AI, ML)

Oracle

Full-time

On-site

Mexico

Description

Essential Skills

Proficiency in Python (Java a plus) with hands-on experience in modern ML frameworks such as PyTorch and TensorFlow, plus a solid foundation in statistics and data modeling.
Experience building end-to-end ML and GenAI pipelines, including data preprocessing, feature engineering, model training, validation, and production deployment.
Practical expertise in Generative AI and RAG systems, including embeddings, chunking strategies, hybrid retrieval, reranking, and evaluation techniques.
Hands-on experience with agentic AI workflows, including prompt engineering, intent routing, tool orchestration, function calling, and safe tool-use with guardrails.
Experience with enterprise software development and cloud-native architectures, including REST APIs, microservices, containerization, CI/CD, and platforms such as AWS, Azure, GCP, or Oracle Cloud.
Strong problem-solving skills, with the ability to translate business requirements into scalable, reliable, and cost-effective AI solutions.
Excellent written and verbal communication skills, with the ability to work effectively in a collaborative, cross-functional, and global team environment.

Responsibilities

AI/ML

Design, train, and optimize machine learning models for real-world applications.
Build end-to-end ML pipelines, including data preprocessing, feature engineering, model training, validation, and deployment.
Collaborate with data engineers and software developers to integrate ML models into production systems.
Monitor model performance, detect data drift, and retrain models for continuous improvement.

GenAI

Agentic Solution Design & Orchestration
- Architect LLM-powered applications, including intent routing across tools and skills.
- Implement agentic workflows using frameworks such as LangGraph or equivalents; decompose tasks, manage tool invocation, and enforce deterministic behavior and guardrails.
- Integrate MCP-compatible tools and services to extend system capabilities.
Retrieval & Embeddings
- Build effective RAG systems: chunking strategies, embedding model selection, vector indexing, reranking, and grounding to authoritative data.
- Optimize vector stores and search using ANN, hybrid retrieval, filters, and metadata schemas.
Prompting & Model Strategy
- Develop robust prompting patterns and templates; structure prompts for tool use and function calling.
- Compare generic vs. fine-tuned LLMs for intent routing; make data-driven choices on cost, latency, accuracy, and maintainability.
Data & Integrations
- Implement NL2SQL patterns with guarded SQL execution; connect to microservices and enterprise systems via secure APIs.
- Define and enforce data schemas, metadata, and lineage for reliable retrieval.
Production Readiness
- Establish evaluation datasets and automated regression tests for RAG and agent systems.
- Monitor quality (precision/recall, hallucination rate), latency, cost, and safety.
- Apply guardrails, PII handling, access controls, and policy enforcement end-to-end.

MLOps / LangOps

Version prompts, models, embeddings, and pipelines; manage A/B tests and rollout strategies.
Instrument tracing and telemetry for agent steps and tool calls; implement fallback, timeout, and retry policies.

Core Qualifications

Programming
- Strong proficiency in Python (NumPy, Pandas, Scikit-learn); experience with ML frameworks such as TensorFlow and PyTorch.
Machine Learning & Deep Learning
- Hands-on experience with supervised, unsupervised, and reinforcement learning techniques.
Mathematics & Statistics
- Solid foundation in linear algebra, probability, optimization, and statistical modeling.
Data Handling
- Experience with SQL and NoSQL databases, data preprocessing, and feature engineering.
GenAI Expertise
- Strong understanding of vector embeddings and similarity search (cosine, inner product, L2), chunking strategies, and reranking.
- Hands-on experience building RAG pipelines (indexing, metadata, hybrid search, evaluators).
- Practical prompt engineering for tool use, function calling, and agent planning.
- Experience with agentic frameworks (e.g., LangGraph or similar) and orchestration of tools and services; familiarity with MCP and tool-integration patterns.
- Knowledge of NL2SQL techniques, SQL safety (schema constraints, query sandboxes), and microservice integration.
- Ability to evaluate tradeoffs between generic/base LLMs and fine-tuned/task-specific models (accuracy, drift, data/ops burden, latency, and cost).
- Proficiency with Python and common LLM/RAG libraries; experience with containerization and CI/CD.
- Understanding of enterprise security, privacy, and compliance; RBAC/ABAC for data access, logging, and auditability.

MLOps & Deployment
- Familiarity with model deployment frameworks (MLflow, Kubeflow, SageMaker, Vertex AI), CI/CD pipelines, and containerization using Docker and Kubernetes.

Preferred Experience

Hands-on experience with at least one major cloud provider (AWS, Azure, GCP, OCI).
Experience with large-scale distributed systems and big data frameworks (Spark, Hadoop).
Retrieval optimization using hybrid lexical + vector search, metadata filtering, and learned rerankers.
Model fine-tuning and adapter methods (SFT, DPO, LoRA) and their evaluation.
Observability stacks for LLM applications (tracing, evaluation dashboards, cost/latency SLOs).
Document AI (OCR, layout parsing) and schema construction for unstructured data.
Caching, batching, and KV-cache optimization for throughput and cost efficiency.
Safe tool-use patterns, including constrained decoding, JSON schemas, and policy checks.

How We’ll Assess

Portfolio or walkthrough of a production RAG or agent system: objectives, architecture, evaluations, and outcomes.
Hands-on exercise: design an intent router, justify model choice (generic vs. fine-tuned), propose chunking and metadata strategy, and define evaluation metrics.
Discussion of failure modes (hallucinations, tool errors, SQL risk) and mitigation strategies.
Approach to governance: access controls, PII handling, audit logging, and red-teaming.
