Description
What You’ll Be Doing (Core Mission)
- Design and implement large-scale, fault-tolerant data pipelines on OCI, using services like OCI Data Integration, OCI Data Flow (Apache Spark), Object Storage, and Autonomous Database.
- Build and manage streaming data architectures using tools such as OCI GoldenGate, Apache Kafka, and Spark/Flink Streaming.
- Enforce standards and automation across the entire data lifecycle, including schema evolution, dataset migration, and deprecation strategies.
- Improve platform resilience, data quality, and observability with advanced monitoring, alerting, and automated data governance.
- Serve as a technical leader, mentoring junior engineers, reviewing designs and code, and promoting engineering best practices.
- Collaborate cross-functionally with ML engineers, platform teams, and data scientists to integrate data services with AI/ML workloads.
- Partner in AI pipeline enablement, ensuring Lakehouse services efficiently support model training, feature engineering, and real-time inference.
Required Technical Skills & Experience
Engineering & Infrastructure
- 5+ years building distributed systems or production-grade data platforms in the cloud.
- Strong coding proficiency in Python, Java, or Scala, with an emphasis on performance and reliability.
- Expertise in SQL and PL/SQL, data modeling, and query optimization.
- Proven experience with cloud-native architectures, especially on OCI, AWS, Azure, or GCP.
Lakehouse & Streaming Mastery
- Deep knowledge of modern lakehouse/table formats: Apache Iceberg, Delta Lake, or Apache Hudi.
- Production experience with big data compute engines: Spark, Flink, or Trino.
- Skilled in real-time streaming and event-driven architectures using Kafka, Flink, GoldenGate, or OCI Streaming.
- Experience managing data lakes, catalogs, and metadata governance in large-scale environments.
AI/ML Integration
- Hands-on experience enabling ML pipelines, from data ingestion through model training and deployment.
- Familiarity with ML frameworks (e.g., PyTorch, XGBoost, scikit-learn).
- Understanding of modern ML architectures, including RAG, prompt chaining, and agent-based workflows.
- Awareness of MLOps practices, including model versioning, feature stores, and integration with AI pipelines.
DevOps & Operational Excellence
- Deep understanding of CI/CD, infrastructure-as-code (IaC), and release automation using tools like Terraform, GitHub Actions, or CloudFormation.
- Experience with Docker, Kubernetes, and cloud-native container orchestration.
- Strong focus on testing, documentation, and system observability (Prometheus, Grafana, ELK stack).
- Comfortable with cost/performance tuning, incident response, and data security standards (IAM, encryption, auditing).
Preferred Qualifications
- Experience with Oracle’s cloud-native tools: OCI Data Integration, Data Flow, Autonomous Database, GoldenGate, OCI Streaming.
- Experience with query engines like Trino or Presto, and tools like dbt or Apache Airflow.
- Familiarity with data cataloging, RBAC/ABAC, and enterprise data governance frameworks.
- Exposure to vector databases and LLM tooling (embeddings, vector search, prompt orchestration).
- Solid understanding of data warehouse design principles, star/snowflake schemas, and ETL optimization.
Minimum Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 4–6 years of experience designing and building cloud-based data pipelines and distributed systems.
- Proficiency in at least one core language: Python, Java, or Scala.
- Familiarity with lakehouse formats (Iceberg, Delta, Hudi), file formats (Parquet, ORC, Avro), and streaming platforms (Kafka, Kinesis).
- Strong understanding of distributed systems fundamentals: partitioning, replication, idempotency, consensus protocols.
Soft Skills & Team Expectations
- Proven ability to independently lead technical initiatives end to end.
- Comfortable working in cross-functional teams and mentoring junior engineers.
- Excellent problem-solving skills, design thinking, and attention to operational excellence.
- Passion for learning emerging data and AI technologies and sharing knowledge across teams.
Qualifications
Career Level - IC3