Senior Data Engineer

Booth and Partners
Full-time
Remote

Role Summary

We are seeking a Senior Data Engineer to design and build the ETL pipelines and AI-powered document intelligence systems that ingest sustainability audit evidence, extract structured data, and feed scoring, analytics, and reporting workflows.

This role focuses on execution: building reliable pipelines that combine manual inputs with automated extraction from unstructured documents (certifications, test reports, utility bills, declarations, and supply chain records), transforming them into structured, validated data used in scoring algorithms and downstream analytics.

You will work closely with the Data Architect, sustainability experts, and analytics teams to reduce manual effort, improve data quality, and scale the platform.

Key Responsibilities

1. ETL / ELT Pipeline Development

  • Build and maintain ETL/ELT pipelines to:
    • Ingest audit metadata and document links (Dropbox, Google Drive)
    • Capture manual friendly-value inputs
    • Integrate AI-extracted fields into structured tables
  • Implement:
    • Incremental loads
    • Validation checks
    • Error handling and retries
    • Monitoring and alerting

2. AI-Driven Document Extraction

  • Implement document intelligence pipelines using:
    • OCR (e.g., cloud OCR services)
    • NLP / LLM-based extraction
    • Classification and field mapping
  • Extract structured attributes such as:
    • Certification names and validity dates
    • Test results and thresholds
    • Utility consumption values
    • Supplier declarations and claims
  • Assign confidence scores and route low-confidence outputs for human review

3. Human-in-the-Loop & Feedback Systems

  • Design workflows where:
    • Auditors review, correct, or confirm AI-extracted values
    • Corrections feed back into model improvement and rules tuning
  • Track provenance of each value (manual vs. AI-derived)

4. Scoring & Analytics Integration

  • Feed validated data into scoring algorithms
  • Optimize data models for:
    • Tableau dashboards
    • Product, category, and project-level analytics
  • Ensure data freshness and consistency across reporting layers

5. Data Quality & Observability

  • Implement automated checks for:
    • Missing or inconsistent values
    • Schema drift
    • Anomalous scores

Support reprocessing and backfills as scoring logic evolves.

Requirements

Required Skills & Experience

Core Data Engineering

  • Strong SQL and Python
  • Experience with ETL tools (dbt, Airflow, Dagster, Prefect, or similar)
  • Hands-on experience with Snowflake or equivalent platforms
  • Experience supporting BI tools (Tableau preferred)

AI & Document Intelligence

  • Practical experience with:
    • OCR pipelines
    • NLP / LLM-based data extraction
    • Document classification and entity extraction
  • Familiarity with:
    • Prompt engineering for structured extraction
    • Confidence scoring and validation
    • Evaluation of extraction accuracy

Systems & Workflow Design

  • Experience building pipelines that mix:
    • Manual inputs
    • Automated AI outputs
  • Strong understanding of data lineage and traceability
  • Ability to build systems that auditors and analysts trust

Nice-to-Have

  • Experience with sustainability, ESG, or compliance data
  • Experience processing PDFs and scanned documents at scale
  • Familiarity with cloud AI services or open-source NLP frameworks
  • Experience building analytics-ready datasets from unstructured sources

Success Looks Like

  • Significant reduction in manual audit effort through AI-assisted extraction
  • High-confidence, traceable data feeding scoring and analytics
  • Reliable, observable pipelines that scale with document volume
  • Faster reporting and deeper insights for customers and internal teams

Benefits

Competitive Salary: Get paid what you're worth!
Prepaid Medicine: Your health is our priority!
Life Insurance: Peace of mind for you and your loved ones!
Birthday Day Off: Celebrate your special day YOUR way!
Indefinite-Term Labor Contract: Stability and all legal benefits included!