Job Title: Data Engineer – AI/ML Pipelines
Location: Seffner, FL
Work Model: Hybrid
Duration: Contract-to-Hire (CTH)
Position Summary
The Data Engineer – AI/ML Pipelines plays a key role in designing, building, and maintaining scalable data infrastructure that powers analytics and machine learning initiatives. This position focuses on developing production-grade data pipelines that support end-to-end ML workflows—from data ingestion and transformation to feature engineering, model deployment, and monitoring.
The ideal candidate has hands-on experience working with operational systems such as Warehouse Management Systems (WMS) or ERP platforms, and is comfortable partnering closely with data scientists, ML engineers, and operational stakeholders to deliver high-quality, ML-ready datasets.
Key Responsibilities
ML-Focused Data Engineering
- Build, optimize, and maintain data pipelines specifically designed for machine learning workflows.
- Collaborate with data scientists to develop feature sets, implement data versioning, and support model training, evaluation, and retraining cycles.
- Participate in initiatives involving feature stores, model input validation, and monitoring of data quality feeding ML systems.
Data Integration from Operational Systems
- Ingest, normalize, and transform data from WMS, ERP, telemetry, and other operational data sources.
- Model and enhance operational datasets to support real-time analytics and predictive modeling use cases.
Pipeline Automation & Orchestration
- Build automated, reliable, and scalable pipelines using tools such as Azure Data Factory, Airflow, or Databricks Workflows.
- Ensure data availability, accuracy, and timeliness across both batch and streaming systems.
Data Governance & Quality
- Implement validation frameworks, anomaly detection, and reconciliation processes to ensure high-quality ML inputs.
- Support metadata management, lineage tracking, and documentation of governed, auditable data flows.
Cross-Functional Collaboration
- Work closely with data scientists, ML engineers, software engineers, and business teams to gather requirements and deliver ML-ready datasets.
- Translate modeling and analytics needs into efficient, scalable data architecture solutions.
Documentation & Mentorship
- Document data flows, data mappings, and pipeline logic in a clear, reproducible format.
- Provide guidance and mentorship to junior engineers and analysts on ML-focused data engineering best practices.
Required Qualifications
Technical Skills
- Strong experience building ML-focused data pipelines, including feature engineering and model lifecycle support.
- Proficiency in Python, SQL, and modern data transformation tools (dbt, Spark, Delta Lake, or similar).
- Solid understanding of workflow orchestrators (Azure Data Factory, Airflow, or similar) and cloud data platforms (Azure, Databricks, etc.).
- Familiarity with ML operations tools such as MLflow, TFX, or equivalent frameworks.
- Hands-on experience working with WMS or operational/logistics data.
Experience
- 5+ years in data engineering, with at least 2 years directly supporting AI/ML applications or teams.
- Experience designing and maintaining production-grade pipelines in cloud environments.
- Proven ability to collaborate with data scientists and translate ML requirements into scalable data solutions.
Education & Credentials
- Bachelor’s degree in Computer Science, Data Engineering, Data Science, or a related field (Master’s preferred).
- Relevant certifications are a plus (e.g., Azure AI Engineer, Databricks ML, Google Professional Data Engineer).
Preferred Qualifications
- Experience with real-time ingestion using Kafka, Kinesis, Azure Event Hubs, or similar.
- Exposure to MLOps practices and CI/CD for data pipelines.
- Background in logistics, warehousing, fulfillment, or similar operational domains.