We are seeking a Data Engineer to design, develop, and optimize scalable data pipelines supporting advanced analytics and machine learning solutions in a cloud-based environment. The ideal candidate has hands-on experience with Azure Data Services and Databricks, a strong background in data pipeline orchestration, proven expertise in data quality management and process automation, and experience in Procurement or Supply Chain.
Key Responsibilities:
1. Data Pipeline Architecture & Development:
- Design, develop, and maintain robust ETL/ELT pipelines to handle large-scale data ingestion, transformation, and integration.
- Build and optimize data workflows using Azure Data Factory, Databricks (PySpark, Spark SQL), and Azure Synapse Analytics.
- Ensure pipeline scalability, fault tolerance, and efficiency across diverse data sources, primarily structured (tabular) datasets.
- Implement incremental loads, change data capture (CDC), and other advanced data ingestion strategies.
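For illustration, a minimal sketch of the upsert-style incremental load this bullet describes, assuming Delta Lake on Databricks; the table names, key column, and change-capture source are hypothetical:

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical staging table holding rows changed since the last watermark.
updates = spark.read.table("staging.orders_changes")

# Upsert the change set into the target Delta table (CDC-style merge).
target = DeltaTable.forName(spark, "analytics.orders")
(
    target.alias("t")
    .merge(updates.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```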
2. Automation & Process Optimization:
- Develop and maintain automated data pipelines with a focus on performance optimization and cost-efficiency in the Azure environment.
- Implement CI/CD pipelines for seamless deployment of data solutions, leveraging DevOps tools and Databricks Workflows.
- Collaborate with cloud architects to optimize resource usage and adhere to cloud governance best practices.
3. Data Management & Quality Assurance:
- Lead the design and implementation of data quality frameworks to ensure data integrity, consistency, and compliance across systems.
- Develop monitoring solutions for pipeline health, data freshness, and anomaly detection (see the sketch after this list).
- Maintain comprehensive documentation covering data models, transformation logic, and operational procedures.
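As referenced above, a minimal monitoring sketch in PySpark, assuming a hypothetical table with an ingested_at timestamp column; the 1% null-rate threshold is an arbitrary placeholder:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical table; thresholds would normally come from a config store.
df = spark.read.table("analytics.orders")

# Simple health checks: row count, null rate on the key column, freshness.
metrics = df.agg(
    F.count("*").alias("row_count"),
    F.avg(F.col("order_id").isNull().cast("int")).alias("null_rate"),
    F.max("ingested_at").alias("latest_ingestion"),
).first()

null_rate = metrics["null_rate"] or 0.0
if null_rate > 0.01:
    raise ValueError(f"order_id null rate too high: {null_rate:.2%}")
```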
4. Cross-functional Collaboration & Stakeholder Engagement:
- Partner with Data Scientists, Analysts, and Business Stakeholders to understand data needs and translate them into effective solutions.
- Facilitate integration of machine learning models into production data pipelines (see the sketch after this list).
- Provide technical mentorship to junior data engineers and contribute to team knowledge-sharing initiatives.
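As an illustration of the model-integration point above, a hedged sketch that scores a feature table with an MLflow-registered model via mlflow.pyfunc.spark_udf; the model URI and table names are hypothetical:

```python
import mlflow.pyfunc
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical registry URI; the model would be promoted via MLOps tooling.
predict = mlflow.pyfunc.spark_udf(
    spark, model_uri="models:/churn_model/Production", result_type="double"
)

# Score a hypothetical feature table and persist the results.
features = spark.read.table("analytics.customer_features")
scored = features.withColumn("churn_score", predict(*features.columns))
scored.write.mode("overwrite").saveAsTable("analytics.customer_scores")
```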
Required Skills & Qualifications:
Education: Bachelor’s degree in Computer Science, Data Engineering, Analytics, Statistics, Mathematics, or a related field. (Master’s degree is a plus.)
Experience:
- 3+ years of hands-on experience in data engineering or a related discipline.
- Proven experience designing and deploying end-to-end data pipelines in Azure and Databricks environments.
Language: Proficiency in written and spoken English is required; candidates with strong English skills will be prioritized.
Technical Skills:
Programming & Data Processing:
- Advanced proficiency in SQL and Python for data manipulation, transformation, and analysis.
- Extensive experience with PySpark and Spark SQL for big data processing in Databricks.
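For context, the kind of day-to-day transformation work implied here, expressed once in the DataFrame API and once in Spark SQL; table and column names are hypothetical:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical raw table; the same aggregation expressed both ways.
orders = spark.read.table("raw.orders")

# DataFrame API: daily spend per supplier.
daily = (
    orders.groupBy("supplier_id", F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("spend"))
)

# Equivalent Spark SQL.
orders.createOrReplaceTempView("orders")
daily_sql = spark.sql("""
    SELECT supplier_id, to_date(order_ts) AS order_date, SUM(amount) AS spend
    FROM orders
    GROUP BY supplier_id, to_date(order_ts)
""")
```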
Cloud & Data Services (Azure):
- In-depth knowledge of Azure services, including:
  - Azure Data Factory (ADF) for pipeline orchestration
  - Azure Data Lake Storage (ADLS) for data storage and management
  - Azure SQL Database for relational data management
- Experience with Azure Functions and event-driven architectures is a plus.
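For context, a minimal sketch of landing ADLS data into a Databricks table; the storage account, container, and paths are hypothetical, and authentication (e.g., a service principal or Unity Catalog credentials) is assumed to be configured separately:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical ADLS Gen2 location, read via its abfss URI.
df = spark.read.format("parquet").load(
    "abfss://bronze@mydatalake.dfs.core.windows.net/procurement/orders/"
)

# Persist into a hypothetical bronze-layer table for downstream pipelines.
df.write.mode("append").saveAsTable("bronze.orders")
```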
Automation & DevOps:
- Hands-on experience implementing CI/CD pipelines using tools like Azure DevOps, GitHub Actions, or similar.
- Familiarity with infrastructure-as-code (IaC) tools such as Terraform or ARM templates.
- Experience with Databricks Workflows and job orchestration tools.
Data Management & Warehousing:
- Strong understanding of data lakehouse architectures and data warehousing solutions (e.g., SQL Server, Redshift, BigQuery).
- Experience designing and maintaining data models and schema designs for analytical use cases.
- Familiarity with data governance, security best practices, and compliance standards.
Machine Learning Integration (Preferred):
- Experience supporting machine learning workflows and integrating models into production pipelines.
- Understanding of MLOps practices is a plus.
Preferred Qualifications:
- Experience with real-time data processing (e.g., Apache Kafka, Azure Stream Analytics); a sketch follows this list.
- Familiarity with Power BI data connections and reporting structures.
- Hands-on experience with Databricks Workflows for complex pipeline orchestration.
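As referenced in the first item above, a minimal Structured Streaming sketch that reads from Kafka into a Delta table; the broker, topic, checkpoint path, and table name are all hypothetical:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical broker and topic.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers binary key/value; decode the payload before parsing downstream.
decoded = stream.select(F.col("value").cast("string").alias("payload"))

# Append continuously into a Delta table, with a checkpoint for fault tolerance.
query = (
    decoded.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .outputMode("append")
    .toTable("bronze.orders_stream")
)
```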