[What the role is]
The Data Engineer collaborates with the existing team of data scientists, data engineers and analysts to create data tools, develop data ingestion and processing pipelines, optimise data processing, and ensure that data systems meet STB's business requirements. The role requires working closely with the data science team to set up and deploy data pipelines that support machine learning models and analytics scripts, develop data integrations, assemble complex datasets and implement process improvements. The Data Engineer plays a key role in enhancing data reliability and quality while ensuring scalable business processes and supporting the team's data-related initiatives.
[What you will be working on]
1. Project Management
a) Manage projects and work closely with vendors and internal stakeholders to deliver data engineering-related implementations, ensuring that deliverables and objectives are met within the agreed scope and timelines.
b) Collaborate with cross-functional teams, including data scientists, data engineers, DevOps engineers, product managers, business analysts and business stakeholders, to integrate and deploy models into current analytics platforms and production systems.
c) Plan, execute and monitor project milestones, and provide timely updates to management on project progress and issues.
2. Application of Engineering Disciplines in Support of Strategic Business Objectives
a) Prepare, process, cleanse and verify the integrity of data collected for analysis.
b) Design, develop and implement self-managed data processing and compilation pipelines related to key enterprise data domains so that data compilation business logic can be managed and maintained in-house to retain agility in responding to changing operational needs.
c) Review the design and implementation of data pipelines developed by the vendor to ensure that they meet the operational requirements of STB’s business and are integrated back into the self-managed data compilation pipelines for a seamless data processing and compilation process.
d) Work closely with vendors and internal stakeholders to project manage and coordinate Data Science & Analytics' (DS&A) data ingestion and data processing pipelines across platforms, which can include mobile apps, SaaS platforms, on-premise systems and partner systems.
e) Help architect DS&A’s data integrations and data processing flows between external / third-party data sources, AWS Cloud data warehouses (e.g. Redshift, RDS) and internal on-premise systems for workloads at scale.
f) Provide guidance to internal teams on best practices for Cloud data integrations.
g) Identify, design and implement internal process improvements: automating manual processes, optimising data delivery, re-designing infrastructure for greater scalability, etc.
h) Develop monitoring toolkits to verify that integrations execute successfully and to raise alerts when integrations fail.
i) Implement best practice DataOps processes to ensure continuous integration, deployment and governance of our data pipelines across the entire data lifecycle from data preparation to reporting.
3. Data Integration and Data Management
a) Collaborate with the current team to review the existing data integration processes and make improvements to the current data processing pipelines.
b) Work with data and agency partners to assemble large, complex datasets that meet functional and non-functional business requirements.
c) Provide inputs to the design and development of an integrated data model to allow analysis across multiple structured and unstructured datasets.
d) Recommend ways to continually improve data reliability and quality, including helping to review and enhance the existing data collection procedures so that they capture the data needed to build analytics models relevant to industry transformation.
e) Analyse and assess the effectiveness and accuracy of data sources (e.g., datasets received from stakeholders) and ensure that they meet STB's Data Quality standards.
[What we are looking for]