Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world’s largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Tech Lead – Data Engineering with extensive Databricks expertise and AWS experience to lead mission-critical data initiatives.
Role Summary
As the Tech Lead – Data Engineering, you will be responsible for architecting, implementing, and optimizing end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while integrating with core AWS services (S3, Glue, Lambda, etc.). You will lead a technical team of data engineers, ensuring best practices in performance, security, and scalability. This role requires a deep, hands-on understanding of Databricks internals and a track record of delivering large-scale data platforms in a cloud environment.
Key Responsibilities
- Databricks Platform & Architecture
  - Architect and maintain Databricks Lakehouse solutions using Delta Lake for ACID transactions and efficient data versioning.
  - Leverage Databricks SQL Analytics for interactive querying and report generation.
  - Manage the cluster lifecycle (provisioning, sizing, scaling) and optimize Spark jobs for cost and performance.
  - Implement structured streaming pipelines for near real-time data ingestion and processing.
  - Configure and administer Databricks Repos, notebooks, and job scheduling/orchestration to streamline development workflows.
- AWS Cloud Integration
  - Integrate Databricks with AWS S3 as the primary data lake storage layer.
  - Design and implement ETL/ELT pipelines using the AWS Glue catalog, AWS Lambda, and AWS Step Functions where needed.
  - Ensure proper networking configuration (VPC, security groups, private links) for secure and compliant data access.
  - Automate infrastructure deployment and scaling using AWS CloudFormation or Terraform.
- Data Pipeline & Workflow Management
  - Develop and maintain scalable, reusable ETL frameworks using Spark (Python/Scala).
  - Orchestrate complex workflows, applying CI/CD principles (Git-based version control, automated testing).
  - Implement Delta Live Tables or similar frameworks to handle real-time data ingestion and transformations.
  - Integrate with MLflow (if applicable) for experiment tracking and model versioning, ensuring data lineage and reproducibility.
- Performance Tuning & Optimization
  - Conduct advanced Spark job tuning (caching strategies, shuffle partitions, broadcast joins, memory optimization).
  - Fine-tune Databricks clusters (autoscaling policies, instance types) to manage cost without compromising performance.
  - Optimize I/O performance and concurrency for large-scale data sets.
- Security & Governance
  - Implement Unity Catalog or equivalent Databricks features for centralized governance, access control, and data lineage.
  - Ensure compliance with industry standards (e.g., GDPR, SOC, ISO) and internal security policies.
  - Apply IAM best practices across Databricks and AWS to enforce least-privilege access.
- Technical Leadership & Mentorship
  - Lead and mentor a team of data engineers, conducting code reviews, design reviews, and knowledge-sharing sessions.
  - Champion Agile or Scrum development practices, coordinating sprints and deliverables.
  - Serve as a primary technical liaison, working closely with product managers, data scientists, DevOps, and external stakeholders.
- Monitoring & Reliability
  - Configure observability solutions (e.g., Datadog, CloudWatch, Prometheus) to proactively identify performance bottlenecks.
  - Set up alerting mechanisms for latency, cost overruns, and cluster health.
  - Maintain SLAs and KPIs for data pipelines, ensuring robust data quality and reliability.
- Innovation & Continuous Improvement
  - Stay updated on the Databricks roadmap and emerging data engineering trends (e.g., Photon, Lakehouse features).
  - Evaluate new tools and technologies, driving POCs to improve data platform capabilities.
  - Collaborate with business units to identify data-driven opportunities and craft solutions that align with strategic goals.
Qualifications
- Educational Background
  - Bachelor’s or Master’s degree in Computer Science, Data Science, Engineering, or equivalent experience.
- Technical Experience
  - Databricks Expertise: 5+ years of hands-on Databricks (Spark) experience, with a focus on building and maintaining production-grade pipelines.
  - AWS Services: Proven track record with AWS S3, EC2, Glue, EMR, Lambda, Step Functions, and security best practices (IAM, VPC).
  - Programming Languages: Strong proficiency in Python (PySpark) or Scala; SQL for analytics and data modeling.
  - Data Warehousing & Modeling: Familiarity with RDBMS (e.g., Postgres, Redshift) and dimensional modeling techniques.
  - Infrastructure as Code: Hands-on experience using Terraform or AWS CloudFormation to manage cloud infrastructure.
  - Version Control & CI/CD: Git-based workflows (GitHub/GitLab), Jenkins or similar CI/CD tools for automated builds and deployments.
- Leadership & Soft Skills
  - Demonstrated experience leading a team of data engineers in a complex, high-traffic data environment.
  - Outstanding communication and stakeholder management skills, with the ability to translate technical jargon into business insights.
  - Adept at problem-solving, with a track record of quickly diagnosing and resolving data performance issues.
- Certifications (Preferred)
  - Databricks Certified Associate/Professional (e.g., Databricks Certified Professional Data Engineer).
  - AWS Solutions Architect (Associate or Professional).
Location
Custom House Plaza Block 6, Dublin 1, IE