Position: Data Engineer (Databricks & AWS)
Company Overview
Citco is a global leader in financial services, delivering innovative solutions to some of the world's largest institutional clients. We harness the power of data to drive operational efficiency and informed decision-making. We are looking for a Data Engineer with strong Databricks expertise and AWS experience to contribute to mission-critical data initiatives.
Role Summary
As a Data Engineer, you will develop and maintain end-to-end data solutions on Databricks (Spark, Delta Lake, MLflow, etc.) while working with core AWS services (S3, Glue, Lambda, etc.). You will work within a technical team, implementing best practices in performance, security, and scalability. This role requires a solid understanding of Databricks and experience with cloud-based data platforms.
Key Responsibilities
1. Databricks Platform & Development
- Implement Databricks Lakehouse solutions using Delta Lake for ACID transactions and data versioning
- Utilize Databricks SQL Analytics for querying and report generation
- Support cluster management and Spark job optimization
- Develop structured streaming pipelines for data ingestion and processing (see the example after this list)
- Use Databricks Repos, notebooks, and job scheduling for development workflows
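As a purely illustrative sketch of the streaming and Delta Lake work above, the PySpark snippet below ingests JSON events from cloud storage into a Delta table; the paths, schema, and table name are hypothetical, and on Databricks the spark session is provided by the runtime.

    # Illustrative only: stream JSON events from cloud storage into a Delta table.
    # Paths, schema, and table names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, to_timestamp
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("events-ingest").getOrCreate()

    schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_ts", StringType()),
        StructField("amount", DoubleType()),
    ])

    events = (
        spark.readStream
        .schema(schema)
        .json("s3://example-bucket/raw/events/")   # hypothetical source path
        .withColumn("event_ts", to_timestamp(col("event_ts")))
    )

    (
        events.writeStream
        .format("delta")                           # Delta sink provides ACID guarantees
        .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
        .outputMode("append")
        .toTable("bronze.events")                  # hypothetical target table
    )
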
2. AWS Cloud Integration
- Integrate Databricks with AWS S3 for data lake storage
- Build ETL/ELT pipelines using the AWS Glue Data Catalog, AWS Lambda, and AWS Step Functions (see the sketch after this list)
- Configure networking settings for secure data access
- Support infrastructure deployment using AWS CloudFormation or Terraform
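One common integration pattern behind these bullets is event-driven orchestration. The hedged sketch below shows an AWS Lambda handler that starts a Databricks job (Jobs API 2.1 run-now) when a new object lands in S3; the workspace host, job ID, and environment variable names are assumptions for illustration only.

    # Illustrative Lambda handler: trigger a Databricks job run on an S3 object-created event.
    # Host, token handling, and job ID are hypothetical; in practice the token would
    # come from AWS Secrets Manager rather than an environment variable.
    import json
    import os
    import urllib.request

    DATABRICKS_HOST = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
    DATABRICKS_TOKEN = os.environ["DATABRICKS_TOKEN"]
    JOB_ID = int(os.environ["DATABRICKS_JOB_ID"])

    def lambda_handler(event, context):
        # Pass the new object's key to the job as a notebook parameter.
        key = event["Records"][0]["s3"]["object"]["key"]
        payload = json.dumps({
            "job_id": JOB_ID,
            "notebook_params": {"input_path": key},
        }).encode("utf-8")

        req = urllib.request.Request(
            f"{DATABRICKS_HOST}/api/2.1/jobs/run-now",
            data=payload,
            headers={
                "Authorization": f"Bearer {DATABRICKS_TOKEN}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())
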
3. Data Pipeline & Workflow Development
- Create scalable ETL frameworks using Spark (Python/Scala)
- Participate in workflow orchestration and CI/CD implementation
- Develop Delta Live Tables pipelines for data ingestion and transformation (illustrated below)
- Support MLflow integration for data lineage and reproducibility
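To illustrate the Delta Live Tables item above, here is a minimal, assumed-shape pipeline with a bronze ingestion table and a silver table guarded by a data-quality expectation; the source path and column names are hypothetical, and this code only runs inside a DLT pipeline.

    # Illustrative Delta Live Tables definitions (valid only inside a DLT pipeline).
    # Source path and column names are hypothetical.
    import dlt
    from pyspark.sql.functions import col

    @dlt.table(comment="Raw events ingested from cloud storage")
    def bronze_events():
        return (
            spark.readStream.format("cloudFiles")       # Auto Loader
            .option("cloudFiles.format", "json")
            .load("s3://example-bucket/raw/events/")
        )

    @dlt.table(comment="Validated events")
    @dlt.expect_or_drop("valid_amount", "amount >= 0")  # drop rows that fail the rule
    def silver_events():
        return dlt.read_stream("bronze_events").where(col("event_id").isNotNull())
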
4. Performance & Optimization
- Implement Spark job optimizations such as caching, partitioning, and join strategies (see the sketch below)
- Support cluster configuration for optimal performance
- Optimize data processing for large-scale datasets
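A brief sketch of the kinds of optimizations listed above, using hypothetical table and column names: a broadcast join to avoid shuffling the large fact table, caching a reused intermediate result, and partitioned Delta output.

    # Illustrative Spark optimizations; table and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("optimization-sketch").getOrCreate()

    facts = spark.table("silver.transactions")
    dims = spark.table("silver.merchants")        # small lookup table

    # Broadcast join: ship the small table to executors instead of shuffling the large one.
    enriched = facts.join(broadcast(dims), "merchant_id")

    # Cache when the same intermediate result feeds several downstream computations.
    enriched.cache()
    daily_counts = enriched.groupBy("txn_date").count()
    merchant_totals = enriched.groupBy("merchant_id").sum("amount")

    # Partition output by date to prune reads and keep file sizes manageable.
    (
        enriched.repartition("txn_date")
        .write.format("delta")
        .partitionBy("txn_date")
        .mode("overwrite")
        .saveAsTable("gold.transactions_enriched")
    )
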
5. Security & Governance
- Apply Unity Catalog features for governance and access control (example below)
- Follow compliance requirements and security policies
- Implement IAM best practices
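As an example of the governance work above, the snippet below issues standard Unity Catalog GRANT/REVOKE statements from a notebook; the catalog, schema, table, and group names are hypothetical.

    # Illustrative Unity Catalog access control; names are hypothetical and the
    # spark session is the one provided by a Databricks notebook.
    spark.sql("GRANT USE CATALOG ON CATALOG finance TO `data_engineers`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA finance.silver TO `data_engineers`")
    spark.sql("GRANT SELECT ON TABLE finance.silver.transactions TO `analysts`")
    spark.sql("REVOKE MODIFY ON TABLE finance.silver.transactions FROM `analysts`")
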
6. Team Collaboration
- Participate in code reviews and knowledge-sharing sessions
- Work within Agile/Scrum development framework
- Collaborate with team members and stakeholders
7. Monitoring & Maintenance
- Help implement monitoring solutions for pipeline performance
- Support alert system setup and maintenance
- Ensure data quality and reliability standards
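As an illustration of the monitoring and data-quality work above, a minimal check that fails the job run (and so can trigger a job alert) when a table is empty or stale; the table, column, and thresholds are hypothetical.

    # Illustrative data-quality gate; table, column, and thresholds are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, expr

    spark = SparkSession.builder.appName("dq-checks").getOrCreate()

    df = spark.table("silver.transactions")
    total = df.count()
    recent = df.where(col("txn_ts") >= expr("current_timestamp() - INTERVAL 24 HOURS")).count()

    errors = []
    if total == 0:
        errors.append("silver.transactions is empty")
    if recent == 0:
        errors.append("no rows ingested in the last 24 hours")

    if errors:
        # A raised exception fails the Databricks job run, which can trigger an alert.
        raise ValueError("; ".join(errors))
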
Qualifications
1. Educational Background
- Bachelor's degree in Computer Science, Data Science, Engineering, or equivalent experience
2. Technical Experience
- Databricks Experience: 2+ years of hands-on Databricks (Spark) experience
- AWS Knowledge: Experience with AWS S3, Glue, Lambda, and basic security practices
- Programming Skills: Strong proficiency in Python (PySpark) and SQL
- Data Warehousing: Understanding of RDBMS and data modeling concepts
- Infrastructure: Familiarity with infrastructure as code concepts