ETL Databricks (PySpark/SQL) - (CREQ224125)
Develop and maintain a metadata-driven, generic ETL framework for automating ETL code (see the first sketch after this list).
Design, build, and optimize ETL/ELT pipelines using Databricks (PySpark/SQL) on AWS.
Experience with the InsureMO rating engine is required.
Ingest data from a variety of structured and unstructured sources (APIs, RDBMS, flat files, streaming).
Develop and maintain robust data pipelines for batch and streaming data using Delta Lake and Spark Structured Streaming.
Implement data quality checks, validations, and logging mechanisms (a combined streaming-and-quality-gate sketch follows this list).
Optimize pipeline performance, cost, and reliability.
Collaborate with data analysts, BI, and business teams to deliver fit-for-purpose datasets.
Support data modeling efforts (star and snowflake schemas, plus denormalized-table approaches) and assist with data warehousing initiatives.
Work with orchestration tools such as Databricks Workflows to schedule and monitor pipelines.
Follow best practices for version control, CI/CD, and collaborative development.
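
A metadata-driven framework can take many shapes; the following is a minimal PySpark sketch, assuming a hypothetical control table etl_control.pipeline_config whose column names (source_format, source_path, write_mode, target_table, is_active) are illustrative only, not part of this posting.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each active row in the (assumed) control table describes one feed.
config_rows = (
    spark.table("etl_control.pipeline_config")
         .where("is_active = true")
         .collect()
)

for cfg in config_rows:
    # One generic load per config row: no feed-specific code needed.
    df = (
        spark.read
             .format(cfg["source_format"])   # e.g. 'csv', 'json', 'parquet'
             .load(cfg["source_path"])
    )
    (
        df.write
          .format("delta")
          .mode(cfg["write_mode"])           # e.g. 'append' or 'overwrite'
          .saveAsTable(cfg["target_table"])
    )

New feeds are then onboarded by inserting a config row rather than writing new pipeline code, which is the main point of the metadata-driven approach.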
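For the streaming and data quality bullets, here is a minimal sketch of a bronze-to-silver hop with a simple quality gate, using Spark Structured Streaming and Delta Lake; the table names, the event_id key column, and the checkpoint paths are assumptions for illustration.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.readStream.table("bronze.events")   # assumed source table

# Basic validation: route rows with a null business key to a rejects
# table so the silver table stays clean and nothing is silently lost.
valid = bronze.where(F.col("event_id").isNotNull())
rejects = bronze.where(F.col("event_id").isNull())

(valid.writeStream
      .format("delta")
      .option("checkpointLocation", "/chk/silver_events")   # assumed path
      .outputMode("append")
      .toTable("silver.events"))

(rejects.writeStream
        .format("delta")
        .option("checkpointLocation", "/chk/silver_events_rejects")
        .outputMode("append")
        .toTable("silver.events_rejects"))

The same logic serves batch backfills largely by swapping readStream/writeStream for read/write, which is one reason Delta-based pipelines handle both modes well.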
Skills
Hands-on experience in ETL/Data Engineering roles.
Strong expertise in Databricks (PySpark, SQL, Delta Lake); Databricks Data Engineer certification preferred.
Experience with Spark optimization, partitioning, caching, and handling large-scale datasets (a brief tuning sketch follows this list).
Proficiency in SQL and scripting in Python or Scala.
Solid understanding of data lakehouse/medallion architectures and modern data platforms.
Experience working with cloud storage systems like AWS S3.
Familiarity with DevOps practices such as Git, CI/CD, and Terraform.
Strong debugging, troubleshooting, and performance-tuning skills.
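
As a brief illustration of the tuning skills listed above, the sketch below shows three common moves: a broadcast join to avoid shuffling a small dimension, caching an intermediate that feeds several outputs, and partitioning output on a column readers filter by. All table and column names are placeholders.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

facts = spark.table("silver.transactions")   # placeholder tables
dims = spark.table("silver.customers")

# Broadcast join: ship the small dimension to every executor instead
# of shuffling the large fact table.
enriched = facts.join(F.broadcast(dims), "customer_id")

# Cache when one intermediate feeds multiple downstream actions.
enriched.cache()
daily = enriched.groupBy("txn_date").agg(F.sum("amount").alias("total"))
by_customer = enriched.groupBy("customer_id").agg(F.count("*").alias("txns"))

# Partition output on a low-cardinality column readers filter on.
(daily.write
      .format("delta")
      .mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("gold.daily_totals"))
(by_customer.write
      .format("delta")
      .mode("overwrite")
      .saveAsTable("gold.customer_txn_counts"))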
Location: IN-TN-Chennai
Employment Type: Full Time
Role: Individual Contributor
Career Level: Experienced
: No
Posted: 20/06/2025, 4:09:07 PM