ETL Databricks (PySpark/SQL) - (CREQ224125)

Virtusa

Description

Develop and maintain a metadata-driven, generic ETL framework for automating ETL code development.

Design, build, and optimize ETL/ELT pipelines using Databricks (PySpark/SQL) on AWS.

Experience with the InsureMO rating engine is required.

Ingest data from a variety of structured and unstructured sources (APIs, RDBMS, flat files, streaming).

Develop and maintain robust data pipelines for batch and streaming data using Delta Lake and Spark Structured Streaming.

Implement data quality checks, validations, and logging mechanisms.

Optimize pipeline performance, cost, and reliability.

Collaborate with data analysts, BI, and business teams to deliver fit-for-purpose datasets.

Support data modeling efforts (star and snowflake schemas, as well as denormalized-table approaches) and assist with data warehousing initiatives.

Work with orchestration tools such as Databricks Workflows to schedule and monitor pipelines.

Follow best practices for version control, CI/CD, and collaborative development.

Skills

Hands-on experience in ETL/Data Engineering roles.

Strong expertise in Databricks (PySpark, SQL, Delta Lake); Databricks Data Engineer Certification preferred.

Experience with Spark optimization, partitioning, caching, and handling large-scale datasets.

Proficiency in SQL and scripting in Python or Scala.

Solid understanding of data lakehouse/medallion architectures and modern data platforms.

Experience working with cloud storage systems such as AWS S3.

Familiarity with DevOps practices (Git, CI/CD, Terraform, etc.).

Strong debugging, troubleshooting, and performance-tuning skills.

Primary Location: IN-TN-Chennai

Schedule: Full Time

Employee Status: Individual Contributor

Job Type: Experienced

Travel: No

Job Posting: 20/06/2025, 4:09:07 PM
