Job Description
Do you want to be part of an inclusive team that works to develop innovative therapies for patients? Every day, we are driven to develop and deliver innovative and effective new medicines to patients and physicians. If you want to be part of this exciting work, you belong at Astellas!
Astellas Pharma Inc. is a pharmaceutical company conducting business in more than 70 countries around the world. We are committed to turning innovative science into medical solutions that bring value and hope to patients and their families. Keeping our focus on addressing unmet medical needs and conducting our business with ethics and integrity enables us to improve the health of people throughout the world. For more information on Astellas, please visit our website at www.astellas.com.
This position is based in Bengaluru and might require some on-site work.
Astellas’ Global Capability Centres – Overview
Astellas’ Global Capability Centres (GCCs) are strategically located sites that give Astellas the ability to access talent across various functions in the value chain and to co-locate core capabilities that are currently dispersed. Our three GCCs are located in India, Poland and Mexico.
The GCCs will enhance our operational efficiency, resilience and innovation potential, enabling a timely response to changing business demands.
Our GCCs are an integral part of Astellas, guided by our shared values and behaviors, and are critical enablers of the company’s strategic priorities, sustainable growth, and commitment to turn innovative science into VALUE for patients.
Purpose and Scope:
The Databricks Developer is responsible for building and enhancing the data processing pipelines and distributed compute workloads that run on the Databricks Platform. This role focuses on writing scalable PySpark and SQL code, designing efficient Delta Lake data flows, and implementing reliable job orchestration patterns that support high-volume, production-grade data operations. You will work directly within Databricks notebooks and workflows to build ingestion and transformation logic, optimize cluster usage, and ensure pipelines meet performance, reliability, and cost expectations.
This position works closely with Data Engineers, Platform Engineering, and Data Science teams to translate technical requirements into well-structured data pipelines and automated jobs. The role involves debugging distributed compute issues, tuning Spark performance, enforcing coding and data quality standards, and integrating pipelines with CI/CD and monitoring tools. Your work ensures that downstream analytics, ML models, and business applications have access to accurate, timely, and well-organized data across the Astellas Data Platform.
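For illustration, a minimal sketch of the kind of PySpark and Delta Lake pipeline code this role involves, using Auto Loader to ingest raw files into a Medallion-style Bronze layer and then building a cleansed Silver table. All paths, table names, and columns are hypothetical placeholders, not an actual Astellas pipeline:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Bronze: incrementally ingest raw JSON files with Auto Loader.
# All paths and table names below are hypothetical placeholders.
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/raw/_schemas/orders")
    .load("/mnt/raw/orders")
    .writeStream.option("checkpointLocation", "/mnt/bronze/_checkpoints/orders")
    .trigger(availableNow=True)  # process all pending files, then stop
    .toTable("bronze.orders")
    .awaitTermination()
)

# Silver: deduplicate, standardize types, and drop invalid rows.
silver_df = (
    spark.read.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("order_id").isNotNull())
)
silver_df.write.format("delta").mode("overwrite").saveAsTable("silver.orders")
```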
Responsibilities and Accountabilities:
- Develop & Maintain Scalable Data Pipelines: Design, build, and optimize ETL/ELT pipelines using PySpark, Spark SQL, Auto Loader, and Delta Live Tables to support ingestion and transformation.
- Implement Robust Lakehouse Architecture: Design and enhance Medallion (Bronze/Silver/Gold) data models, applying Delta Lake features such as schema evolution, Change Data Feed (CDF), OPTIMIZE, and Z-Ordering to deliver performant, reliable, and cost-efficient data layers.
- Integrate Data Across Cloud Platforms: Ingest and harmonize structured, semi-structured, and unstructured data from multiple cloud environments including Azure, AWS, and enterprise object storage.
- Develop Reusable Engineering Frameworks: Create and maintain reusable Python, PySpark, and YAML-based libraries and patterns to standardize ingestion, transformation, automation, and engineering workflows across teams.
- Enforce Data Quality & Governance: Implement and operationalize automated data validation frameworks (DLT expectations, data contracts) while applying Unity Catalog governance covering permissions, lineage, external locations, and PII/PHI controls (a brief sketch of DLT expectations follows this list).
- CI/CD & Deployment Automation: Utilize Azure DevOps and Databricks Asset Bundles (DABs) to establish automated build, test, and deployment workflows; ensure source control discipline and promote engineering best practices.
- Optimize Performance & Cost Efficiency: Enhance Spark workloads by implementing partitioning, caching, and join optimization strategies; leverage Photon, serverless SQL, and cluster right-sizing to improve runtime performance.
- Collaborate with Data & Platform Teams: Partner closely with business stakeholders, analysts, SMEs, and Platform Engineering teams to translate requirements into scalable data solutions.
- Develop Lightweight Analytical Applications: Build small-scale applications using Streamlit, Shiny, ReactJS, or Gradio to support internal stakeholders with interactive data products and insights.
- Technical Guidance and Support: Participate in design reviews, mentor junior developers, and uphold high-quality engineering standards and practices across data engineering efforts.
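As a concrete illustration of the data-quality item above, a minimal Delta Live Tables sketch that enforces expectations on a hypothetical Silver table (dataset and column names are placeholders):

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical Silver table built with DLT; rows failing either
# expectation are dropped before the table is materialized.
@dlt.table(comment="Cleansed orders with basic quality gates applied.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
def silver_orders():
    return (
        dlt.read_stream("bronze_orders")
        .withColumn("ingested_at", F.current_timestamp())
    )
```

Expectation results surface in the DLT pipeline event log, which is one common way such validation frameworks are operationalized and monitored.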
Required Qualifications:
- Bachelor’s degree in Computer Science, Engineering, or a related discipline, or equivalent experience.
- 7+ years of Data Engineering experience, including 3+ years working on Databricks.
- Proven experience designing enterprise-scale data architectures and distributed systems.
- Deep expertise in Delta Lake internals (file pruning, compaction, metadata management, CDF tuning).
- Experience leading complex migrations (legacy ETL, cloud migrations, warehouse consolidation).
- Experience developing reusable engineering frameworks, libraries, and standards.
- Strong proficiency in Python, SQL, and PySpark for building scalable data pipelines.
- Experience with cloud platforms such as Azure, AWS, or GCP, including working with object storage.
- Hands-on experience with warehouse/lakehouse technologies, including Synapse, Snowflake, or Redshift.
- Knowledge of traditional ETL tools, such as Informatica, Talend, or equivalent.
- Proficiency with Git-based version control and DevOps tooling (Azure DevOps, GitHub, Bitbucket).
- Experience with Databricks Workflows and orchestration tools for automated data processing (a brief sketch follows this list).
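For illustration, a minimal sketch of defining a scheduled Databricks Workflows job programmatically with the databricks-sdk Python package. The job name, notebook path, and cluster ID below are hypothetical placeholders:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import CronSchedule, NotebookTask, Task

# Authenticates via environment variables or a Databricks config profile.
w = WorkspaceClient()

# Create a nightly job that runs an ingestion notebook on an existing cluster.
job = w.jobs.create(
    name="nightly-orders-ingestion",  # hypothetical job name
    tasks=[
        Task(
            task_key="ingest_orders",
            notebook_task=NotebookTask(notebook_path="/Repos/data/ingest_orders"),
            existing_cluster_id="0000-000000-abcdefgh",  # hypothetical cluster ID
        )
    ],
    schedule=CronSchedule(
        quartz_cron_expression="0 0 2 * * ?",  # 02:00 daily (Quartz syntax)
        timezone_id="UTC",
    ),
)
print(f"Created job {job.job_id}")
```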
Preferred Qualifications:
- Experience with Delta Live Tables (DLT), Auto Loader, and streaming or hybrid (batch + streaming) architectures, including CDC, event sequencing, and incremental processing.
- Hands-on expertise with Unity Catalog governance, including lineage, ABAC/RBAC access controls, external locations, and secure data sharing patterns.
- Experience working in regulated industries such as pharmaceuticals, healthcare, or life sciences.
- Proficiency with MLflow and MLOps lifecycle management, including model tracking, registry operations, and production deployment workflows (a brief sketch follows this list).
- Demonstrated ability to build reusable shared libraries, engineering frameworks, and standardized patterns for enterprise-scale data platforms.
- Databricks Certified Data Engineer Professional certification (strongly preferred).
- Experience with serverless compute models, Photon runtime, and Delta Sharing for cross-domain or cross-organization data exchange.
- Familiarity with data mesh or domain-oriented data product architectures supporting federated ownership and self-service data capabilities.
- Experience implementing or configuring data observability tooling (e.g., Monte Carlo or equivalent) to monitor quality, lineage, and pipeline health.
- Hands-on experience automating DevOps workflows using Databricks Asset Bundles (DABs) across multi-workspace or multi-environment deployments.
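As a brief illustration of the MLflow lifecycle item above, a minimal tracking-and-registry sketch; the experiment path and registered model name are hypothetical:

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Train a toy model on synthetic data; names below are hypothetical.
X, y = make_classification(n_samples=200, random_state=42)
mlflow.set_experiment("/Shared/demand-forecast")

with mlflow.start_run():
    model = LogisticRegression(max_iter=1000).fit(X, y)
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Log the model artifact and register it in the Model Registry in one step.
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="demand_forecast",
    )
```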
Working Environment:
At Astellas we recognize the importance of work/life balance, and we are proud to offer a hybrid working solution that allows time to connect with colleagues at the office along with the flexibility to also work from home. We believe this creates the most productive work environment for all employees to succeed and deliver. Hybrid work from certain locations may be permitted in accordance with Astellas’ Responsible Flexibility Policy.
"Beware of recruitment scams impersonating Astellas recruiters or representatives. Authentic communication will only originate from a verified company email address (Astellas email address) or an official Astellas LinkedIn profile. If you encounter a fake profile or anything suspicious, report it promptly to LinkedIn's support team through LinkedIn Help"
Category PlatformX (SUB00000710)
Astellas is committed to equality of opportunity in all aspects of employment.
EOE including Disability/Protected Veterans