Build analytics capabilities that use the data pipeline to provide actionable insights.
Work with cross-functional technology stakeholders to build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and big data technologies.
Redesign infrastructure for greater scalability.
Assemble large, complex data sets that meet functional/non-functional business requirements.
Identify, design, and implement internal process improvements, including establishing standards for development processes and technical requirements, automating manual processes, and optimizing data delivery.
Develop data management processes, policies, and standards, and ensure compliance with them.
Implement and enforce controls to maintain data availability and quality.
Recognize and adopt best practices in data integrity, test design, analysis, validation, and documentation.
Tune application and query performance using profiling tools and SQL (a short sketch follows this list).
Regularly review data at aggregate levels using analytical reporting tools to identify risks, patterns, and trends.
Work closely with cross-functional teams to assist with data-related technical issues and support their data infrastructure needs.
Stay current with technology trends and methods by following state-of-the-art literature in advanced analytics, statistical modeling, and data management.
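To illustrate the kind of query-performance tuning described above, here is a minimal sketch, assuming the google-cloud-bigquery client library and hypothetical project and table names; a dry run reports the bytes a query would scan, a cheap first check before deeper profiling or a rewrite.

```python
from google.cloud import bigquery

# Hypothetical project and table names, for illustration only.
client = bigquery.Client(project="my-project")

sql = """
    SELECT customer_id, SUM(amount) AS total
    FROM `my-project.sales.transactions`
    GROUP BY customer_id
"""

# A dry run estimates the bytes scanned without executing the query.
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)
print(f"Estimated bytes processed: {job.total_bytes_processed}")
```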
Requirements:
Knowledge of RDBMS and strong SQL skills, required for source data analysis and extraction
Python with GCP APIs such as BigQuery and Google Cloud Storage (a load sketch follows this list)
Cloud Dataflow is good to have
Google Data Loss Prevention API
Apache NiFi
Cloud Composer (a DAG sketch follows this list)
Scripting in Python and general Bash scripting
Knowledge of an additional ETL tool (such as Informatica) and Control-M, if required to extract data from source systems
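As an illustration of working with Python against the BigQuery and Cloud Storage APIs, the following is a minimal sketch, assuming the google-cloud-bigquery client library and hypothetical bucket, dataset, and table names; it loads a CSV export from Cloud Storage into a BigQuery staging table.

```python
from google.cloud import bigquery

# Hypothetical bucket, dataset, and table names, for illustration only.
client = bigquery.Client(project="my-project")
table_id = "my-project.staging.daily_orders"
uri = "gs://my-bucket/exports/daily_orders.csv"

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # infer the schema from the file
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)

# Start the load job and block until it finishes.
load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```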
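Cloud Composer is managed Apache Airflow, so orchestration work typically means authoring DAGs in Python. The following is a minimal sketch with hypothetical DAG and task names, using placeholder Bash commands in place of real extract and load logic.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Hypothetical DAG and task names, for illustration only.
with DAG(
    dag_id="daily_extract_load",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo 'extract from source'")
    load = BashOperator(task_id="load", bash_command="echo 'load into BigQuery'")

    extract >> load  # run the extract task before the load task
```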