Dataproc Lead, Spark, OSS Technologies, Google Cloud

Google

Minimum qualifications:

  • Bachelor’s degree or equivalent practical experience.
  • 5 years of experience with software development in one or more programming languages, and with data structures/algorithms.
  • Experience in software development and engineering, incorporating design methodologies, leveraging open source technologies, and working with distributed computing systems, including Apache Spark, Apache Hadoop, and Apache Hive.
  • Experience in Open Source technologies, Big Data, Data Analytics, Artificial Intelligence, Machine Learning, and Database Internals.

Preferred qualifications:

  • Experience with database optimizations such as query and executor optimizations.
  • Experience with data lake technologies like Apache Iceberg, Apache Hudi, Delta Lake, etc.
  • Experience with OpenTelemetry, JMX, and other monitoring solutions.
  • Experience with OSS projects like Spark, Hive, Trino, Ray, Flink, etc.
  • Experience working with data science tools such as Jupyter notebooks.
  • Experience developing Cloud or SaaS products.

About the Job

Google Cloud's software engineers develop the next-generation technologies that change how billions of users connect, explore, and interact with information and one another. We're looking for engineers who bring fresh ideas from all areas, including information retrieval, distributed computing, large-scale system design, networking and data storage, security, artificial intelligence, natural language processing, UI design and mobile; the list goes on and is growing every day. As a software engineer, you will work on a specific project critical to Google Cloud's needs with opportunities to switch teams and projects as you and our fast-paced business grow and evolve. You will anticipate our customers' needs and be empowered to act like an owner, take action and innovate. We need our engineers to be versatile, display leadership qualities and be enthusiastic to take on new problems across the full stack as we continue to push technology forward.

Cloud Dataproc enables users of open source data analytics tools (Apache Hadoop, Spark, Trino, Flink, etc.) to move their workloads to the cloud and modernize them. Dataproc is a fast, easy-to-use, fully managed cloud service for running Apache Spark, Apache Hadoop, and dozens of other open source tools in a simpler, more performant, and more cost-efficient way. Dataproc also integrates easily with other Google Cloud Platform (GCP) services, such as BigQuery, Dataplex (governance, lineage), and Catalog Stores, to provide a powerful and complete platform for data processing, analytics, and machine learning.

Google Cloud accelerates every organization’s ability to digitally transform its business and industry. We deliver enterprise-grade solutions that leverage Google’s cutting-edge technology, and tools that help developers build more sustainably. Customers in more than 200 countries and territories turn to Google Cloud as their trusted partner to enable growth and solve their most critical business problems.

Responsibilities

  • Build high-impact, customer-facing features that make Cloud Dataproc the best place to run Spark, Ray, Trino, Flink, and newer technologies in the cloud.
  • Define the roadmap for Open Source technologies like Spark, Ray, Trino, Flink, etc.
  • Define and implement the next generation of data lakes and lakehouses, focusing on technologies like Iceberg, Hudi, and Delta.
  • Optimize the open source technologies for performance and efficiency.
  • Design and build a software stack that takes advantage of Google technologies for faster cluster setup, efficient cluster operations, and comprehensive monitoring and observability.
Confirmed 20 hours ago. Posted 30+ days ago.