Minimum qualifications:

  • Bachelor's degree in Mechanical Engineering, Electrical Engineering, Materials Science, Reliability Engineering, Physics, Applied Mathematics, or a related technical field, or equivalent practical experience.
  • 12 years of experience in hardware reliability engineering, product development, or a related technical field.
  • 5 years of experience in a technical leadership or management role, leading engineering teams.
  • 5 years of experience with statistical analysis, data modeling, and numerical methods applied to hardware systems (e.g., Weibull analysis, Monte Carlo simulations, FMEA, FTA).
  • Experience with high-volume, high-reliability hardware in industries such as servers, consumer electronics, automotive, or aerospace.

Preferred qualifications:

  • Master's degree or PhD in a relevant engineering or scientific discipline.
  • Experience with large-scale data analysis platforms and programming languages (e.g., Python, R, SQL) for numerical modeling and data visualization.
  • Understanding of server hardware architecture, components (e.g., CPUs, memory, storage, networking), and data center environments.
  • Experience developing and implementing reliability test plans and standards (e.g., JEDEC, Telcordia).
  • Familiarity with machine learning applications for predictive maintenance and anomaly detection in hardware.
  • Ability to drive cross-functional initiatives and influence stakeholders.
  • Excellent communication skills, with the ability to articulate complex technical concepts.

About the Job

Be part of a team that pushes boundaries, developing custom silicon solutions that power the future of Google's direct-to-consumer products. You'll contribute to the innovation behind products loved by millions worldwide. Your expertise will shape the next generation of hardware experiences, delivering unparalleled performance, efficiency, and integration.

Google Cloud is building the future of cloud computing, and at its core lies a global infrastructure that requires reliability. We are seeking an experienced and visionary Senior Manager, Hardware Reliability and Numerical Analysis, to lead a critical team responsible for ensuring the longevity and performance of our hardware systems through advanced methods.

In this role, you will lead a team of highly skilled engineers, driving the direction for hardware reliability across Google Cloud's infrastructure. You will be instrumental in developing and implementing sophisticated numerical models, statistical analyses, and simulation techniques to predict, prevent, and mitigate hardware failures at scale. You will need technical expertise in reliability engineering and numerical methods, as well as leadership skills and the ability to collaborate across functions.

Behind everything our users see online is the architecture built by the Technical Infrastructure team to keep it running. From developing and maintaining our data centers to building the next generation of Google platforms, we make Google's product portfolio possible. We're proud to be our engineers' engineers and love voiding warranties by taking things apart so we can rebuild them. We keep our networks up and running, ensuring our users have the best and fastest experience possible.

The US base salary range for this full-time position is $208,000-$293,000 + bonus + equity + benefits. Our salary ranges are determined by role, level, and location. Within the range, individual pay is determined by work location and additional factors, including job-related skills, experience, and relevant education or training. Your recruiter can share more about the specific salary range for your preferred location during the hiring process.

Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits. Learn more about benefits at Google.

Responsibilities

  • Lead and mentor a team of hardware reliability engineers and numerical analysts, fostering technical excellence and continuous improvement.
  • Define and execute a roadmap for hardware reliability and numerical analysis within Google Cloud, aligning with business objectives and infrastructure growth.
  • Develop and refine advanced models (e.g., statistical, machine learning, simulation) to predict hardware life-cycles, identify failure modes, and optimize maintenance.
  • Collaborate cross-functionally with design, manufacturing, supply chain, and operations to integrate reliability best practices throughout the product life-cycle.
  • Analyze field reliability data and drive continuous innovation in testing methodologies, data analytics, and design improvements.
Confirmed 2 hours ago. Posted 4 days ago.
