NVIDIA's NGC (NVIDIA GPU Cloud) team is looking for highly motivated Linux System Administrator/DevOps engineers to develop, implement, and operate a global, dynamic, state-of-the-art Service Reliability Operations Center that provides extraordinary levels of support for our Compute Infrastructure and services. As a key member of the Compute Infrastructure Support (CIS) team, you will partner with other key members of our organization, including Site Reliability Engineering, the Security Operations Center, DevOps teams, and other datacenter operations partners, to help our services achieve near 100% availability. On the rare occasion that an incident occurs, you will be our front line, working to reduce the frequency and duration of any issue. Working in partnership with the development community, the team will develop monitors, alarms, and alerts to make the service more reliable and improve our customer experience. Additionally, you will be closely involved in supporting the technologies that the CIS team uses to monitor, run, and measure the effectiveness of the Compute environment.
What you will be doing:
What we need to see:
NVIDIA offers highly competitive salaries and a comprehensive benefits package. We have some of the most forward-thinking and talented people in the world working for us and, due to unprecedented growth, our world-class engineering teams are growing fast. If you're a creative and autonomous engineer with real passion for technology, we want to hear from you.