Senior DevOps Engineer

NVIDIA


We are looking for a Senior DevOps Engineer to join our Data and Application Services team and improve its growing services infrastructure. At the core of our application services platform is a multi-tenant Kubernetes platform designed to run a variety of in-house application services. You will work with a team of passionate, skilled engineers who are continuously working to provide better tools for building and managing this infrastructure. Our team spans a range of experience levels and backgrounds, from new grads to industry experts. We are looking for a motivated, hardworking, and focused individual who has a real passion for operational excellence, data systems, and automation.

What you'll be doing:

  • Own the services you build, working with cross-functional teams
  • Test and deploy code frequently
  • Continuously improve infrastructure provisioning and management using automation
  • Identify areas to improve service resiliency through industry standard practices
  • Support a globally distributed, multi-cloud hybrid environment (AWS, GCP, and on-prem)
  • Determine the root cause of production-level incidents and write corresponding high-quality RCA reports
  • Ensure the highest level of uptime and Quality of Service (QoS) for internal customers through operational excellence
  • Define service level objectives (SLOs) and service level indicators (SLIs) to represent and measure service quality
  • Participate in the team's on-call rotation and be an escalation contact for service incidents

What we need to see:

  • 3+ years operating services, including web servers, load balancers, relational/non-relational databases, messaging systems, and storage solutions
  • 3+ years coding/scripting in at least two high-level programming languages (Python, Go, Ruby, Groovy, etc.)
  • Deep understanding of the Linux operating system and TCP/IP fundamentals
  • Expertise with at least one major cloud service provider (AWS, GCP, Azure)
  • Proficiency in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)
  • Hands-on experience managing production-quality observability stacks
  • Excellent troubleshooting and problem-solving skills
  • B.S. degree in Computer Science or related technical field
  • Detail-oriented, with great communication and documentation skills

Ways to stand out from the crowd:

  • Linux certification from a well-known vendor (Red Hat, Oracle, etc.)
  • Prior experience managing large-scale Kubernetes deployments in production
  • Strong skills in modern container networking and storage architecture
Confirmed 18 hours ago. Posted 30+ days ago.