Senior DevOps Engineer

NVIDIA

We are now looking for a Senior DevOps Engineer. You will work on open-source technologies and their enterprise adoption, such as:

  • GPU-accelerated Apache Spark (Spark-RAPIDS), which dramatically speeds up data processing and machine learning
  • A medical deep learning framework (Project MONAI) that is revolutionizing healthcare AI solutions worldwide
  • Federated learning technology (NVFlare) that builds generalizable AI models from diverse data sources while ensuring data security and privacy

What you'll be doing:

  • Serve as a technical leader in defining, designing, developing, and maintaining DevOps tools, frameworks, and platforms
  • Implement and advocate CI/CD conventions, and write tools that automate the steps involved in this process
  • Develop and maintain build, deployment, and continuous integration infrastructure
  • Enable the development team by providing automated build and test solutions using Docker, Kubernetes/YARN, and on-premises or cloud service provider (CSP) infrastructure
  • Work with open source communities, including RAPIDS, Spark, MONAI, and NVFlare, on CI/CD
  • Work closely with Development and QA teams to help ensure end-to-end quality
  • Take on full-stack development work, depending on your capabilities

What we need to see:

  • BS or MS in Computer Science, Computer Engineering, or closely related fields
  • 10+ years of experience in software development
  • 2+ years of experience with CI/CD systems
  • Strong programming and debugging skills in Python/Java/C++, with extensive bash scripting experience
  • Strong hands-on skills
  • Excellent knowledge of GitLab/GitHub or other source control systems
  • Experience configuring, maintaining, and building upon deployments of industry-standard tools (e.g., Jenkins, Kubernetes, Docker)
  • Strong experience with build tools such as Maven, setuptools, and CMake, as well as unit testing and code-coverage tools
  • Strong skills in software release processes (Maven repositories, PyPI, Conda)
  • Familiarity with Linux distributions such as Ubuntu, CentOS, and Rocky Linux, and with cloud services such as AWS, Azure, and GCP
  • Good knowledge of open-source big-data technologies (Spark, Hadoop) and/or ML/DL frameworks (TensorFlow, PyTorch)

Ways to stand out from the crowd:

  • Good open-source project management skills
  • Kubernetes, YARN, Spark, or Ray experience
  • Experience with configuration management and infrastructure-as-code tools such as Ansible and Terraform
  • Knowledge of monitoring systems (Prometheus, Grafana)
  • Experience with CUDA would be a huge plus

With highly competitive salaries and a comprehensive benefits package, NVIDIA is widely considered to be one of the technology world's most desirable employers. We have some of the most brilliant and talented people on the planet working for us. Are you creative and autonomous? Do you love a challenge? If so, we want to hear from you. We are an AA/EEO/Disabled employer.
