ClusterOne serves large enterprise customers and is relied upon by researchers to build the next AI breakthrough for a variety of industries, including life sciences and robotics. Our products address various aspects of managing the training and deployment of large-scale machine learning models, which requires handling thousands of servers and petabytes of data across various clouds and data centers, securely and efficiently.

We need engineers with a passion for enabling and empowering researchers and developers with fast and reliable AI-related build infrastructure. This role requires you to solve difficult problems together with a team of extremely knowledgeable and talented people.

Key Qualifications 

  • Passion for continuous build, integration, test, deployment, and delivery, and for DevOps culture
  • Experience with the internals of Docker
  • Experience with service provider monitoring systems: Zabbix, Prometheus or Nagios
  • Deep understanding of UNIX/Linux 

Description 
In this role, you will automate distributed systems across complex, geographically dispersed environments. You will be responsible for the deployment and uptime of a multi-tenant, large-scale, security-based cloud service. This role is a good fit if you enjoy complex monitoring, automation, and networking challenges, and building tools to operate large-scale environments from start to finish.

On your first day, we’ll expect you to have:

  • Deep understanding of Linux systems
  • Experience with service provider monitoring systems: Zabbix, Prometheus or Nagios
  • Deep expertise in monitoring distributed-system application architectures
  • Proficiency in one or more open-source configuration management systems: Ansible, Puppet or Chef
  • Solid communication skills with team members near and far
  • Experience with container management tools such as Docker and with microservices architectures

It’s great, but not required, if you have:
  • Experience building, automating, and maintaining infrastructure in Amazon Web Services.
  • Experience with Kubernetes. 
  • Experience working with Atlassian products such as Jira
  • Advanced networking experience. 
  • Experience working with a geographically distributed team. 

Education 
Technical BS/MS/PhD or relevant industry experience.

Confirmed an hour ago. Posted 30+ days ago.
