Site Reliability Engineer

Palantir

THE ROLE

Palantir software is deployed at the world’s most critical institutions to help them solve their greatest challenges. Users at customer sites from Washington, DC to London to Tokyo rely on Palantir’s high availability to pursue their missions. Site Reliability Engineers (SREs) make sure our expanding number of customer deployments run smoothly 24 hours a day.

SREs monitor and maintain Palantir systems to catch problems before they threaten our customers’ workflows. SREs combine engineering experience and an innate drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges. Our team strives to automate processes wherever possible, using whatever tools are best for the job. Our responsibilities range from architecting systems for new implementations of Palantir and administering co-located servers (including hardware troubleshooting) to maintaining database platforms.

We work with a variety of teams to understand threats to our software and improve our products over time. We work side by side with Palantir’s implementation teams and our customers’ IT departments to understand each business’s unique problems and to develop innovative solutions. We document our successes and communicate them back to Palantir’s product teams to improve the way our hardware, software, and network solutions are deployed, minimizing failure rates and increasing overall system reliability.

REQUIREMENTS

  • 5+ years of experience with Linux system administration (RHEL or CentOS preferred)
  • Experience monitoring systems with tools like Nagios and writing health checks
  • Good scripting ability in Bash, and preferably also Python, Ruby, Perl or JavaScript
  • Interest in learning and managing newer technologies like Spark, Hadoop, Cassandra, Elasticsearch, Node.js, and RabbitMQ
  • Ability to work independently with minimal supervision
  • Ability to participate in a 24/7 on-call rotation
  • Unwavering commitment to operational security and best practices

PREFERRED

  • BS/MS in Computer Science
  • Experience with virtualization using AWS, VMware ESX, KVM, Xen, or Docker
  • Experience with system management tools like Puppet or Chef
  • Ability to travel to customer sites up to 25% of the time
  • Knowledge of server hardware and/or experience working with Amazon Web Services (AWS)

Confirmed 9 hours ago. Posted 30+ days ago.
