Site Reliability Engineer (SRE) - USDS

TikTok

Responsibilities

The security team's mission is to run and operate security infrastructure, platforms and technologies, and to support cross-functional teams in protecting our users, products and infrastructure. On this team you'll have a unique opportunity to gain first-hand exposure to the company's strategy on key security initiatives, especially in deploying and maintaining scalable, secure-by-design systems and solutions. Our challenges are not your regular day-to-day technical problems; you'll be part of a team that's developing new solutions to new challenges of a kind not previously addressed by big tech. We work fast, at scale, and we're making a difference.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities
- Work with infrastructure, product and platform engineering teams on operating and deploying software platforms, capacity planning and launch reviews throughout the whole lifecycle of services.
- Maintain sustainable reliability and scalability of software systems by improving automation to measure and monitor availability, latency and overall system health.
- Consistently evolve systems by pushing for changes that improve system reliability and release velocity.
- Practice sustainable incident response and postmortems.

Qualifications

Minimum Qualifications
- BS degree in Computer Science, Computer Engineering, Electrical Engineering or a related major, with 5+ years of work experience.
- Experience programming, debugging, and optimizing in general-purpose programming languages, including but not limited to Go, Python, C/C++ or Rust.
- Experience with Unix/Linux systems from kernel to shell and beyond.
- Experience in analyzing and debugging production issues at scale.
- Experience with and understanding of infrastructure-as-code concepts, approaches, methods, and tooling.

Preferred Qualifications
- Hands-on experience with large cloud providers such as AWS, Azure, or GCP.
- Familiarity with infrastructure and provisioning tools like Kubernetes, Terraform, Ansible, and SaltStack.
- Experience securing infrastructure in a distributed system with automation, or practicing chaos engineering.
- Experience with two or more of the following areas: web application development, distributed and parallel systems, developing large software systems, mobile application development or security software development.
- Experience using AI/ML technologies to build SRE assistants that automate tasks and diagnose and remediate events; knowledge of Model Context Protocol (MCP).

Confirmed 10 hours ago. Posted 3 days ago.
