Platform and SRE Manager

DigiCert

Who we are

We're a leading global security authority that's disrupting our own category. Our encryption is trusted by the major ecommerce brands, the world's largest companies, the major cloud providers, the financial systems of entire countries, entire internets of things, and even little things like surgically embedded pacemakers. We help companies put trust - an abstract idea - to work. That's digital trust for the real world.

Job summary

We are seeking a highly technical, hands-on Manager of Platform Ops and Site Reliability Engineering to lead teams that build scalable, reliable, and automated systems. This role will focus on enabling automation-first principles, strengthening our infrastructure with fault-tolerant designs, and partnering across engineering to drive reliability and operational maturity. As a player-coach, you will mentor engineers while actively contributing to technical initiatives that reduce toil, enhance observability, and ensure performance at scale.

As a Platform and SRE Manager, you’ll shape the future of our systems and the people who support them. You’ll lead with technical depth and inspire a culture of automation, resilience, and excellence. This is a high-impact role, ideal for a leader who thrives at the intersection of engineering, operations, and innovation.

What you will do

  • Lead and grow high-performing Platform and SRE teams focused on technical excellence and automation-driven operations.
  • Drive adoption of Infrastructure as Code (IaC), automated remediation, and scalable deployment strategies.
  • Partner with Engineering, Product, and Security teams to establish and enforce SLAs/SLOs and error budget policies.
  • Architect and oversee the implementation of monitoring, alerting, distributed tracing, and logging across all systems.
  • Champion reliability reviews, chaos engineering, capacity planning, and performance benchmarking.
  • Own the technical roadmap for SRE practices, tools, and platform automation.
  • Oversee and contribute to the development of CI/CD pipelines that support fast, safe, and repeatable delivery.
  • Conduct deep dives into production incidents, guiding root cause analysis and driving long-term solutions.
  • Implement systems for continuous feedback and improvement based on observability data and incident learnings.
  • Build a culture of reliability, technical rigor, and operational accountability across teams.

What you will have

  • 8+ years in software engineering, DevOps, or SRE, with 3+ years in technical leadership or management roles.
  • Proven experience with Kubernetes, Terraform, and modern cloud infrastructure (AWS, GCP, Azure).
  • Strong programming/scripting skills in Python, Go, Bash, or similar.
  • Deep understanding of observability tooling: Prometheus, Grafana, Splunk, New Relic, Datadog, or OpenTelemetry.
  • Demonstrated ability to lead teams building highly automated, scalable, and resilient systems.
  • Track record of driving technical transformation and reducing operational toil through automation.
  • Experience leading incident response and postmortem processes.

Benefits

DigiCert offers a competitive benefits package for all of our full-time employees.

DigiCert is an Equal Opportunity employer and is committed to diversity in its workforce. In compliance with applicable federal and state laws, DigiCert prohibits discrimination on the basis of race or ethnicity, religion, color, national origin, sex, age, sexual orientation, gender identity/expression, veteran status, status as a qualified person with a disability, or genetic information. Individuals from historically underrepresented groups, such as minorities, women, qualified persons with disabilities, and protected veterans, are strongly encouraged to apply.

