Lead Site Reliability Engineer

Trimble

Reporting to: Sr Manager, Availability Management

Office Location: Chennai, India

Flexible Working: Hybrid (Part Office/Part Home)

Cloud Site Reliability Engineer Responsibilities

  • Onboard internal customers to our 24x7 Applications Support and Enterprise Status Page services.
  • Help build a global SRE culture by defining monitoring strategies and best practices across the organization.
  • Monitor application performance and recommend ways to increase the observability of applications and platforms.
  • Play an important role in the Continual Service Improvement process, identifying and driving improvements.
  • Be instrumental in developing standards and guides that help the business maximize its use of common tools.
  • Participate in code peer reviews and enforce quality gates to ensure best practices are followed.
  • Apply automation to tasks that would benefit from it; automating repetitive tasks and deploying monitors via code are core examples.
  • Document knowledge gained from engagements in the form of runbooks and other information critical to incident response.
  • Explore and apply Artificial Intelligence to enhance operational processes and procedures.

Should-Haves - Skills & Experience

  • Strong skills with modern monitoring tools and demonstrable knowledge of APM, RUM and/or synthetic testing.
  • Experience working with observability tools such as Datadog, New Relic, Splunk, CloudWatch, and Azure Monitor.
  • Experience with the OpenTelemetry (OTel) standard.
  • Working knowledge of at least one programming language, such as Python, JavaScript (Node.js, etc.), Golang, or others.
  • Strong experience with IaC tools, such as Terraform and CloudFormation.
  • Experience with cloud environments, especially AWS and/or Azure.
  • Good customer interaction skills and the ability to understand customers' needs and expectations.
  • Strength of conviction: able to encourage adoption across a wide audience, but comfortable mandating where necessary.
  • Experience with code quality tools, such as SonarQube.
  • Knowledge of linting tools for various programming languages.
  • Experience with CI/CD tools, such as Bamboo, Jenkins, Azure DevOps, and GitHub Actions.
  • ITIL experience, with a basic understanding of incident management, problem management, and change management.

Nice-to-Haves - Skills & Experience

  • Any cloud certification
  • ITIL certifications
  • Experience with ITSM tools
  • Experience using on-call management tooling

No travel required

Confirmed 18 hours ago. Posted 30+ days ago.