Associate Manager AWS Site Reliability Engineer

PepsiCo

Overview

We are seeking a highly skilled and motivated Associate Manager AWS Site Reliability Engineer (SRE) to join our team. As an Associate Manager AWS SRE, you will play a critical role in designing, managing, and optimizing our cloud infrastructure to ensure high availability, reliability, and scalability of our services. You will collaborate with cross-functional teams to implement best practices, automate processes, and drive continuous improvements in our cloud environment.

Responsibilities

  • Design and Implement Cloud Infrastructure: Architect, deploy, and maintain AWS infrastructure using Infrastructure-as-Code (IaC) tools such as Terraform or CloudFormation.
  • Monitor and Optimize Performance: Develop and implement monitoring, alerting, and logging solutions to ensure the performance and reliability of our systems.
  • Ensure High Availability: Design and implement strategies for achieving high availability and disaster recovery, including backup and failover mechanisms.
  • Automate Processes: Automate repetitive tasks and processes to improve efficiency and reduce human error using tools such as AWS Lambda, Jenkins, and Ansible.
  • Incident Response: Lead and participate in incident response activities, troubleshoot issues, and perform root cause analysis to prevent future occurrences.
  • Security and Compliance: Implement and maintain security best practices and ensure compliance with industry standards and regulations.
  • Collaborate with Development Teams: Work closely with software development teams to ensure smooth deployment and operation of applications in the cloud environment.
  • Capacity Planning: Perform capacity planning and scalability assessments to ensure our infrastructure can handle growth and increased demand.
  • Continuous Improvement: Drive continuous improvement initiatives by identifying and implementing new tools, technologies, and processes.

Qualifications

  • Experience: 10+ years of experience, including a minimum of 5 years in a Site Reliability Engineer (SRE) or DevOps role with a focus on AWS cloud infrastructure.
  • Technical Skills: Proficiency in AWS services such as EC2, S3, RDS, VPC, Lambda, CloudFormation, and CloudWatch.
  • Automation Tools: Experience with Infrastructure-as-Code (IaC) tools such as Terraform or CloudFormation, and configuration management tools like Ansible or Chef.
  • Scripting: Strong scripting skills in languages such as Python, Bash, or PowerShell.
  • Monitoring and Logging: Experience with monitoring and logging tools such as Prometheus, Grafana, ELK Stack, or CloudWatch.
  • Problem-Solving: Excellent troubleshooting and problem-solving skills, with a proactive and analytical approach.
  • Communication: Strong communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
  • Certifications: AWS certifications such as AWS Certified Solutions Architect, AWS Certified DevOps Engineer, or AWS Certified SysOps Administrator are highly desirable.
  • Education: Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
Confirmed 21 hours ago. Posted 30+ days ago.
