Site Reliability Engineer

Mindvalley

About Mindvalley:

Mindvalley is the leading and most promising ed-tech company to date. We dominate the US market for Personal Growth Education. We are empowering athletes within every major US sports team and promoting successful learning strategies in major companies.

We're currently on a mission to build the most advanced and complete learning experience to enable personal growth and development for all our amazing customers. We build innovative tools that induce enlightenment within every aspect of human life. We are seeking the best engineers to build the best and most advanced education platform our species has seen. Our measure of success: reaching up to 100 countries, powering every Fortune 500 company, and moving humanity towards a better future.

About the Role:

Join us in the mission to build and maintain a resilient, high-performance infrastructure!

We're on the lookout for a dynamic and seasoned Site Reliability Engineer (SRE) to take charge as our SRE Engineering Manager. In this pivotal role, you'll lead an exceptional team of SREs, ensuring the stability, scalability, and efficiency of our cloud infrastructure and applications.

Responsibilities:

Cloud Infrastructure Development:

  • Develop and oversee our cloud infrastructure across leading platforms such as AWS, GCP, or Azure.
  • Implement infrastructure as code (IaC) methodologies for streamlined provisioning and configuration management.
  • Stay abreast of cloud advancements and best practices, driving optimization initiatives within our cloud environment.
  • Collaborate closely with architects and cloud engineers to craft secure, cost-effective solutions that meet our evolving needs.

Site Reliability Champion:

  • Advocate for the principles of Site Reliability Engineering (SRE) within the team and throughout the organization.
  • Spearhead the development and deployment of automated monitoring, alerting, and incident response systems.
  • Cultivate a culture of proactive troubleshooting and continuous enhancement of infrastructure reliability.
  • Utilize metrics analysis to pinpoint bottlenecks and fine-tune performance and scalability.

CI/CD and DevOps Champion:

  • Champion CI/CD and DevOps best practices within the team.
  • Spearhead the development and deployment of automated pipelines for infrastructure deployments.
  • Integrate monitoring and alerting systems into the CI/CD pipeline for proactive issue identification.
  • Promote collaboration between SRE, development, and operations teams.

Skills:

  • Proficient in container orchestration systems, specifically Kubernetes.
  • Skilled in the Prometheus metrics and observability ecosystem.
  • Strong understanding of Linux and network fundamentals.
  • Experience with automation tools (Terraform, Ansible, Chef, Puppet).
  • Knowledge of cloud services (AWS, GCP, Azure) and multi-cloud environments.
  • Familiarity with the full Software Development Life Cycle, including both Waterfall and Agile methodologies.
  • Excellent teamwork and communication skills, with a knack for detail-oriented problem-solving.
  • Ability to work under pressure, managing critical systems with a focus on timely delivery.
  • A proactive mindset, always looking for ways to improve system reliability and efficiency.
  • Curiosity and a continuous learning attitude, embracing new technologies and methodologies to drive innovation.

Experience:

  • Demonstrated experience (3+ years) in system design, maintenance, and troubleshooting, with a solid background in Site Reliability Engineering, DevOps, Cloud Engineering, or similar roles.
  • Proven track record in automating operations, including deployment, system configurations, and operational tasks, to minimize manual work and enhance efficiency.
  • Expertise in container orchestration systems, especially Kubernetes, to ensure scalable and reliable application deployment.
  • Proficient in implementing and managing monitoring tools like Prometheus for proactive issue detection and resolution.
  • Strong foundation in Linux and network fundamentals, ensuring secure and optimized system operations.
  • Experience with infrastructure-as-code tools (Terraform, Ansible, Chef, Puppet) for efficient system provisioning and management.
  • Familiarity with cloud services (AWS, GCP, Azure) and the ability to navigate and optimize multi-cloud environments.
  • Knowledge of the full Software Development Life Cycle, with experience in both Waterfall and Agile methodologies, to support continuous integration and delivery.
  • Ability to lead incident response efforts, conduct thorough post-mortem analyses, and implement preventative measures to maintain high system availability and performance.
  • Capacity planning and performance tuning expertise to manage growth effectively and maintain optimal service levels.
  • Excellent communication skills, with the ability to work closely with cross-functional teams, including direct collaboration with C-level executives and tech leadership.
  • A proactive, solution-oriented mindset, with a focus on continuous improvement and innovation to drive system reliability and efficiency.
  • Curiosity and a commitment to continuous learning, with a willingness to explore new technologies and methodologies to enhance operational excellence.

Mindvalley is an equal opportunity employer and does not discriminate on the basis of race, colour, religion, gender identity or expression, national origin, age, disability, marital status, sexual orientation, or any other legally protected status. We are committed to creating a diverse and inclusive workplace and encourage applications from all qualified individuals.

Confirmed 6 hours ago. Posted 13 days ago.
