Site Reliability Engineer Intern, Engineering Infra (Aug - Dec 2025)

Shopee

The Engineering and Technology team is at the core of Shopee's platform development. The team is made up of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve the problems at hand; we build foundations for a long-lasting future. We don't limit ourselves in what we can or can't do; we take matters into our own hands, even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience this first-hand if you love technology as much as we do.

About the Team:

The mission of the Shopee Tech Ops MRE (Machine Reliability Engineering) team is to ensure the efficient and sustainable operation of Shopee's network and hardware 24x7, building and maintaining massive hardware clusters for SRE with attention to capacity, cost and hardware performance. The team provides sustainable hardware resources and stable network support services. MRE works with the data center team to design and optimize the network architecture; selects suitable hardware configurations through hardware testing and evaluation against business requirements; customizes a stable and efficient OS; improves traditional operations through engineering and service-oriented approaches; and builds a complete hardware monitoring system to make fault handling more efficient.

Job Description:

  • Manage automated installation of Linux operating systems
  • Troubleshoot server hardware and OS issues
  • Oversee server assets management throughout lifecycle
  • Develop automation tools for operations
  • Collaborate with cross-functional teams to ensure efficient system performance and reliability

Requirements:

  • Expertise in Linux automated installation processes and methodologies.
  • In-depth knowledge of the Linux OS, with proficiency in common commands and tools.
  • Strong analytical skills to diagnose and resolve OS issues, including storage and network.
  • Proficiency in Shell scripting and at least one programming language (e.g., Python, Go).
  • Basic understanding of bare metal servers and data center infrastructure.
  • Foundational knowledge of containers and orchestration technologies.