Site Reliability Engineer Intern, Data Infra (Aug - Dec 2025)

Shopee

The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve the problems at hand; we build foundations for a long-lasting future. We don't limit ourselves in what we can or can't do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience it first-hand if you love technologies as much as we do.

About the Team:

We are looking for a proactive and detail-oriented Site Reliability Engineering (SRE) Intern to join our Big Data Infrastructure team. This internship is ideal for students who are passionate about Linux systems, scripting, and large-scale data platforms. You will gain hands-on experience in operating and improving the reliability of data infrastructure services.

Job Description:

  • Support daily operations of big data platforms, including monitoring, troubleshooting, and routine maintenance.
  • Write and optimize Shell scripts to automate operational workflows and system tasks.
  • Assist in system health checks, log analysis, and reliability improvements.
  • Participate in building tools to enhance the observability and automation of data services.
  • Document standard operating procedures and support knowledge sharing across the team.

Requirements:

Basic Qualifications

  • Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
  • Strong understanding of Linux operating systems and command-line tools.
  • Proficiency in Shell scripting (bash, sh, etc.).
  • Clear interest in large-scale systems and reliability engineering.
  • Willingness to learn, take initiative, and work collaboratively.

Bonus Qualifications (Nice to Have)

  • Familiarity with Python for automation or internal tooling.
  • Experience with web platform development (e.g., using Flask, FastAPI, or similar frameworks).
  • Exposure to big data and storage engines such as HDFS, Apache Ozone, or Alluxio.
  • Understanding of monitoring or alerting tools (e.g., Prometheus, Grafana, ELK).
  • Knowledge of Git and basic CI/CD workflows.

What You'll Gain:

  • Real-world experience in operating and improving a production-grade big data platform.
  • Exposure to SRE practices including automation, fault tolerance, and observability.
  • Mentorship from experienced infrastructure engineers.
  • Potential opportunity for full-time conversion based on performance and graduation timeline.
Confirmed an hour ago. Posted 2 days ago.