The Engineering and Technology team is at the core of the Shopee platform development. The team is made up of a group of passionate engineers from all over the world, striving to build the best systems with the most suitable technologies. Our engineers do not merely solve problems at hand; we build foundations for a long-lasting future. We don't limit ourselves in what we can or can't do; we take matters into our own hands even if it means drilling down to the bottom layer of the computing platform. Shopee's hyper-growing business scale has transformed most "innocent" problems into huge technical challenges, and there is no better place to experience them first-hand if you love technology as much as we do.
About the Team:
We are looking for a proactive and detail-oriented Site Reliability Engineering (SRE) Intern to join our Big Data Infrastructure team. This internship is ideal for students who are passionate about Linux systems, scripting, and large-scale data platforms. You will gain hands-on experience in operating and improving the reliability of data infrastructure services.
Job Description:
- Support daily operations of big data platforms, including monitoring, troubleshooting, and routine maintenance.
- Write and optimize Shell scripts to automate operational workflows and system tasks.
- Assist in system health checks, log analysis, and reliability improvements.
- Participate in building tools to enhance the observability and automation of data services.
- Document standard operating procedures and support knowledge sharing across the team.
Requirements:
Basic Qualifications
- Currently pursuing a Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field.
- Strong understanding of Linux operating systems and command-line tools.
- Proficiency in Shell scripting (bash, sh, etc.).
- Clear interest in large-scale systems and reliability engineering.
- Willingness to learn, take initiative, and work collaboratively.
Bonus Qualifications (Nice to Have)
- Familiarity with Python for automation or internal tooling.
- Experience with web platform development (e.g., using Flask, FastAPI, or similar frameworks).
- Exposure to big data and storage engines such as HDFS, Apache Ozone, or Alluxio.
- Understanding of monitoring or alerting tools (e.g., Prometheus, Grafana, ELK).
- Knowledge of Git and basic CI/CD workflows.
What You'll Gain:
- Real-world experience in operating and improving a production-grade big data platform.
- Exposure to SRE practices including automation, fault tolerance, and observability.
- Mentorship from experienced infrastructure engineers.
- Potential opportunity for full-time conversion based on performance and graduation timeline.