As a member of the Site Reliability Engineering team, you will work on large-scale system design and troubleshooting, and be fluent in systems programming and/or automation. You will have a desire to tackle the complex problems of scale that are unique to Tokopedia.
Design, write and deliver software to improve the availability, scalability, latency, and efficiency of Tokopedia's services.
Solve problems related to mission-critical services and build automation to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
Influence and create new designs, architectures, standards and methods for large-scale distributed systems.
Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.
Perform periodic on-call duties using a follow-the-sun model.
Bachelor's degree in Computer Science or a related technical field, or equivalent practical experience.
Experience in one or more of: C, C++, Java, Perl, Python, Go, or shell scripting.
Experience working with Unix/Linux systems from kernel to shell and beyond, including system libraries, file systems, and client-server protocols.
Networking: experience with network theory and practice, e.g. TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, OSI layers, and load balancing.