Site Reliability Engineer - Recommendation Infrastructure

TikTok

Responsibilities

Our Recommendation Infrastructure Team builds and optimizes the architecture of our recommendation system to give TikTok users the most stable and best possible experience. SREs on our team keep the systems running with the highest level of availability and create highly automated systems and pipelines.

What You'll Do

• Engage in and improve the whole lifecycle of recommendation systems, from system design consulting through launch reviews, deployment, operation and refinement
• Deliver tools and software that improve the reliability and scalability of services, automate operations and improve R&D efficiency
• Build availability for large-scale services deployed across global data centers
• Plan, manage and optimize cloud resource utilization, ensuring the SLAs of large-scale clusters are met
• Measure and monitor availability, latency and overall service health
• Practice sustainable incident response and postmortems

Qualifications

Minimum Qualifications:

• Bachelor's degree or above in Computer Science or a related field
• Familiarity with Linux system operations and networking
• Programming experience in at least one of the following languages: Python, Perl, Go, or C/C++
• Familiarity with common CI/CD procedures and environments
• Effective communication skills, a sense of ownership and drive

Preferred Qualifications:

• SRE experience deploying large-scale systems with high reliability and scalability
• Experience designing, analyzing and troubleshooting large-scale distributed systems

Confirmed 7 hours ago. Posted 30+ days ago.
