Responsible for the design and implementation of a global-scale machine learning system for feeds, ads and search ranking models.
Responsible for improving the usability and flexibility of the machine learning infrastructure.
Responsible for improving the workflow of model training and serving, data pipelines, storage systems, and resource management for multi-tenant machine learning systems.
Responsible for designing and developing key components of ML infrastructure and mentoring interns.
Qualifications
Minimum Qualifications
At least 5 years of experience in developing and deploying large-scale systems.
Proficient in C/C++/CUDA/Python, with solid programming skills.
Familiar with deep learning frameworks (TensorFlow/PyTorch).
Experience improving core machine learning infrastructure (TensorFlow, PyTorch, and JAX).
Preferred Qualifications
Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch).
Experience in using/designing open-source machine learning lifecycle management systems: TFX