Team Introduction:
Data AML is ByteDance's machine learning middle platform, providing training and inference systems for recommendation, advertising, CV (computer vision), speech, and NLP (natural language processing) across businesses such as Douyin, Toutiao, and Xigua Video.
AML provides powerful machine learning computing capabilities to internal business units and conducts research on general and innovative algorithms to solve key business challenges. Additionally, through Volcano Engine, it delivers core machine learning and recommendation system capabilities to external enterprise clients.
Beyond business applications, AML is also engaged in cutting-edge research in areas such as AI for Science and scientific computing.
Research Project Introduction:
Large-scale recommendation systems are increasingly applied to short-video, text-community, image, and other products, and modal information plays an increasingly prominent role in them. ByteDance's practice has found that modal information can serve as a generalization feature supporting business scenarios such as recommendation, and that research on end-to-end, ultra-large-scale multimodal recommendation systems has enormous potential. Building on algorithm-engineering co-design, the project is expected to further explore directions such as multimodal co-training, large models with 7B/13B parameters, and end-to-end modeling of longer sequences.
Engineering research directions include:
1. Representation of multimodal samples
2. Construction of high-performance multimodal inference engines based on the PyTorch framework
3. Development of high-performance multimodal training frameworks
4. Application of heterogeneous hardware in multimodal recommendation systems
Algorithmic research directions include:
1. Design of sound recommendation-advertising and multimodal co-training architectures
2. Sparse Mixture of Experts (Sparse MoE)
3. Memory Network
4. Mixed-precision techniques
Qualifications:
1. Doctoral degree, with priority given to candidates majoring in Computer Science, Software Engineering, or related fields;
2. Proficiency in one or more programming languages, such as C, C++, Go, Python, or Java, in a Linux environment;
3. Deep understanding of distributed system principles, with experience in designing, developing, and maintaining large-scale distributed systems;
4. Excellent logical analysis skills, with the ability to abstract and decompose complex business logic effectively, and a collaborative team spirit;
5. Strong sense of responsibility, with good learning ability, communication skills, and self-motivation;
6. Good technical documentation habits, including writing and updating work processes and technical documents promptly as required.
Bonus Qualifications:
1. Familiarity with Kubernetes architecture and extensive experience in cloud-native system development;
2. Experience with at least one mainstream machine learning framework (e.g., TensorFlow, PyTorch, MXNet);
3. Familiarity with Django, Flask, or related technologies, with backend development experience;
4. Experience in one or more of the following areas:
5. Experience in large-scale cloud computing platforms or private cloud product architecture development.