Software Engineer Graduate (Applied Machine Learning - Training) - 2026 Start (PhD)

ByteDance

Responsibilities

About the team:

The mission of our AML team is to push next-generation machine learning algorithms and platforms for the recommendation system, ads ranking and search ranking in our company. We also drive substantial impact on core businesses of the company. We are looking for a Software Engineer New Graduate to join our team to support and advance that mission.

Responsibilities:

  • Responsible for the design and implementation of a global-scale machine learning system for feeds, ads and search ranking models.
  • Responsible for improving usability, flexibility, scalability, and stability of the machine learning infrastructure.
  • Responsible for improving the performance of machine learning infrastructure over different hardware and network topologies.
  • Responsible for improving the workflow of model training and serving, data pipelines, and resource management for multi-tenancy machine learning systems.

Qualifications

Minimum Qualifications:

  • Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
  • Experience with distributed systems or other infrastructure systems
  • Experience with CUDA, compilers, and/or C++
  • Experience with ML/deep learning frameworks (TensorFlow/PyTorch) on GPUs/TPUs

Preferred Qualifications:

  • Computer architecture (CPUs, memory, storage, microarchitecture, etc.) or hardware infrastructure experience
  • Experience with big data frameworks (e.g., K8s/Spark/Hadoop/Flink), with resource management and task scheduling for large-scale distributed systems, and with building solutions on AWS, GCP, Azure, OCI, AliCloud or other cloud services. [Scheduling]
  • Strong background in one of the following fields: Hardware-Software Co-Design, High Performance Computing, ML Hardware Acceleration (e.g., GPU/TPU/RDMA) or ML for Systems.
  • Experience in developing and deploying large-scale systems (e.g., monitoring, analysis, troubleshooting, and notification systems), a strong understanding of code optimization, routine task automation and failure self-healing, and familiarity with IaC technologies like Terraform/Ansible. [EfficiencyTool]
Confirmed 23 hours ago. Posted 2 days ago.