Research Scientist Intern (Doubao (Seed) - Machine Learning System) - 2025 Start (MS)

ByteDance

About the Team

Established in 2023, the ByteDance Seed team is dedicated to discovering new approaches to general intelligence, and pushing the boundaries of AI. Our research spans large language models, speech, vision, world models, AI infrastructure, next-generation interfaces and more.

With a long-term vision and determination in AI, the ByteDance Seed team remains committed to foundational research. We aim to become a world-class AI research team that drives real technological progress and delivers societal benefits.

With labs across China, Singapore, and the U.S., our team has already released industry-leading general-purpose large models and advanced multimodal capabilities, powering over 50 real-world applications — including Doubao, Coze, and Jimeng.

We are looking for talented individuals to join us for an internship in 2025. Internships at ByteDance aim to offer students industry exposure and hands-on experience. Turn your ambitions into reality as your inspiration brings infinite opportunities at ByteDance.

Internships at ByteDance aim to provide students with hands-on experience in developing fundamental skills and exploring potential career paths. A vibrant blend of social events and enriching development workshops will be available for you to explore. Here, you will utilize your knowledge in real-world scenarios while laying a strong foundation for personal and professional growth. This Internship Program runs for 12 weeks beginning in May/June 2025.

Candidates can apply to a maximum of two positions and will be considered for jobs in the order they apply. The application limit applies to ByteDance and its affiliates' jobs globally. Applications will be reviewed on a rolling basis; we encourage you to apply early.

Responsibilities

  • Research and develop our machine learning systems, including heterogeneous computing architecture, management, scheduling, and monitoring.
  • Manage cross-layer optimization across systems, AI algorithms, and hardware (GPU, ASIC) for machine learning.
  • Implement both general-purpose training framework features and model-specific optimizations (e.g., LLMs, diffusion models).
  • Improve efficiency and stability for extremely large-scale distributed training jobs.

Qualifications

Minimum Qualifications

  • Currently enrolled in a PhD program; solid grasp of distributed and parallel computing principles and familiarity with recent advances in computing, storage, networking, and hardware technologies.
  • Familiar with machine learning algorithms, platforms, and frameworks such as PyTorch and JAX.
  • Have a basic understanding of how GPUs and/or ASICs work.
  • Expert in at least one or two programming languages in a Linux environment: C/C++, CUDA, Python.
  • Must obtain work authorization in country of employment at the time of hire, and maintain ongoing work authorization during employment.

Preferred Qualifications

Experience in the following areas will be a big plus:

  • GPU-based high-performance computing and RDMA high-performance networking (MPI, NCCL, ibverbs).
  • Distributed training framework optimizations such as DeepSpeed, FSDP, Megatron, GSPMD.
  • AI compiler stacks such as torch.fx, XLA, and MLIR.
  • Large-scale data processing and parallel computing.
  • Experience in designing and operating large-scale systems in cloud computing or machine learning.
  • Experience in in-depth CUDA programming and performance tuning (CUTLASS, Triton).

By submitting an application for this role, you accept and agree to our global applicant privacy policy, which may be accessed here: https://jobs.bytedance.com/en/legal/privacy.
