AI Model Optimization Engineer

ByteDance

Team Introduction

The Intelligent Creation - AI Platform team focuses on building advanced end-to-end AI production pipelines, covering deep learning model training, optimization, deployment, and application. We provide AI capabilities that empower content creation and consumption on TikTok, serving billions of users.

We are seeking an experienced AI model optimization engineer with expertise in optimizing AI model training and inference, including distributed training/inference and acceleration. The ideal candidate will work at the cutting edge of AI efficiency, enhancing the performance, scalability, and deployment of large-scale generative AI models.

Responsibilities

  • Optimize AI model training and inference workflows to improve efficiency, speed, and scalability.
  • Develop and implement distributed training strategies to accelerate model convergence and reduce computational overhead.
  • Design and optimize inference pipelines for low-latency, high-throughput deployments across diverse hardware architectures.
  • Benchmark and profile deep learning models to identify performance bottlenecks and optimize computational resources.
  • Improve model parallelism and memory efficiency for large-scale AI models.
  • Research and implement state-of-the-art techniques in model compression, quantization, and pruning.
  • Collaborate with data scientists, production engineers, and infrastructure teams to ensure seamless integration of optimized models into production environments.
  • Stay up to date with the latest advancements in AI model efficiency, distributed computing, and hardware acceleration.

Qualifications

Minimum Qualifications:

  • Master’s or PhD in Computer Science, Electrical Engineering, Artificial Intelligence, or a related field.
  • 5+ years of experience in AI model training and inference optimization.
  • Strong proficiency in deep learning frameworks such as PyTorch and JAX.
  • Experience with distributed training techniques such as data parallelism, model parallelism, and pipeline parallelism.
  • Solid understanding of model compression techniques, including quantization, pruning, and knowledge distillation.
  • Proficiency in high-performance computing (HPC) and hardware acceleration libraries such as CUDA or OpenCL.
  • Experience working with ML compilers (e.g., TensorRT, XLA, TVM, ONNX Runtime) to optimize model inference.

Preferred Qualifications:

  • Strong software engineering skills, including proficiency in Python, C++, and parallel computing.
  • Experience with large-scale distributed systems and orchestration tools such as Kubernetes, Ray, or Horovod.
  • Prior contributions to open-source AI optimization projects.
  • Background in research with publications in top-tier AI/ML conferences (e.g., NeurIPS, ICML, CVPR).
  • Knowledge of cutting-edge AIGC technologies, such as transformers and diffusion models, and an active interest in following the most recent AIGC advancements.