Research Scientist, Infrastructure System Lab

ByteDance

About the Team

We are the Infrastructure System Lab — a hybrid research and engineering group building the next-generation AI-native data infrastructure. Our work sits at the intersection of databases, large-scale systems, and AI. We drive innovation across:

  • Next-generation databases: We build VectorDBs and multi-modal AI-native databases designed to support large-scale retrieval and reasoning workloads.
  • AI for Infra: We leverage machine learning to build intelligent algorithms for infrastructure optimization, tuning, and observability.
  • LLM Copilot: We develop LLM-based tooling such as NL2SQL and NL2Chart.
  • High-performance cache systems: We develop a multi-engine key-value store optimized for distributed storage workloads. We're also building KV caches for LLM inference at scale.

This is a highly collaborative team where researchers and engineers work side-by-side to bring innovations from paper to production. We publish, prototype, and build robust systems deployed across key products used by millions.

About the Role

We are seeking a highly motivated and technically strong Research Scientist with a PhD in Computer Science, Databases, Information Retrieval, or a related field to join our team. You will design and optimize state-of-the-art vector indexing algorithms to power large-scale similarity search, filtered search, and hybrid retrieval use cases.

Your work will directly contribute to the next-generation vector database infrastructure that supports real-time and offline retrieval across billions or even trillions of high-dimensional vectors.

Why Join Us

  • Work on problems at the frontier of AI x systems with huge practical impact.
  • Collaborate with a world-class team of researchers and engineers.
  • Opportunity to publish, attend conferences, and contribute to open-source.
  • Competitive compensation, generous research support, and a culture of innovation.

Responsibilities

  • Research and develop new algorithms for approximate nearest neighbor (ANN) search, especially for filtered, hybrid, or disk-based scenarios.
  • Optimize existing algorithms for scalability, low latency, memory footprint, and hybrid search support.
  • Collaborate with engineering teams to prototype, benchmark, and productionize indexing solutions.
  • Contribute to academic publications, open-source libraries, or internal technical documentation.
  • Stay current with research trends in vector search, retrieval systems, retrieval-augmented generation (RAG), large language models (LLMs), and related areas.
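To make the algorithmic focus concrete, here is a toy, stdlib-only Python sketch of the core trade-off in ANN search: an exact brute-force baseline versus an IVF-style partitioned index that scans only the closest partitions. All names and parameters here are illustrative, not part of any production system described above.

```python
import math
import random

random.seed(0)

def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(vectors, query, k):
    """Exact search: score every vector, keep the k closest ids."""
    return sorted(range(len(vectors)), key=lambda i: l2(vectors[i], query))[:k]

def build_ivf(vectors, n_lists, iters=5):
    """Toy IVF index: k-means partitions the vectors into n_lists buckets."""
    centroids = random.sample(vectors, n_lists)
    for _ in range(iters):
        buckets = [[] for _ in range(n_lists)]
        for v in vectors:
            buckets[min(range(n_lists), key=lambda c: l2(v, centroids[c]))].append(v)
        centroids = [
            [sum(dim) / len(b) for dim in zip(*b)] if b else centroids[c]
            for c, b in enumerate(buckets)
        ]
    assign = [[] for _ in range(n_lists)]
    for i, v in enumerate(vectors):
        assign[min(range(n_lists), key=lambda c: l2(v, centroids[c]))].append(i)
    return centroids, assign

def ivf_search(vectors, centroids, assign, query, k, nprobe=2):
    """Approximate search: scan only the nprobe partitions nearest the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(centroids[c], query))[:nprobe]
    candidates = [i for c in probe for i in assign[c]]
    return sorted(candidates, key=lambda i: l2(vectors[i], query))[:k]

vectors = [[random.gauss(0, 1) for _ in range(8)] for _ in range(200)]
query = [random.gauss(0, 1) for _ in range(8)]
exact = brute_force_knn(vectors, query, 10)
centroids, assign = build_ivf(vectors, 4)
approx = ivf_search(vectors, centroids, assign, query, 10)
recall = len(set(exact) & set(approx)) / 10  # fraction of true neighbors found
```

Tuning `nprobe` against recall, latency, and memory is exactly the kind of optimization this role covers at billion-vector scale.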

Qualifications

Minimum Qualifications

  • PhD in Computer Science, Applied Mathematics, Electrical Engineering, or a related technical field.
  • Strong publication record in top-tier venues (e.g., SIGMOD, VLDB, SIGIR, NeurIPS, ICML) related to vector search, indexing, IR, or ML.
  • Deep understanding of ANN algorithms, quantization, graph-based indexes, and partition-based indexes.
  • Strong system-level thinking: ability to profile, benchmark, and optimize performance across CPU, memory, and storage layers.
  • Proficiency in C++ and/or Python, with experience in implementing and benchmarking algorithms.
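As a minimal illustration of the quantization topic named above, the sketch below shows scalar quantization of a float vector into one-byte codes, trading a small reconstruction error for a large memory saving. The clipping range and function names are assumptions for this example only.

```python
def scalar_quantize(vec, lo, hi, bits=8):
    """Map each float to an integer code in [0, 2**bits - 1], clipping to [lo, hi]."""
    levels = (1 << bits) - 1
    span = hi - lo
    return [round((min(max(x, lo), hi) - lo) / span * levels) for x in vec]

def dequantize(codes, lo, hi, bits=8):
    """Recover approximate floats from the integer codes."""
    levels = (1 << bits) - 1
    span = hi - lo
    return [lo + c / levels * span for c in codes]

v = [0.12, -0.87, 0.44, 0.99]
codes = scalar_quantize(v, -1.0, 1.0)    # one byte per dimension vs. 4-8 for floats
recon = dequantize(codes, -1.0, 1.0)
err = max(abs(a - b) for a, b in zip(v, recon))  # bounded by half a quantization step
```

Product quantization and graph-based indexes build on the same idea: spend fewer bits per vector while keeping distances accurate enough for ranking.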

Preferred Qualifications

  • Experience building or contributing to vector databases or retrieval engines in production.
  • Familiarity with frameworks like FAISS, ScaNN, HNSWLib, or DiskANN.
  • Understanding of distributed systems and/or GPU-accelerated search.
  • Experience with hybrid search (dense + sparse), multi-modal retrieval, or retrieval for LLMs.
  • Passion for bridging theory and practice in production-scale systems.
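One common way to combine dense and sparse retrievers, as mentioned in the hybrid-search bullet, is reciprocal rank fusion (RRF). The sketch below is a generic illustration of that technique; the document ids and the k=60 constant are illustrative defaults, not details of any system above.

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge ranked lists from different retrievers.

    Each document scores 1 / (k + rank + 1) per list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7", "d2"]   # e.g. from a vector index
sparse = ["d1", "d9", "d3", "d5"]  # e.g. from a BM25 engine
fused = rrf([dense, sparse])       # "d1" wins: near the top of both lists
```

RRF needs only ranks, not comparable scores, which is why it is a popular baseline for fusing dense and sparse results.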
