Applied Data Scientist - Applied AI

TikTok

Responsibilities

Data Cycling Center (DCC) is a Data Science team that develops AI-driven capabilities for understanding content (unstructured data), identifies business opportunities from that understanding, and builds products and solutions to capture those opportunities. Our mission is to simplify the acquisition and utilization of unstructured and unlabeled data. The team acts as a data modeling factory, analyzing large volumes of data to surface insights that drive business growth.

About the Role: We are looking for experienced data scientists to join our team and apply advanced analytics and machine learning techniques, including Prompt Engineering (PE), multi-modal large language models (LLMs), computer vision (CV), natural language processing (NLP), and audio signal processing, to optimize intelligent labeling workflows and data products within TikTok's ecosystem. Your work will help improve user experience, enhance content integrity, and support data-driven strategic decision-making. You will collaborate closely with cross-functional teams across product, operations, and algorithms to build scalable, end-to-end Prompt Engineering and LLM workflows for intelligent content moderation and labeling applications.

Key Responsibilities:
• Collaborate with cross-functional stakeholders to gather and refine requirements for data labeling projects and identify opportunities for optimization through data-driven solutions.
• Design and manage the full lifecycle of end-to-end data labeling and policy testing workflows, from aligning with business needs through deployment, iteration, and monitoring.
• Establish and maintain a centralized knowledge base for Retrieval-Augmented Generation (RAG) systems, incorporating both structured data (e.g., SOPs, guidelines) and unstructured data (e.g., annotations, case logs) to support LLM-based policy QA and labeling efforts.
• Operationalize intelligent labeling pipelines that leverage Prompt Engineering, agent-based workflows, and labeling models to ensure a steady supply of high-quality data for model training and policy evolution.
• Translate complex policy documents into machine- and human-readable formats, support agent and PE strategy development, and keep the handling of nuanced policy edge cases current as regulatory and platform dynamics change.
• Apply multi-modal LLM techniques to extract latent signals from content that inform moderation strategies and highlight policy gaps.
• Lead applied ML and data science research and experimentation to solve business-critical use cases.
• Own the model lifecycle, from data sourcing and preprocessing through training, deployment, and post-launch maintenance.

Qualifications

Minimum Qualifications:
1) Advanced degree (Master's or Ph.D.) in Statistics, Computer Science, Applied Mathematics, Data Science, or a related quantitative field.
2) Strong theoretical foundation in computer science, machine learning, and statistics, with industry experience in deep learning and at least one of the following: Prompt Engineering, LLMs, CV, NLP, or speech recognition.
3) In-depth experience in unsupervised learning, clustering algorithms, and pattern recognition from unstructured data such as text or video.
4) Strong experience with unsupervised learning and clustering algorithms, including extracting insights from unstructured video data, recognizing patterns, and developing models.
5) Experience in data project management, and a solid foundation in mathematics and algorithms.
6) Expertise in SQL, Hive, Presto, or Spark and experience with large-scale datasets, along with strong proficiency in Python and deep learning frameworks such as TensorFlow or PyTorch.
7) Excellent communication and collaboration skills, with the ability to work effectively across global teams and stakeholders.

Preferred Qualifications:
• At least 3 years of experience in software development or model/data pipeline development, with hands-on experience applying LLM technologies (e.g., Test Time Scaling, Chain of Thought, Retrieval-Augmented Generation, Supervised Fine-Tuning) to real-world problems.
• Deep understanding of data pipeline architecture, the model development lifecycle, testing, and deployment.
• Practical industry experience applying prompt engineering and emerging AI techniques to diverse business needs.
• Demonstrated intellectual curiosity, excellent problem-solving skills, and advanced analytical abilities to deconstruct problems, identify root causes, and propose effective solutions.
