Applied AI, Applied Scientist - Trust & Safety - San Jose

TikTok

Responsibilities

About the Role: We are looking for experienced data scientists to join our team and apply advanced analytics and machine learning techniques, including Prompt Engineering (PE), multi-modal large language models (LLMs), computer vision (CV), natural language processing (NLP), and audio signal processing, to optimize intelligent labeling workflows and data products within TikTok's ecosystem. Your work will help improve user experience, enhance content integrity, and support data-driven strategic decision-making. You will collaborate closely with cross-functional teams across product, operations, and algorithms to build scalable, end-to-end Prompt Engineering and LLM workflows for intelligent content moderation and labeling applications.

Key Responsibilities:
• Collaborate with cross-functional stakeholders to gather and refine requirements for data labeling projects and identify opportunities for optimization through data-driven solutions.
• Design and manage the full lifecycle of end-to-end data labeling and policy testing workflows, from aligning with business needs to deployment, iteration, and monitoring.
• Establish and maintain a centralized knowledge base for Retrieval-Augmented Generation (RAG) systems, incorporating both structured (e.g., SOPs, guidelines) and unstructured (e.g., annotations, case logs) data to support LLM-based policy QA and labeling efforts.
• Operationalize intelligent labeling pipelines leveraging Prompt Engineering, agent-based workflows, and labeling models to ensure availability of high-quality data for model training and policy evolution.
• Translate complex policy documents into machine- and human-readable formats, support agent and PE strategy development, and evolve nuanced policy edge cases in sync with fast-changing regulatory or platform dynamics.
• Apply multi-modal LLM techniques to extract latent signals from content that inform moderation strategies and highlight policy gaps.
• Lead applied ML and data science research and experimentation to solve business-critical use cases.
• Own the model lifecycle from data sourcing and preprocessing to training, deployment, and post-launch maintenance.

Qualifications

Minimum Qualifications:
• Advanced degree (Master's or Ph.D.) in Statistics, Computer Science, Applied Mathematics, Data Science, or a related quantitative field.
• Strong theoretical foundation in computer science, machine learning, and statistics, with industry experience in deep learning and at least one of the following: Prompt Engineering, LLMs, CV, NLP, or speech recognition.
• In-depth experience in unsupervised learning, clustering algorithms, and pattern recognition from unstructured data such as text or video.
• Experience in data project management and a solid foundation in mathematics and algorithms.
• Expertise in SQL, Hive, Presto, or Spark, and experience with large-scale datasets.
• Proficiency in Python and deep learning frameworks such as TensorFlow or PyTorch.

Preferred Qualifications:
• At least 3 years of experience in software development or model/data pipeline development, with hands-on experience applying LLM technologies (e.g., Test Time Scaling, Chain of Thought, Retrieval-Augmented Generation, Supervised Fine-Tuning) to real-world problems.
• Deep understanding of data pipeline architecture, model development lifecycle, testing, and deployment.
• Practical industry experience in applying prompt engineering and emerging AI techniques to address diverse business needs.
• Demonstrated strong intellectual curiosity, excellent problem-solving skills, and advanced analytical abilities to deconstruct problems, identify root causes, and propose effective solutions.
• Proven track record of success in high-growth, fast-paced, and ambiguous environments.
• Excellent communication and collaboration skills, with the ability to work effectively across global teams and stakeholders.

Confirmed 22 hours ago. Posted a day ago.
