Sr. Staff AI Research Scientist - Generative Models (Speech & Multimodal)

Rivian Automotive

About Us

Rivian and Volkswagen Group Technologies is a joint venture between two industry leaders with a clear vision for automotive’s next chapter. From operating systems to zonal controllers to cloud and connectivity solutions, we’re addressing the challenges of electric vehicles through technology that will set the standards for software-defined vehicles around the world.

The road to the future is uncharted. By combining our expertise across connectivity, AI, security and more, we’ll map a new way forward. Working together, we’ll create a future that’s more connected, more intelligent, more sustainable for everyone.

Role Summary

We are seeking a Research Scientist specializing in Generative AI, with a focus on advanced speech models (ASR, TTS, speech-to-speech, SpeechLM) and multimodal models spanning vision, video, speech, and text. In this role, you will be a key contributor to our AI research and development efforts, working on cutting-edge advancements in natural language processing (NLP), automatic speech recognition (ASR), text-to-speech (TTS), speech language models (SpeechLMs), vision language models (VLMs), audio analysis and understanding, computer vision, and multimodal applications. You will collaborate to design, build, and optimize innovative AI solutions that are safe, aligned, and effective for complex conversational and multimodal systems.

Responsibilities

  • Research and develop state-of-the-art generative AI models, with a strong emphasis on ASR, TTS, speech-to-speech, SpeechLM, audio reasoning and understanding, and diverse multimodal models involving vision, video, speech, and text.
  • Design and implement strategies for model alignment with ethical and safety standards, particularly for multimodal and speech applications with advanced reasoning capabilities.
  • Train and fine-tune large-scale audio, speech, vision, and multimodal models on domain-specific datasets, including datasets for audio and speech understanding.
  • Design and run experiments to explore new approaches in generative modeling for complex speech and multimodal data, including audio reasoning tasks.
  • Manage and preprocess large datasets of audio, speech, image, video, and text data to improve model generalizability, in particular to support audio, speech, and video understanding.

Qualifications

  • MS or PhD in Computer Science, Machine Learning, NLP, Speech Processing, Audio Processing, Computer Vision, or a related field.
  • Proven experience with transformer models, NLP pipelines, speech processing techniques, audio analysis, computer vision, and multimodal generative models.
  • Knowledge of model alignment techniques and responsible AI practices, especially for speech, multimodal AI, and audio and video understanding and reasoning.
  • Proficiency in Python and ML frameworks such as PyTorch or TensorFlow, as well as experience with audio, image, and video processing libraries.
  • Solid understanding of machine learning, NLP, speech processing, audio analysis, and computer vision fundamentals.
  • Ability to explore, prototype, and evaluate novel techniques in generative AI, alignment, and multimodal processing, including audio reasoning tasks.

Preferred Qualifications

  • Experience with distributed model training and inference, particularly for large-scale speech, vision, and multimodal models.
  • Experience with academic research, peer-reviewed publications, or contributions to open-source projects in the field of NLP, speech processing, audio analysis, computer vision, or multimodal AI.
  • Familiarity with LLM fine-tuning and alignment frameworks, as well as speech and multimodal model fine-tuning.
  • Strong understanding of Speech Language Models (SpeechLMs) and Vision Language Models (VLMs) and their applications in multimodal tasks, including in conjunction with audio/speech inputs.

Pay Disclosure

Salary Range/Hourly Rate for California-Based Applicants: 265,000 USD - 331,300 USD (actual compensation will be determined based on experience, location, and other factors permitted by law).

Benefits Summary: Rivian and Volkswagen Group Technologies provides robust medical/Rx, dental and vision insurance packages for full-time and part-time employees, their spouses or domestic partners, and children up to age 26. Full-time employee coverage is effective on the first day of employment. Part-time employee coverage is effective the first of the month following 90 days of employment.

Company Statements

Equal Opportunity

Rivian and Volkswagen Group Technologies is committed to creating a diverse environment and is proud to be an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law. We are also committed to ensuring compliance with all applicable fair employment practice laws regarding citizenship and immigration status.

Rivian and Volkswagen Group Technologies is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at candidateaccommodations@rivian.com.

Candidate Data Privacy

Rivian and VW Group Technologies (“Rivian and Volkswagen Group Technologies”) may collect, use and disclose your personal information or personal data (within the meaning of the applicable data protection laws) when you apply for employment and/or participate in our recruitment processes (“Candidate Personal Data”). This data includes contact, demographic, communications, educational, professional, employment, social media/website, network/device, recruiting system usage/interaction, security and preference information. Rivian and Volkswagen Group Technologies may use your Candidate Personal Data for the purposes of (i) tracking interactions with our recruiting system; (ii) carrying out, analyzing and improving our application and recruitment process, including assessing you and your application and conducting employment, background and reference checks; (iii) establishing an employment relationship or entering into an employment contract with you; (iv) complying with our legal, regulatory and corporate governance obligations; (v) recordkeeping; (vi) ensuring network and information security and preventing fraud; and (vii) as otherwise required or permitted by applicable law.

Rivian and Volkswagen Group Technologies may share your Candidate Personal Data with (i) internal personnel who have a need to know such information in order to perform their duties, including individuals on our People Team, Finance, Legal, and the team(s) with the position(s) for which you are applying; (ii) Rivian and Volkswagen Group Technologies affiliates; and (iii) Rivian and Volkswagen Group Technologies’ service providers, including providers of background checks, staffing services, and cloud services.

Rivian and Volkswagen Group Technologies may transfer or store internationally your Candidate Personal Data, including to or in the United States, Canada, and the European Union and in the cloud, and this data may be subject to the laws and accessible to the courts, law enforcement and national security authorities of such jurisdictions.

Please see our Candidate Data Privacy Notice (English) and Candidate Data Privacy Notice (Serbian) for more information.

Please note that we are currently not accepting applications from third-party application services.
