Job Details
- Job Title: Senior MLOps/GenAI Infrastructure Engineer
- Location: London / Salford / Glasgow / Newcastle / Cardiff (This is a hybrid role and the successful candidate will balance office working with home working)
- Band: D
- Salary: £59,600 - £69,800 (The expected salary range for this role reflects internal benchmarking and external market insights.)
We’re happy to discuss flexible working. Please indicate your preference under the flexible working question in the application. There’s no obligation to raise this at the application stage, but if you wish to do so, you’re welcome to. Flexible working will be part of the discussion at offer stage.
Purpose Of The Role
Step into the world of the BBC, one of the UK's most iconic and beloved brands, where every working day is as unique as it is rewarding. Around the clock, our content reaches millions of people globally, made possible by our top-notch Software Engineering team. They've been instrumental in pioneering innovative products and unique features that have firmly positioned us at the forefront of our industry. We don't merely adapt to an ever-changing world; we set the pace.
With this role you'll be at the heart of an exciting journey, crafting tools and patterns that are state-of-the-art and transformative. We are the catalysts, enabling the creation and collaboration of cutting-edge ML and AI technologies. Our work is pivotal in shaping the BBC's future, empowering teams across the organisation to explore, innovate, and redefine the landscape of media. Our team is building out new tools and capabilities to accelerate data science activities and the development of ML/GenAI applications. We enable teams across the BBC to build, collaborate on, manage, and maintain their machine learning platforms at scale.
You will play a key role in driving our ambition to build an outstanding software engineering team, environment, and culture. We are looking for a Senior Engineer to join our tech community to drive this transformation, build a modern digital ecosystem using exciting technologies, and do the best work of their career.
Your Key Responsibilities And Impact
- Design, develop, and maintain tools that support data science and MLOps/LLMOps workflows.
- Collaborate with Data Scientists to deploy, serve, and monitor LLMs in real-time and batch environments using Amazon SageMaker and Bedrock.
- Implement Infrastructure-as-Code with AWS CDK or CloudFormation to provision and manage cloud environments (see the illustrative sketch after this list).
- Build and maintain CI/CD pipelines using GitHub Actions, AWS CodePipeline/CodeBuild, and Jenkins.
- Integrate monitoring and observability tools such as AWS CloudWatch, Prometheus, and Grafana for infrastructure and model health tracking.
- Ensure software quality through Test-Driven Development (TDD), unit testing frameworks (e.g., pytest, unittest), and automated integration tests.
- Conduct regular code reviews, participate in pair programming, and advocate for clean code, modular design, and maintainable architecture.
- Collaborate with architects and stakeholders to design high-level system architecture for cloud-first, AI-integrated products.
- Enforce security best practices (IAM, encryption, VPC configuration, audit logging) using AWS native services and third-party tools.
- Embed security throughout the software development lifecycle by integrating static and dynamic code analysis, vulnerability scanning, and policy-as-code tools into CI/CD pipelines, ensuring DevSecOps principles are applied from design to deployment.
- Promote a culture of continuous learning and knowledge-sharing through comprehensive documentation, technical deep dives, brown bag sessions, internal workshops, and active mentorship of team members.
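To give a flavour of the Infrastructure-as-Code work described in the list above, here is a minimal illustrative sketch in Python using AWS CDK v2. It is not BBC code: the stack name, resource names, and asset path are hypothetical placeholders. It provisions a versioned, encrypted S3 bucket for model artefacts and a small Lambda function with read access to it.

    # Illustrative sketch only - the stack name, resource names and the "lambda"
    # asset directory are hypothetical placeholders, not BBC infrastructure.
    from aws_cdk import App, Stack, RemovalPolicy
    from aws_cdk import aws_s3 as s3
    from aws_cdk import aws_lambda as _lambda
    from constructs import Construct

    class MlopsToolingStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)

            # Versioned bucket for model artefacts, encrypted at rest.
            artefacts = s3.Bucket(
                self,
                "ModelArtefacts",
                versioned=True,
                encryption=s3.BucketEncryption.S3_MANAGED,
                removal_policy=RemovalPolicy.RETAIN,
            )

            # Small Lambda that could, for example, kick off a batch inference job.
            handler = _lambda.Function(
                self,
                "InferenceTrigger",
                runtime=_lambda.Runtime.PYTHON_3_11,
                handler="app.handler",
                code=_lambda.Code.from_asset("lambda"),
            )

            # Least-privilege access: the function may only read the artefact bucket.
            artefacts.grant_read(handler)

    app = App()
    MlopsToolingStack(app, "MlopsToolingStack")
    app.synth()

Running cdk deploy against an app like this synthesises the stack to CloudFormation and provisions the resources, which is the kind of provisioning workflow this role involves.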
Your Skills And Experience
- Extensive DevOps/MLOps experience with a strong focus on building and delivering scalable infrastructure for ML and AI applications using Python and cloud-native technologies.
- Experience with cloud services, especially Amazon Web Services (AWS) – SageMaker, Bedrock, S3, EC2, Lambda, IAM, VPC, ECS/EKS.
- Proficiency in Infrastructure-as-Code using AWS CDK or CloudFormation.
- Experience implementing and scaling MLOps workflows with tools such as MLflow and SageMaker Pipelines.
- Proven experience building, containerising, and deploying applications using Docker and Kubernetes.
- Hands-on experience with CI/CD tools (GitHub Actions, CodePipeline, Jenkins) and version control using Git/GitHub.
- Strong understanding of DevOps concepts including blue/green deployments, canary releases, rollback strategies, and infrastructure automation.
- Familiarity with security and compliance practices for cloud-hosted applications.
- Excellent debugging, troubleshooting, and optimisation skills across the stack.
- Solid understanding of the machine learning lifecycle and of serving LLMs in production environments (see the illustrative sketch after this list).
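As a small illustration of the LLM-serving skills listed above, the following sketch invokes a foundation model through Amazon Bedrock's runtime API with boto3. The region, model ID, and prompt are placeholders chosen for the example, not details of any BBC system.

    # Illustrative sketch only - region, model ID and prompt are placeholders.
    import json
    import boto3

    bedrock = boto3.client("bedrock-runtime", region_name="eu-west-2")

    # Anthropic messages-format request body for a Claude model on Bedrock.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Summarise today's headlines."}],
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=body,
    )

    # The response body is a JSON stream; print the first text block.
    result = json.loads(response["body"].read())
    print(result["content"][0]["text"])

In production, a call like this would typically sit behind retries, request/response logging, and latency and token-usage metrics exported to CloudWatch or Prometheus, which is where the monitoring and observability experience above comes in.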
Preferred Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Certifications such as AWS Certified DevOps Engineer, AWS Certified Machine Learning Engineer, or AWS Certified Solutions Architect.
- Strong communication skills, with experience working in Agile teams (Scrum, Kanban) and collaborating cross-functionally.
- Contributions to open-source GenAI, MLOps, or LLMOps projects or communities are a strong plus.