Site Reliability Engineering (SRE) Internship - Summer 2018


Quantopian is looking for a Site Reliability Engineering (SRE) intern to help us this summer. Quantopian empowers technical, talented people everywhere to write high-quality trading algorithms, and we're seeking engineering interns to support our rapidly expanding user base and build towards our ambitious product roadmap.

The SRE team at Quantopian manages the full cloud infrastructure platform that all of our products and services run on. We oversee code deployments, monitoring and alarm systems, databases, servers, containers, test infrastructure, and more.

We work on interesting problems, such as:

  • Running arbitrary user code on our own servers, with all the associated security implications
  • Designing and building intuitive and powerful research and development tools and APIs for our users
  • Designing data stores for real-world financial data and optimizing them for high throughput when running trading simulations
  • Metering and autoscaling our cloud infrastructure to respond to varying load
  • Creating infrastructure to power a delightful user experience for both internal and external users

So far, we've built Quantopian with Python and Ruby on Rails on AWS and Heroku. We depend heavily on Redis, Docker, Postgres, and lots of Ansible. We have projects in flight to incorporate Kubernetes and Airflow. However, we are very pragmatic, and our highest priority is shipping user-delighting features built with the most sensible technologies.

We also try to give back to the technology community on which we rely. Be sure to check out Zipline, our open-source backtester, as well as pyfolio for portfolio and risk analytics, and alphalens for predictive factor analysis. We have several other projects that we’ve open-sourced as well.

We're well-financed by highly reputable venture capital investors, including Spark Capital, Khosla Ventures, Bessemer Venture Partners, and Andreessen Horowitz.

We've assembled a top-notch product and engineering team here in Boston, and we're still growing. A financial background is not necessary at all (but is always nice to have).

We're still small enough that you will have a material impact on our company's trajectory, even over the course of a summer. Our small size and ambitious goals dictate our approach to talent acquisition and retention: we believe in hiring friendly, motivated engineers, putting them in a positive, collaborative environment, and giving them hard problems to solve and the autonomy to solve them.

Ideally, you:

  • have some prior programming experience, either as coursework or in industry;
  • have good written communication skills and an interest in explaining new technical concepts to a wider audience;
  • have an interest in administering and deploying cloud-scale systems;
  • thrive on designing, building, and shipping mind-blowing features that delight our users and developers;
  • will enjoy our fun and intellectually stimulating work environment. We ship early and often. We have lives outside of work. We like each other.

You do not need to have:

  • A background in finance, quant finance, Wall Street, or financial markets of any kind
  • A Computer Science major or minor track (prior programming experience of some kind is highly recommended)
  • Experience with SRE, DevOps, or operations processes
  • A pager or cell phone. This is not an on-call position, though you will be invited to our team Incident Review meetings and any Post Mortem/5-Whys meetings.

Project Proposals

Below are three potential projects you could work on this summer - there may be other opportunities by the time you start as well!

Drive adoption of serverless technologies

Serverless technologies (sometimes referred to as Functions-as-a-Service or FaaS) are a recent addition to the cloud operational toolkit. The premise is that you define a function that takes some form of input, and you ask your cloud provider to run it however it determines is most efficient. These inputs can come from other applications (for example, an application could put work into a queue and the serverless platform will run your function once for each message), from signals sent by other AWS products, or from web requests.
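To make the queue example concrete, here is a minimal sketch of what such a function might look like in Python on AWS Lambda. It assumes the function is triggered by an SQS queue, in which case the platform passes messages under a "Records" key and may deliver several messages in one invocation. The process_message helper and the JSON message bodies are hypothetical and only stand in for real work.

```python
import json


def process_message(body):
    """Hypothetical unit of work: count the fields in the message payload."""
    return len(body)


def handler(event, context):
    """Entry point that the serverless platform calls with a batch of queue messages."""
    results = []
    # An SQS-triggered invocation delivers its messages under "Records";
    # each record carries the raw message text in "body".
    for record in event.get("Records", []):
        body = json.loads(record["body"])  # assumes JSON-encoded message bodies
        results.append(process_message(body))
    return {"processed": len(results), "results": results}
```

The function holds no state between calls and knows nothing about the servers it runs on, which is what lets the provider decide where and how many copies to run.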

We'd like someone to help us find good use cases for serverless here at Quantopian and run the entire implementation. Some of our initial ideas:

  • Automated alerting on any changes to permissions within our AWS account
  • Processing of our detailed billing reports to let us know if we're experiencing spikes in usage
  • Running basic security checks on every item uploaded to some of our S3 Buckets
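As an illustration of the billing idea above, the core of a spike check could be as small as the sketch below. It assumes daily cost totals have already been extracted from the billing report into a list of numbers; the find_spikes name, the 7-day window, and the 1.5x threshold are all hypothetical choices, not existing Quantopian code.

```python
from statistics import mean


def find_spikes(daily_costs, window=7, threshold=1.5):
    """Return indices of days whose cost exceeds threshold times the
    average cost of the preceding `window` days."""
    spikes = []
    for i in range(window, len(daily_costs)):
        baseline = mean(daily_costs[i - window:i])
        # Skip days with a zero baseline to avoid flagging the first real usage.
        if baseline > 0 and daily_costs[i] > threshold * baseline:
            spikes.append(i)
    return spikes
```

A serverless function could run a check like this each time a new billing report arrives and send an alert when the returned list is not empty.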

By the end of this project, you will have...

  • Presented to the SRE team best practices and guidelines for using serverless technologies
  • Implemented production code running on AWS
  • Written detailed sample code and templates for us to continue utilizing serverless after you leave

Tools you'll be using:

  • AWS Lambda
  • Amazon CloudFormation, CloudWatch, S3, and other AWS products
  • Python
  • Kubernetes

Modernize our code testing infrastructure

Quantopian has used Jenkins as our testing service for as long as we've been a company. It has served us well, but as we've grown, we haven't invested much in improving it. We'd like you to help us improve it in several ways:

  • Improve build isolation by running builds in reusable containers
  • Unlock parallel builds for our primary applications ("qexec" and "QF")
  • Design and implement a system that automatically distributes builds across workers on multiple hosts as needed
  • Set up automatic "canary" deployments of new code alongside existing code
  • ... and potentially many other enhancements!

By the end of this project you will have...

  • Radically improved our build stability and throughput
  • Worked directly with our application engineers to improve their test behavior
  • Written real production deployment code and systems for running our build and test infrastructure

Tools you will be using:

  • Jenkins
  • Docker
  • Ansible
  • Various databases (MongoDB, PostgreSQL)
  • Python and Ruby on Rails
  • Python packaging systems (wheels, pipenv, etc.)
  • Amazon EC2 and CloudFormation

Assist with Kubernetes ramp-up

We are currently in the early stages of a large project to adopt Kubernetes at Quantopian. This is, of course, an enormous undertaking. Our goal is to have a production-grade container system live, tested, and running real production code by the end of the summer. There are several different ways that you could contribute to this effort:

  • Helping SRE with load and chaos testing of our initial cluster setup
  • Initial refactoring of production applications to support deployment in a container
  • Setting up monitoring and alarm systems for applications running in containers
  • Helping with security hardening of our cluster with an eye towards supporting a better security system for our backtesting engine
  • Gathering feedback on development workflows for the application engineers and improving them
  • Using Kubernetes as a distributed time-based (cron) scheduler
  • Helping configure auto-scaling batch jobs of various types in our cluster
  • Integrating some of our workflows using Airflow with Kubernetes

By the end of your internship, you will have...

  • Written container deployment code for production applications at Quantopian
  • Gained hands-on experience with Kubernetes

Tools you will be using:

  • Kubernetes
  • Docker
  • Python
  • A large suite of Amazon Web Services products (EC2, CloudFormation, etc.)
  • Hashicorp Vault