Site Reliability Engineer

Kar Auction Services

Who We Are:

At OPENLANE we make wholesale easy so our customers can be more successful. 

  • We’re a technology company building the world’s most advanced—and uncomplicated—digital marketplace for used vehicles.
  • We’re a data company helping customers buy and sell smarter with clear, actionable insights they can understand and use.
  • And we’re an innovation company accelerating the future of wholesale remarketing through curiosity, collaboration, and an entrepreneurial spirit.

Our Values: 

Driven Waybuilders. We pursue challenges that inspire us to build, create and innovate.

Relentless Curiosity. We seek to understand and improve our customers’ experience.

Smart Risk-Taking. We transform risk into progress through data, experience, and intuition.

Fearless Ownership. We deliver what we promise and learn along the way.

The AFC (Automotive Finance Corporation and OPENLANE Brand) SRE plays a crucial role in maintaining and improving platform infrastructure, ensuring high reliability, resiliency, performance, and quality, as well as faster speed-to-market. You will develop tools and processes that take a holistic view of system health into account, and you will serve as a liaison to the development, operations, and shared IT services teams.

You will also be a release coordinator for production deployments. You’ll ensure clear identification of deployment artifacts, create and communicate a precise deployment plan, and facilitate deployment activities to ensure the highest level of quality and system availability. 

Responsibilities:

  • Responsible for the administration, support, development, integration, asset management, and documentation of infrastructure assets
  • Participate in all phases of the IT Service Lifecycle as necessary: conduct root cause analysis, apply corrective action plans, and create incident post-mortem reports
  • Gather and analyze system and application metrics for performance tuning and fault finding
  • Troubleshoot application and system faults and latency, and conduct performance analysis, making enhancements or recommendations as necessary
  • Integrate monitoring, logging, and alerting tools to provide comprehensive observability into the system's behavior and performance
  • Execute reviews and planning for performance, uptime, disaster recovery, and capacity growth
  • Research, evaluate, develop, and implement new tools to improve infrastructure hardening, product development efficiency, and security posture
  • Monitor and manage cloud infrastructure costs
  • Participate in platform security remediation activities
  • Work closely with our scrum teams to develop and support CI/CD pipelines in Azure DevOps for delivering code to Prod and Non-Prod environments
  • Facilitate production deployment activities whenever necessary (business hours, evenings, and weekends); this is usually 2-5 deployments per month, though the cadence is continuously evolving
  • Develop and implement Disaster Recovery (DR) plans and participate in testing
  • Create runbooks and incident response procedures to quickly address and mitigate system failures
  • Participate in infrastructure and security audits, automating the collection of evidence where possible

Required skills:

  • 5+ years in-depth experience supporting and maintaining AWS infrastructure as an SRE or similar role (Required)
  • Extensive experience in AWS administration, including but not limited to EC2, S3, Lambda, VPC, SNS, SQS, CloudFront, CloudWatch, DynamoDB, EFS/FSx, API Gateway, Route 53, etc.
  • Proficiency in networking concepts (HTTP/S, TCP/IP, DNS, virtual networks such as VNet and VPC, subnets, routing, firewalls, network security, triaging packet loss, SSL certificate management, etc.)
  • Experience configuring Continuous Integration / Continuous Deployment / Release on Demand (CI/CD/RoD) processes and tools
  • Experience managing both Windows and Linux servers
  • Knowledge of database infrastructure or DBA experience (Oracle, SQL Server, DynamoDB)
  • Experience with containerization services (Docker, Kubernetes)
  • Experience developing and maintaining Infrastructure as Code (IaC) using tools such as Terraform and CloudFormation (Required)
  • Experience implementing and managing infrastructure monitoring tools such as LogicMonitor
  • Experience with monitoring/observability tools (Splunk)
  • A proactive approach to spotting problems, areas of improvement, and bottlenecks
  • Understanding of API development and troubleshooting
  • Experience with process automation
  • Experience with scripting languages: Python, Bash, JavaScript
  • Ability to communicate with people of varying levels of technical ability
  • Strong communication (both written and verbal) and collaboration skills
  • Very strong problem-solving skills

Additional:

  • Must be available to participate in the on-call rotation outside of core work hours to support issues and deployments as needed 
  • Must be able to work in a hybrid/remote work environment

What We Offer: 

  • Salary range of $80,000-$115,000 depending on experience, skill set, qualifications, and other relevant factors
  • Medical, dental, and vision benefits with employer HSA contributions (US) and FSA options (US)
  • Immediately vested 401K (US) or RRSP (Canada) with company match 
  • Paid Vacation, Personal, and Sick Time
  • Paid maternity and paternity leave (US)
  • Employer-paid short-term disability, long-term disability, life insurance, and AD&D (US)
  • Robust Employee Assistance Program
  • Employer-paid Leap into Service Day to volunteer
  • Tuition Reimbursement for eligible programs
  • Opportunities to expand your skill set and share your knowledge across a publicly traded, global organization
  • Company culture of internal promotions, diverse career paths, and meaningful advancement 
Confirmed 20 hours ago. Posted 30+ days ago.