Senior Site Reliability Engineer (AWS, Security), KMS Healthcare

KMS Technology

KMS Technology was established in 2009 as a U.S.-based software services company. With development centers in Vietnam and Mexico, we have been trusted globally for the superlative quality of our software consulting & development services, technology solutions, and engineers' expertise. We pride ourselves on creating brilliant solutions for our clients by leveraging deep expertise, advanced technologies, and delivery excellence for shared success where everyone can reach their fullest potential. We operate three business lines:

  • KMS Software: Leverage software domain expertise to help clients make better business decisions in technology platforms, increase speed-to-market, and gain critical development support through innovative technology solutions.
  • KMS Solutions: Empower BFSI businesses to embrace the digital finance revolution and expedite clients’ journey towards complete digitalization through technology consulting, data analytics, software development, and software quality.
  • KMS Healthcare: Build transformative next-gen technologies to solve healthcare’s most challenging problems, providing innovative tools and expertise to providers, payers, life sciences, and medical technology vendors.

Responsibilities

  • Design, deploy, and maintain scalable, reliable AWS infrastructure
  • Automate infrastructure management using IaC tools (Terraform, CloudFormation, Ansible)
  • Optimize system performance, capacity planning, and incident management through best practices and automation.
  • Lead incident response, root cause analysis (RCA), and postmortem processes.
  • Manage and optimize AWS services for performance and cost efficiency
  • Develop and manage DataDog dashboards, metrics, and alerts to monitor system health, analyze performance, and support infrastructure optimization
  • Work with development, DevOps, and IT teams to boost system reliability and efficiency, and ensure thorough documentation of architecture, monitoring, and incident workflows.

Qualifications

General requirements:

  • Upper-intermediate level of English
  • Ability to effectively consult with clients to understand their needs, propose tailored solutions, and persuasively communicate their value to gain approval
  • Ability to obtain deep knowledge of the project technologies and work independently with minimal guidance
  • Ability to handle multiple tasks and communicate effectively with team members and clients
  • Strong logical thinking and problem-solving skills
  • Ability to self-learn and adapt to new technologies quickly

Technical requirements:

  • 3+ years of experience as a Site Reliability Engineer, DevOps Engineer, or similar role
  • Experience with complex systems and designing scalable solutions for operational excellence
  • Extensive hands-on experience with AWS cloud infrastructure (EC2, S3, RDS, Lambda, CloudWatch, CloudTrail)
  • Proficient in using DataDog for APM, infrastructure dashboards, and alert configurations
  • Familiarity with NOC environments and incident management protocols
  • Strong experience with networking concepts (DNS, Load Balancers, VPNs, etc.) and cloud security best practices
  • Capable of deploying detection rules in CloudTrail and configuring logs for enhanced security insights
  • Proficiency with Infrastructure as Code (IaC) tools (Terraform, CloudFormation).
  • Strong scripting skills in languages such as Python, Bash, or similar
  • Solid understanding of monitoring, logging, and alerting tools (Prometheus, Grafana, ELK stack, or similar)
  • Experience with containerization (Docker) and orchestration tools (Kubernetes, ECS)
  • Able to identify incident types, follow response checklists, escalate appropriately, and document incidents clearly

Benefits and Perks

  • Working in one of the Best Places to Work in Vietnam and a Top 10 ICT Company in Vietnam
  • Flexible working model: flexible hours and hybrid working from Ho Chi Minh City or Da Nang, or working remotely from any location in Vietnam
  • Attractive salary & benefits, full salary during probation, social insurance on full gross salary
  • Performance appraisal twice a year, 13th-month salary, and performance bonus
  • Premium healthcare insurance for you and your loved ones
  • Working 5 days/week, from Monday to Friday
  • 18+ paid leave days/year
  • Diverse career opportunities with Software Services, Software Product Development
  • Working and growing in a values-driven, international working environment and standard Agile culture with passionate and talented teams
  • Onsite opportunities: short-term and long-term assignments in the U.S.
  • Training on trending technologies, best practices, and soft skills
  • Company trip, big annual year-end party, team building, etc.
  • Fitness & sports activities: football, tennis, table tennis, badminton, yoga, swimming…
  • Joining community development activities: 1% Pledge, charity every quarter, blood donation, public seminars, career orientation talks…
  • Free in-house entertainment facilities (football, ping pong, gym…), coffee, and snacks (instant noodles, cookies, candies…)

And much more. Join us and let yourself explore other fantastic things!

Confirmed 13 hours ago. Posted 30+ days ago.