Lead Engineer - Cloud Reliability

DISH Network

Company Summary

DISH Network Technologies India Pvt. Ltd. is a technology subsidiary of EchoStar Corporation. Our organization is at the forefront of technology, serving as a disruptive force and driving innovation and value on behalf of our customers.

Our product portfolio includes Boost Mobile (consumer wireless), Boost Mobile Network (5G connectivity), DISH TV (direct broadcast satellite), Sling TV (over-the-top service provider), OnTech (smart home services), Hughes (global satellite connectivity solutions), and Hughesnet (satellite internet).

Our facilities in India are some of EchoStar’s largest development centers outside the U.S. As a hub for technological convergence, our engineering talent is a catalyst for innovation in multimedia network and communications development.

Summary

Our Technology teams challenge the status quo and reimagine capabilities across industries. Whether through research and development, technology innovation or solution engineering, our people play vital roles in connecting consumers with the products and platforms of tomorrow.

Job Duties and Responsibilities

  • System Reliability & Performance:
    • Design, implement, and maintain monitoring, alerting, and logging solutions for webMethods, GemFire, AWS services, and Kubernetes clusters to proactively identify and resolve issues.
    • Develop and implement automation for operational tasks, incident response, and system provisioning/de-provisioning.
    • Participate in on-call rotations to respond to critical incidents, troubleshoot complex problems, and perform root cause analysis (RCA).
    • Identify and eliminate toil through automation and process improvements.
    • Conduct performance tuning and capacity planning for all supported platforms.
  • Platform Expertise:
    • webMethods: Support, maintain, and optimize webMethods Integration Server, Universal Messaging, API Gateway, and related components, including upgrades, patching, and configuration management.
    • GemFire: Administer and optimize GemFire clusters, ensuring high availability, data consistency, and performance for critical applications. Troubleshoot GemFire-related issues, including cache misses, replication problems, and member failures.
    • AWS Cloud: Manage and optimize AWS cloud resources (EC2, S3, RDS, VPC, IAM, CloudWatch, Lambda, etc.) for scalability, security, and cost-efficiency.
    • Rancher Kubernetes: Administer, troubleshoot, and optimize Kubernetes clusters managed by Rancher, working with Helm charts, Kubernetes operators, ingress controllers, and network policies.
  • Collaboration & Best Practices:
    • Collaborate closely with development teams to ensure new features and services are designed for reliability, scalability, and observability.
    • Implement and champion SRE best practices, including SLO/SLA definition, error budgeting, chaos engineering, and blameless post-mortems.
    • Develop and maintain documentation for systems, processes, and runbooks.
    • Mentor junior engineers and contribute to a culture of continuous learning and improvement.

Skills - Experience and Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 8+ years of experience in an SRE, DevOps, or highly technical operations role.
  • Deep expertise in at least two, and strong proficiency in all, of the following:
    • webMethods Integration Platform (Integration Server, Universal Messaging, API Gateway).
    • VMware GemFire (or another distributed in-memory data grid such as Apache Geode or Redis Enterprise).
    • AWS cloud services (EC2, S3, RDS, VPC, CloudWatch, EKS, etc.).
    • Kubernetes administration, particularly with Rancher and EKS.
  • Strong scripting and programming skills: Python, Go, Java, Bash.
  • Experience with Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
  • Proficiency with CI/CD pipelines (e.g., Jenkins, GitLab CI, AWS CodePipeline).
  • Experience with monitoring and logging tools (e.g., Dynatrace, Prometheus, Grafana, ELK Stack, Datadog, Splunk).
  • Solid understanding of networking concepts (TCP/IP, DNS, Load Balancing, VPNs).
  • Excellent problem-solving, analytical, and communication skills.
  • Ability to work effectively in a fast-paced, collaborative environment.

Nice-to-have skills

  • Experience with other integration platforms or message brokers.
  • Knowledge of other distributed databases or caching technologies.
  • AWS Certifications.
  • Kubernetes Certifications (CKA, CKAD, CKS).
  • Experience with chaos engineering principles and tools.
  • Familiarity with agile methodologies.
