Specialist, Project Management, Lead Observability Management (SRE) - (WD63210)

Business Function

Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.

Job Purpose 

  • DBS Bank is looking for a Platform SRE Engineer with working experience in enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements to the platform's efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team's productivity.

Key Accountabilities

  • Responsible for designing and deploying new ELK clusters (Elasticsearch, Logstash, Kibana, Beats, ZooKeeper, etc.) and proactively monitoring their performance.
  • Design the infrastructure for Elasticsearch, Logstash and Kibana, ensuring the implementation meets security controls, complies with OS-level networking standards, and controls access on a least-privilege basis.
  • Manage the cluster and integrate it with Logstash & Elasticsearch.
  • Design and develop data engineering pipelines.
  • Design and configure ETL data pipelines using the Elastic Common Schema (ECS) to onboard application logs and metrics. Configure index templates and index lifecycle management (ILM) for data retention.
  • Automate repetitive tasks, optimize practices, and perform thorough testing to ensure product quality.

Job Duties & Responsibilities

  • Create and maintain software documentation.
  • Monitoring and proactive support, including morning checks.
  • Provide engineering solutions and frameworks to support machine learning and data-driven business activities at scale.
  • Perform R&D on new technologies and solutions to improve the accessibility, scalability, efficiency and capabilities of the machine learning and analytics platform.
  • Establish, apply and maintain best practices and principles of machine learning engineering.
  • Continuously innovate on and optimize the machine learning workflow, from data exploration and model experimentation/prototyping through to production.
  • Onboard applications onto monitoring tools and provide production support for the platform.
  • Deploy, support and monitor existing and new services and application stacks.
  • Automate repetitive tasks, optimize processes, and perform thorough testing to ensure quality.
  • Design and develop data engineering pipelines and manage data lifecycle policies.

Required Experience

  • Strong experience with the full ELK Stack - Elasticsearch, Logstash, Kibana, Beats agents, Machine Learning, APM, X-Pack and REST API integration.
  • Experience developing in multiple languages (Python, Bash, Painless, or other scripting languages).
  • Develop Elastic alerting solutions using Watcher and Kibana or Grafana, with alerts integrated into Teams and email.
  • Develop machine learning (ML) jobs to dynamically monitor and alert on specific metrics.
  • Basic knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL).
  • Experience in Node.js and Spring Boot would be a plus.
  • Experience with automation tools (e.g. Ansible) and DevOps pipelines is appreciated.
  • Implement Site Reliability Engineering principles for performance, reliability, monitoring and alerting in the production environment.
  • Self-driven, strong, committed, and reliable team player. Ability to contribute to discussions on design and strategy.
  • Good problem diagnosis and creative problem-solving skills

Education / Preferred Qualifications

  • Bachelor's degree in Technology, Engineering or IT.

Core Competencies

  • Working knowledge of Grafana, Prometheus, Confluent Kafka and the Elastic Stack (Elasticsearch / Logstash / Kibana / Beats), including data ingestion, management, monitoring and analytics. Able to perform L1/L2 ELK-related tasks.
  • Experience designing and building highly scalable, distributed ML models in production, and creating and deploying machine learning jobs for anomaly detection in IT ecosystems.
  • Creating automated anomaly detection systems and continuously tracking their performance.
  • Experience in anomaly detection or root cause analysis related to monitoring products is preferred.

Technical Competencies

  • Familiarity with machine learning development frameworks such as ELK and PyTorch, with experience in the practical application and optimization of algorithm projects.
  • In-depth experience with Unix/Linux and shell scripting; strong programming knowledge in Python and use of design patterns in development.
  • Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
  • Adequate knowledge of database systems (RDBMS, MariaDB, SQL, NoSQL).

Work Relationship

  • All business units in India for application feature enhancement, support & maintenance
  • Internally with all IT heads, leaders and members.
  • International/local software developers/vendors.
  • Coordination with the onshore team to resolve issues on highest priority.

DBS India - Culture & Behaviors

  • Drive Performance Through Value Based Propositions
  • Ensure Customer Focus by Delighting Customers & Reducing Complaints
  • Build Pride and Passion to Protect, Maintain and Enhance DBS’ Reputation
  • Enhance Knowledge Base, Build Skill Sets & Develop Competencies
  • Invest in Team Building & Motivation through Ideation & Innovation
  • Execute at Speed While Maintaining Error Free Operations
  • Develop a Passion for Performance to Grow Talent Pool
  • Maintain the Highest Standards of Honesty and Integrity

Primary Location: India-Tamil Nadu-Chennai

Job: Risk

Schedule: Regular

Employee Status: Full-time

Job Posting: Mar 27, 2024, 6:13:15 AM