VP, Problem & Knowledge Management Lead, SRE & Governance, Group Technology

DBS Bank

VP, Problem & Knowledge Management Lead, SRE & Governance, Group Technology - (2500006V)

The Role:

This position is for an SRE Problem and Knowledge Management Team Lead within the Site Reliability Engineering and Governance (SRE & Governance) department, an enabling group.

This role is expected to strategically lead incident retrospective/problem management operations, as well as other SRE activities related to maintenance management, including availability, performance, change management, monitoring, capacity planning & the solutions derived from emergency response.

The Team Lead ensures that retrospective activities are orchestrated & carried out effectively while promoting a blameless culture in accordance with SRE principles.

Responsibilities:

  • Mentor the team in facilitating & conducting root cause analysis (RCA) activities end to end
  • Lead facilitation for high-severity incidents, liaising with senior management & keeping them updated
  • Act as the primary focal point for presenting at the RCA Forum, Tech Risk Forum & other senior management meetings, reporting updates on retrospective findings & action plans
  • Absorb new technology rapidly & apply it effectively
  • Communicate well with technical & non-technical colleagues
  • Work to a high standard within agreed timescales
  • Undertake any other reasonable tasks or duties requested by the supervisor or a member of the senior management team
  • Manage resources to ensure problem management activities are carried out effectively & efficiently
  • Provide platforms & channels to keep stakeholders updated on the results of retrospectives & RCA activities
  • Demonstrate authority in problem management calls
  • Act as the point of contact for assigned higher-severity incidents (from incident retrospective calls through to Management Report (MR) documentation & publishing)
  • Take accountability for SRE enhancement initiatives arising from retrospectives
  • Collaborate with engineering teams within SRE & with LOBs on enabling activities as part of preventive measures

Requirements:

  • Minimum 15 years of process improvement/root cause analysis (RCA) experience, including leading discussions as a problem manager or incident commander, preferably in the Technology & Operations space
  • Experience with JIRA, Confluence, Jenkins, Nexus, SonarQube, Bitbucket, S3 & cloud computing
  • Good exposure to logging & monitoring tools like Dynatrace, Prometheus, Grafana, ELG/ELK
  • In-depth understanding of Incident & Problem Management functions & activities (i.e. hardware- & software-related incident & problem management)
  • Work with stakeholders & the command centre in troubleshooting, escalating & solutioning critical site incidents
  • Identify recurring system/application issues & work with cloud teams, infra teams, product development, vendors & other stakeholders to investigate & resolve their causes
  • Maintain accurate documentation of incidents including impact details, timelines, steps taken for mitigation/resolution.
  • Strong verbal & written communication skills, particularly effective documentation skills
  • Minimum 10 years of software development, technical support or operations experience
  • Basic knowledge of Linux, AIX, Solaris and Windows
  • Exposure to enterprise databases, e.g. Oracle, SQL Server, MariaDB, MongoDB & Sybase
  • Knowledge of systems, multi-tier application & network troubleshooting
  • Essential knowledge & awareness of Public/Private/Hybrid cloud solutions.

Primary Location: Singapore

Job: Technology

Job Posting: Jun 24, 2025, 4:38:10 AM
