Production Support Lead - (WD73311)
As the production support engineering team lead, you act as the support expert responsible for ensuring that the company’s systems are stable and meet quality standards. You will lead a team that monitors system status and troubleshoots and resolves software and system issues. Ultimately, the support engineering team must ensure the smooth operation of critical systems and software applications.
Job Duties:
- Monitoring and maintaining: Build, maintain, enhance and oversee monitoring systems for critical systems and software in production. Configure, troubleshoot and support servers, load balancers, storage arrays, network attached storage, network equipment and related peripherals.
- Managing incidents: Respond to issues and tickets promptly; identify, diagnose and resolve each issue.
- Troubleshooting issues: Analyse various metrics and logs to understand the reason for system failure or non-performance. Using different debugging and diagnostic tools, find solutions to improve software performance.
- Collaborating with other departments: Collaborate with cross-functional teams, working with developers, system administrators and software engineers to solve production issues. Update senior leaders on system status and share timely progress on resolution.
- Contributing to product/systems performance: Participate in every phase of the product development process, including design, build and testing. Create tools, including internal software and scripts, that automate processes and platforms to improve efficiency.
- Suggesting improvements post-incident: Respond directly to feedback about products and systems by suggesting and implementing tailored improvements. Proactively identify potential improvement areas in the production system and make suggestions, such as code optimisation and enhancements to the production infrastructure.
- Training new employees: Guide new employees in using production support software packages.
- Preparing and documenting reports: Prepare and maintain up-to-date documentation of systems, procedures and other relevant information. Record every problem that occurs in the production environment.
- Team setup: Identify and select team members with the right skillsets and mindsets. Set up a 24/7 on-call rotation to ensure systems are always covered.
- Sense of responsibility: Hold ultimate accountability for production support. Participate in the 24/7 on-call rotation, respond to alerts in a timely fashion and escalate issues as needed.
Must have skillset:
- At least 10 years of relevant working experience in production support or a similar role, preferably in a bank.
- Technical skills: A strong understanding of complex software products and systems is necessary, along with proficiency in scripting languages, operating systems, databases and networks. Knowledge of programming languages, such as Java and Python, and scripting languages, such as PowerShell or Perl, helps you debug scripts. Proficiency in database query languages, such as SQL, is essential to resolve data-related issues and perform routine maintenance tasks.
- Problem Management: Have the ability and experience to analyse issues, identify underlying root causes, know how and where to gather information and support, take preventive measures and stop future system or product failures from occurring. This involves collaborating with other teams to develop and implement effective solutions.
- Adaptability: The production work environment can be dynamic and unpredictable, so you must be able to adapt to change. Dedication, flexibility and a willingness to learn are essential to support evolving technologies and systems. You must be able to react immediately to urgent requests and incidents and provide support on time.
- Communication: You need to interact with various people in different departments. Excellent verbal and written communication skills in both Chinese and English (Cantonese is a plus) are necessary to convey technical information effectively, to listen to users' concerns and feedback, and to provide regular updates on incident management.
- Teamwork: You need a leadership mindset to share your knowledge with others and train your team. Most importantly, you know how to establish the right working culture and systems to motivate your team to meet the highest standards.
- Incident Management: Respond to and resolve incidents related to production systems, ensuring timely resolution and minimal disruption to business operations. This includes accurate logging, diagnosis, and escalation of issues as required.
- Change Management: Participate in the implementation of changes to production systems, ensuring adherence to change management processes and procedures. This includes testing and validation of changes prior to deployment.
- Process and Procedure Adherence: Strictly adhere to all established operational processes and procedures, ensuring compliance with internal policies and external regulations. This includes maintaining up-to-date documentation and knowledge of relevant procedures.
- Control Standard Compliance: Ensure all activities comply with the bank's stringent control standards, including security, audit, and regulatory requirements. This includes participation in internal audits and regulatory reviews.
- Expertise in cloud, CI/CD, containers, messaging and databases.
- Hands-on experience with Kubernetes/OpenShift and Unix/Linux/Windows.
- Experience with Agile practices is a must, along with a strong understanding of ITIL frameworks and best practices.
- Leadership: self-driven and willing to lead and drive projects and teams.
- Ability to work under pressure, handle multiple tasks and work to tight deadlines.
Primary Location: China-Guangzhou (DTC)
Job: Technology
Schedule: Regular
Employee Status: Full-time
Job Posting: Apr 2, 2025, 8:30:46 AM