Manager, Engineering [Development, Sustenance]

Nutanix

Hungry, Humble, Honest, with Heart.

The Opportunity

At Nutanix, we're building the future of intelligent observability with Panacea.ai, an AI/ML-powered platform that automatically detects, explains, and correlates anomalies across logs and metrics. Version 1.0 relied on regex-based filters to surface anomalies. The next generation of Panacea.ai uses ModernBERT, LLMs, and real-time metrics anomaly detection to deliver context-rich anomaly detection and an enterprise-grade auto RCA engine.

About the Team

We’re looking for a Technical Manager to lead this effort—a hands-on, high-impact role where you’ll manage and grow a team of engineers, drive AI/ML innovation, and help shape Nutanix’s central AI charter. This is your opportunity to lead cutting-edge product development at the intersection of observability, ML, and large-scale systems.

Why Join Us

  • Lead a high-impact team delivering AI-first observability tools that directly improve engineering velocity and product quality.
  • Tackle challenging technical and product problems at scale and speed.
  • Shape the foundational AI platform and practices across Nutanix.
  • Enjoy the flexibility of hybrid work, with a culture that values deep work, collaboration, and ownership.
  • Be part of a startup-style team backed by the scale, reach, and stability of a global cloud leader.

Your Role

  • AI-Powered Observability Platform: Own the vision, architecture, and delivery of Panacea’s ML-based log and metrics analyzer that reduces triage time and improves engineering efficiency.
  • Metrics Anomaly Detection: Guide development of models that detect anomalies in time-series metrics like CPU, memory, disk I/O, and network traffic, enabling early detection of performance regressions.
  • ModernBERT for Logs: Lead integration and optimization of transformer-based models like ModernBERT to extract meaning from complex logs, cluster anomalies, and summarize RCA insights.
  • Auto RCA Engine: Deliver an AI engine that correlates logs and metrics across distributed services, automatically explaining root causes of incidents.
  • Feedback Loop & Continuous Learning: Build infrastructure for incorporating user feedback to continuously retrain and improve anomaly detection systems.
  • LLM Integration: Integrate LLMs for user queries, problem summarization, anomaly explanation, and contextual recommendations.
  • Central AI Charter: Collaborate with other support and product teams to define foundational AI infrastructure, model lifecycle standards, and shared services across Nutanix.

Responsibilities

  • Lead and mentor a team of engineers focused on ML pipelines, backend systems, and AI platform components.
  • Drive the roadmap and execution for AI/ML features in Panacea—from ideation to customer delivery.
  • Oversee model development for log analysis and metrics anomaly detection, ensuring high accuracy, performance, and explainability.
  • Champion engineering excellence—own code reviews, system design, and operational best practices across the team.
  • Collaborate with cross-functional teams (SRE, Dev, QA, PM) to understand pain points and deliver impactful solutions.
  • Influence and contribute to Nutanix’s broader AI strategy, platform choices, and model governance.
  • Foster a collaborative, high-trust team environment that prioritizes growth, experimentation, and results.

What You Will Bring

  • Educational Background: B.Tech/M.Tech in Computer Science, Machine Learning, or a related field.
  • Experience: 10+ years in software engineering, with 2+ years in management roles.
  • AI/ML Expertise:
      • Experience building ML models for metrics data (CPU, memory, IOPS, network, etc.) using models like Isolation Forest, Prophet, LSTM, or deep autoencoders.
      • Expertise in NLP using ModernBERT or BERT for log classification, clustering, and summarization.
      • Experience with LLMs for downstream tasks like summarization, root cause reasoning, or intelligent Q&A.
  • Tech Stack: Strong Python and ML ecosystem skills (PyTorch, TensorFlow, Scikit-learn, HuggingFace). Experience with MLOps, model serving, and cloud platforms.
  • Domain Experience (preferred): Working with observability data (logs, metrics, traces) and tools like Prometheus, Grafana, ELK, or Splunk.
  • Leadership:
      • Strong execution skills with the ability to balance tech vision and delivery.
      • Comfortable working across org boundaries and aligning stakeholders.
      • Passion for coaching and growing engineering talent.

Work Arrangement

Hybrid: This role operates in a hybrid capacity, blending the benefits of remote work with the advantages of in-person collaboration. For most roles, that will mean coming into an office a minimum of 2 to 3 days per week; however, certain roles and/or teams may require more frequent in-office presence. Additional team-specific guidance and norms will be provided by your manager.

Confirmed 12 hours ago. Posted 3 days ago.