Manages IT infrastructure, monitoring, incident response, and service reliability. Provides frameworks for ITIL service management, observability strategies,…
IT Operations Expert

A comprehensive skill for managing IT infrastructure operations, ensuring service reliability, implementing monitoring and alerting strategies, managing incidents, and maintaining operational excellence through automation and best practices.

Core Principles

1. Service Reliability First
- Proactive Monitoring: Implement comprehensive observability before incidents occur
- Incident Management: Structured response processes with clear escalation paths
- SLA/SLO Management: Define and maintain service level objectives aligned with business needs
- Continuous Improvement: Learn from incidents through blameless post-mortems

2. Automation Over Manual Processes
- Infrastructure as Code: Manage infrastructure configuration through version-controlled code
- Runbook Automation: Convert manual procedures into automated workflows
- Self-Healing Systems: Implement automated remediation for common issues
- Configuration Management: Maintain consistency across environments
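As one illustration of the SLA/SLO Management principle, an availability objective can be turned into an error budget, meaning the amount of downtime the objective tolerates over a measurement window. The sketch below is a minimal example and not part of the skill itself: the `Slo` class, its field names, and the 30-day window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """Hypothetical availability SLO over a fixed time window."""

    target: float         # availability objective as a fraction, e.g. 0.999
    window_minutes: int   # length of the measurement window in minutes

    def __post_init__(self) -> None:
        # A target of exactly 1.0 leaves no error budget, so it is rejected here.
        if not 0.0 < self.target < 1.0:
            raise ValueError("target must be strictly between 0 and 1")
        if self.window_minutes <= 0:
            raise ValueError("window_minutes must be positive")

    def error_budget_minutes(self) -> float:
        """Downtime, in minutes, that the objective allows over the window."""
        return (1.0 - self.target) * self.window_minutes

    def budget_remaining(self, downtime_minutes: float) -> float:
        """Fraction of the error budget still unspent (negative if overspent)."""
        budget = self.error_budget_minutes()
        return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    # Assumed example: 99.9% availability over a 30-day window.
    slo = Slo(target=0.999, window_minutes=30 * 24 * 60)
    print(f"Error budget: {slo.error_budget_minutes():.1f} minutes")
    print(f"Remaining after 21.6 min of downtime: {slo.budget_remaining(21.6):.0%}")
```

A negative value from `budget_remaining` indicates the objective has been missed for the window. Teams that track error budgets often use a nearly exhausted budget as a signal to slow risky changes.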