Use when the user needs ML model deployment, production serving infrastructure, optimization strategies, and real-time inference systems. Designs and implements ML model deployment, production serving infrastructure, and real-time inference systems at scale.

- Handles model optimization (quantization, pruning, distillation), serving APIs (REST/gRPC), and container orchestration with auto-scaling on Kubernetes or cloud platforms
- Supports real-time inference, batch prediction systems, multi-model serving with intelligent routing, and A/B testing for model comparisons
- Covers edge deployment for IoT and mobile with model compression, offline capability, and resource-constrained optimization
- Implements monitoring, health checks, graceful degradation, circuit breaking, and observability for production reliability

Machine Learning Engineer

Purpose

Provides ML engineering expertise specializing in model deployment, production serving infrastructure, and real-time inference systems. Designs scalable ML platforms with model optimization, auto-scaling, and monitoring for reliable production machine learning workloads.

When to Use

- ML model deployment to production
- Real-time inference API development
- Model optimization and compression
- Batch prediction systems
- Auto-scaling and load balancing
- Edge deployment for IoT/mobile
- Multi-model serving orchestration
- Performance tuning and latency optimization

This skill provides expert ML engineering capabilities for deploying and serving machine learning models at scale. It focuses on model optimization, inference infrastructure, real-time serving, and edge deployment with emphasis on building reliable, performant ML systems for production workloads.
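The description names circuit breaking and graceful degradation as reliability features without showing what they look like around an inference call. The following is a minimal sketch, not the skill's actual implementation: `CircuitBreaker`, `max_failures`, `reset_after`, and the `primary`/`fallback` callables are all hypothetical names chosen for illustration, and only the Python standard library is assumed.

```python
import time
from typing import Any, Callable, Optional


class CircuitBreaker:
    """Hypothetical helper: opens after `max_failures` consecutive errors."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds (assumed value)
        self.clock = clock              # injectable so tests can fake time
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        # Once the cool-down has elapsed, let a trial call through (half-open).
        return self.clock() - self.opened_at < self.reset_after

    def call(self, primary: Callable[[Any], Any],
             fallback: Callable[[Any], Any], features: Any) -> Any:
        # While open, skip the primary model and degrade gracefully.
        if self.is_open():
            return fallback(features)
        try:
            result = primary(features)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback(features)
        # A successful call closes the circuit again.
        self.failures = 0
        self.opened_at = None
        return result
```

In this sketch the fallback could be a smaller model, a cached prediction, or a fixed default; which one is appropriate depends on the workload and is not specified by the description above.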