---
name: huawei-cloud-cce-cost-optimization-advisor
description: |
  Huawei Cloud CCE cost optimization analysis skill. Identifies idle resources, oversized CPU/memory requests, low-utilization nodes, 24h/7d utilization trends, HPA recommendations, and node autoscaler policy optimization. Read-only analysis and configuration suggestions only — does not modify HPA, autoscaler, node pools, or workloads without explicit user confirmation.
  Trigger: user mentions "cost optimization", "成本优化", "cost advisor", "成本顾问", "resource waste", "资源浪费", "cost reduction", "成本降低", "billing analysis", "账单分析", "over-provisioned", "超配", "CCE cost", "idle nodes", "oversized request", "HPA recommendation", "autoscaler policy"
tags: [cce, cost-optimization, resource-utilization, hpa, autoscaler]
---

# Huawei Cloud CCE Cost Optimization Advisor

## Overview

Analyze CCE (Cloud Container Engine) cluster cost optimization opportunities. This skill performs read-only analysis and generates configuration suggestions — it does **not** directly modify HPA, autoscaler, node pools, or workload requests. All configuration changes require explicit user confirmation.

**Analysis scope**:

- 24-hour and 7-day node CPU/memory utilization trends
- Low-utilization node detection (below cluster average or below 30%)
- Oversized resource request detection (business workloads only)
- HPA and node autoscaler status review and recommendations
- Cost optimization report with execution plan

**Architecture**: Python SDK v3 → CCE API + AOM PromQL → Inventory + Metrics → Cost Analysis → Report

## Security Constraints

### Dangerous Operation Confirmation Mechanism

> **This skill enforces a strict read-only-by-default policy. All write operations require `confirm=true`.**

#### Operations Requiring Confirmation

| Tool | Operation Type | Risk Level | Description |
|------|---------------|------------|-------------|
| `huawei_configure_cce_hpa` | Create/Update HPA | 🟠 High | Creates or replaces a HorizontalPodAutoscaler |
| Node pool resize/scale-down | Scale | 🟠 High | Reduces node pool capacity |

**Write operations without `confirm=true` return a preview only**. The `huawei_configure_cce_hpa` tool returns a manifest preview and risk warning when called without `confirm=true`. Only after explicit user approval can it be called with `confirm=true` to apply the configuration.

#### Workflow

**Step 1: Preview Operation** — Call without `confirm=true`

```bash
python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60
```

Returns: HPA manifest preview, risk warning, confirmation hint

**Step 2: Confirm Execution** — Call with `confirm=true` after user approval

```bash
python3 scripts/huawei-cloud.py huawei_configure_cce_hpa \
  region=cn-north-4 \
  cluster_id=xxx \
  workload_name=my-deploy \
  namespace=default \
  min_replicas=1 \
  max_replicas=3 \
  target_cpu_utilization=60 \
  confirm=true
```

### Prohibited Actions

- **No automatic node pool scale-down** — never delete nodes or shrink node pools automatically
- **No workload request modification** — never change CPU/memory requests directly
- **No automatic HPA installation/update** — never apply HPA without explicit user confirmation
- **No autoscaler enable/disable** — never toggle autoscaler without user approval

### Allowed Actions

- Read-only queries: nodes, node pools, pods, deployments, metrics, AOM PromQL
- Generate HPA YAML manifests, autoscaler parameter suggestions, and execution plans
- `huawei_configure_cce_hpa` without `confirm=true` returns preview only

### Credential Security

1. **No persistent credential storage** — AK/SK exists only during API calls
2. **No credential leakage** — never includes AK/SK in logs, responses, or errors
3. **Environment variable preferred** — `HW_ACCESS_KEY` / `HW_SECRET_KEY` / `HW_REGION_NAME`

---

## Prerequisites

### Python Environment

- Python 3.8+
- Install SDKs: `pip install huaweicloudsdkcce huaweicloudsdkcore huaweicloudsdkces`
- Optional for HPA operations: `pip install kubernetes`
- Optional for dashboard charts: `pip install matplotlib numpy`

### Environment Variables (Recommended)

```bash
export HW_ACCESS_KEY="your-access-key-id"
export HW_SECRET_KEY="your-secret-access-key"
export HW_REGION_NAME="cn-north-4"
```

### IAM Permission Policies

Ensure the IAM user has the minimum required permissions:

| Permission | Description |
|------------|-------------|
| `cce:cluster:list` | List clusters |
| `cce:cluster:get` | Get cluster details |
| `cce:node:list` | List nodes |
| `cce:node:get` | Get node details |
| `cce:nodepool:list` | List node pools |
| `cce:nodepool:get` | Get node pool details |
| `aom:*:get` | Read AOM metrics and PromQL data |

---

## Core Commands

### Recommended: Combined Analysis

| Tool | Function | Parameters |
|------|----------|------------|
| `huawei_analyze_cce_cost_optimization` | One-shot cost optimization analysis — inventory, 24h/7d node utilization, pod usage/request, HPA/autoscaler status, and report output | `region`, `cluster_id`, `exclude_namespaces`, `business_namespaces`, `short_hours`, `long_hours`, `top_n`, `output_dir` |

> **Prefer `huawei_analyze_cce_cost_optimization`** for comprehensive analysis. Only use individual tools below for supplementing details, reviewing specific metrics, or manually generating HPA YAML.
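
The examples in this document invoke tools as `python3 scripts/huawei-cloud.py <tool> key=value ...`. The following is a minimal Python sketch of assembling that argument list for the combined analysis. The helper name `build_tool_command` is hypothetical, and the value formatting (lowercase booleans, as in `confirm=true`) is inferred from the examples above rather than taken from a documented contract.

```python
from typing import List, Union

ParamValue = Union[str, int, bool]


def build_tool_command(tool: str, **params: ParamValue) -> List[str]:
    """Return an argv list in the key=value style used by this document's examples.

    Hypothetical helper: not part of the skill's scripts.
    """
    argv = ["python3", "scripts/huawei-cloud.py", tool]
    for key, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are written lowercase, matching `confirm=true`.
            value = "true" if value else "false"
        argv.append(f"{key}={value}")
    return argv


cmd = build_tool_command(
    "huawei_analyze_cce_cost_optimization",
    region="cn-north-4",
    cluster_id="xxx",  # placeholder, as in the examples above
    short_hours=24,
    long_hours=168,
    top_n=50,
    exclude_namespaces="kube-system",
    output_dir="./cost-reports",
)
print(" ".join(cmd))
```

Building an argv list instead of a shell string means the result can be passed to `subprocess.run(cmd)` without shell quoting concerns.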
### Resource Inventory

| Tool | Function | Parameters |
|------|----------|------------|
| `huawei_list_cce_clusters` | List all CCE clusters in region | `region` |
| `huawei_list_cce_nodes` | List cluster nodes | `region`, `cluster_id` |
| `huawei_get_kubernetes_nodes` | Get Kubernetes node details (including allocatable resources) | `region`, `cluster_id` |
| `huawei_list_cce_nodepools` | List node pools with autoscaling info | `region`, `cluster_id` |
| `huawei_get_cce_pods` | Get pod list with labels, status, requests | `region`, `cluster_id` |
| `huawei_get_cce_deployments` | Get deployment list | `region`, `cluster_id` |
| `huawei_list_cce_hpas` | List HPA configurations (excludes kube-system by default) | `region`, `cluster_id` |

### Metrics Analysis

| Tool | Function | Parameters |
|------|----------|------------|
| `huawei_get_cce_node_metrics_topN` | Node CPU/memory/disk utilization Top N | `region`, `cluster_id`, `top_n`, `hours` |
| `huawei_get_cce_node_metrics` | Single node utilization time series | `region`, `cluster_id`, `node_ip`, `hours` |
| `huawei_get_cce_pod_metrics_topN` | Pod CPU/memory utilization Top N (supports custom PromQL) | `region`, `cluster_id`, `top_n`, `hours`, `cpu_query`, `memory_query` |
| `huawei_get_cce_pod_metrics` | Single pod utilization time series | `region`, `cluster_id`, `pod_name`, `namespace`, `hours` |
| `huawei_get_aom_metrics` | Generic AOM PromQL query | `region`, `aom_instance_id`, `query`, `hours` |

### Elasticity Policy

| Tool | Function | Risk Level | Requires Confirmation |
|------|----------|------------|----------------------|
| `huawei_generate_cce_hpa_manifest` | Generate `autoscaling/v2` HPA YAML (no cluster modification) | 🟢 Low | No |
| `huawei_configure_cce_hpa` | Create or replace HPA in cluster | 🟠 High | **Yes** (`confirm=true`) |

**HPA configuration workflow**:

1. Use `huawei_generate_cce_hpa_manifest` or `huawei_configure_cce_hpa` without `confirm=true` to generate a preview
2. Review the manifest with the user
3. Only after explicit user approval, call `huawei_configure_cce_hpa` with `confirm=true`

> **HPA recommendations must be based on request sizing**. If requests are clearly oversized, first recommend calibrating requests, then configure HPA.

### Dashboard

| Tool | Function | Parameters |
|------|----------|------------|
| `huawei_generate_monitor_dashboard` | Generate monitoring dashboard chart images | `region`, `cluster_id`, `metrics_type`, `hours` |

---

## Parameter Reference

### Common Parameters

All tools accept these common parameters for authentication and region:

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `region` | string | Yes | — | Huawei Cloud region code (e.g., `cn-north-4`) |
| `cluster_id` | string | Yes* | — | CCE cluster ID; not required for `huawei_list_cce_clusters` |
| `ak` | string | No | env `HW_ACCESS_KEY` | Access Key ID; environment variable preferred |
| `sk` | string | No | env `HW_SECRET_KEY` | Secret Access Key; environment variable preferred |
| `project_id` | string | No | auto | IAM project ID; auto-resolved from region if omitted |

\* `cluster_id` is not required for `huawei_list_cce_clusters` (lists all clusters in region).
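
The defaulting rules in the Common Parameters table can be sketched as follows. This is an illustration only: the function name `resolve_common_params` is hypothetical, and falling back to `HW_REGION_NAME` when `region` is omitted is an assumption based on the Prerequisites section, not documented behavior of the tools.

```python
import os
from typing import Dict, Mapping, Optional


def resolve_common_params(
    region: Optional[str] = None,
    ak: Optional[str] = None,
    sk: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Resolve region and credentials, preferring explicit values over env vars.

    Hypothetical helper; the real scripts may resolve these differently.
    """
    env = os.environ if env is None else env
    resolved = {
        # Assumption: region falls back to HW_REGION_NAME when not passed.
        "region": region or env.get("HW_REGION_NAME"),
        "ak": ak or env.get("HW_ACCESS_KEY"),
        "sk": sk or env.get("HW_SECRET_KEY"),
    }
    missing = [name for name, value in resolved.items() if not value]
    if missing:
        # Report only parameter names, never values, so secrets cannot leak.
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return resolved


params = resolve_common_params(
    region="cn-north-4",
    env={"HW_ACCESS_KEY": "example-ak", "HW_SECRET_KEY": "example-sk"},
)
print(sorted(params))
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.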
### Combined Analysis Parameters (`huawei_analyze_cce_cost_optimization`)

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `region` | string | Yes | — | Huawei Cloud region code |
| `cluster_id` | string | Yes | — | CCE cluster ID |
| `short_hours` | int | No | `24` | Short-window metrics duration in hours |
| `long_hours` | int | No | `168` (7d) | Long-window metrics duration in hours |
| `top_n` | int | No | `50` | Top N pods/nodes for oversized-request and utilization ranking |
| `exclude_namespaces` | string | No | `kube-system` | Comma-separated namespaces to exclude from analysis |
| `business_namespaces` | string | No | — | Comma-separated namespaces to treat as business workloads; if omitted, all non-excluded namespaces are analyzed |
| `output_dir` | string | No | — | Directory to write summary JSON and report markdown |
| `include_raw` | bool | No | `false` | Include raw metrics data in output |

### HPA Parameters (`huawei_generate_cce_hpa_manifest` / `huawei_configure_cce_hpa`)

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `workload_name` | string | Yes | — | Target Deployment/StatefulSet name |
| `namespace` | string | Yes | — | Namespace of the target workload |
| `min_replicas` | int | Yes | — | Minimum replica count for HPA |
| `max_replicas` | int | Yes | — | Maximum replica count for HPA |
| `workload_type` | string | No | `deployment` | Workload kind: `deployment` or `statefulset` |
| `hpa_name` | string | No | auto | HPA object name; defaults to `<workload_name>-hpa` |
| `target_cpu_utilization` | int | No | `60` | Target average CPU utilization percentage |
| `target_memory_utilization` | int | No | — | Target average memory utilization percentage; omit to skip memory metric |
| `behavior` | object | No | — | HPA behavior policy (scaling rates, stabilization windows) |
| `confirm` | bool | No | `false` | **`huawei_configure_cce_hpa` only**: must be `true` to apply changes |

### Metrics Parameters (`huawei_get_cce_node_metrics_topN` / `huawei_get_cce_pod_metrics_topN`)

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `top_n` | int | No | `10` | Number of top nodes/pods to return |
| `hours` | int | No | `1` | Metrics query time range in hours |
| `cpu_query` | string | No | auto | Custom PromQL for CPU; defaults to built-in query |
| `memory_query` | string | No | auto | Custom PromQL for memory; defaults to built-in query |
| `node_ip` | string | Yes* | — | Required for `huawei_get_cce_node_metrics` (single node) |
| `pod_name` | string | Yes* | — | Required for `huawei_get_cce_pod_metrics` (single pod) |
| `namespace` | string | Yes* | — | Required for `huawei_get_cce_pod_metrics` (single pod) |

\* Only required for single-entity metrics tools.

### Dashboard Parameters (`huawei_generate_monitor_dashboard`)

| Parameter | Type | Required | Default | Description |
|-----------|------|----------|---------|-------------|
| `hours` | int | No | `1` | Monitoring data time range in hours |
| `top_n` | int | No | `10` | Top N pods for dashboard ranking |
| `namespace` | string | No | — | Filter by namespace |
| `label_selector` | string | No | — | Filter by label (e.g., `app=nginx`) |
| `output_file` | string | No | auto | Output HTML file path |
| `title` | string | No | auto | Dashboard title |

---

## Analysis Workflow

See [references/workflow.md](references/workflow.md) for detailed analysis steps, thresholds, and decision logic.

### Quick Summary

1. **Scope**: Confirm region, cluster_id, namespace range, and exclusion rules (default: exclude `kube-system`)
2. **Node utilization**: Analyze 24h and 7d windows for CPU/memory usage per node and cluster average
3. **Low-utilization detection**: Flag nodes below cluster average by 20 percentage points or below 60% of cluster average; cluster average below 30% signals overall over-provisioning
4. **Oversized requests**: Compare business workload request vs actual p95 usage; mark as `high` (p95 < 33% of request), `optimize` (p95 < 50%), or `observe` (short-window only)
5. **Elasticity review**: Check node pool autoscaling and HPA status; generate recommendations
6. **Output**: Summary, utilization tables, oversized request list, HPA/autoscaler recommendations, risks, and verification steps

---

## Risk Rules

See [references/risk-rules.md](references/risk-rules.md) for complete safety boundaries.

**Key constraints**:

- Auto-execution is limited to R1 read-only queries
- No automatic scale-down, request modification, or HPA/autoscaler changes
- Must reference both 24h and 7d windows before recommending scale-down
- Cost optimization suggestions must include rollback strategy and verification metrics
- Data gaps (missing metrics, missing requests, invisible HPA) must be flagged in the report

---

## Output Schema

See [references/output-schema.md](references/output-schema.md) for the complete JSON report structure.

All tools return JSON with:

- `status` / `success`: operation result
- `data`: analysis results, metrics, or configuration preview
- `message`: human-readable description
- `warning`: risk warning for write operations (preview mode only)
- `files`: paths to generated summary JSON and report markdown

---

## Supported Regions

| Region Code | Region Name |
|-------------|-------------|
| cn-north-4 | North China-Beijing 4 |
| cn-north-1 | North China-Beijing 1 |
| cn-east-3 | East China-Shanghai 1 |
| cn-south-1 | South China-Guangzhou |
| ap-southeast-1 | Asia Pacific-Hong Kong |
| ap-southeast-2 | Asia Pacific-Bangkok |
| ap-southeast-3 | Asia Pacific-Singapore |

---

## Best Practices

1. **Run the combined analysis first** — use `huawei_analyze_cce_cost_optimization` for a complete picture before drilling into individual tools; avoid piecemeal queries that miss cross-resource dependencies.
2. **Always check both time windows** — rely on 7-day data for stable optimization decisions; use 24-hour data only for short-term fluctuation observation, never as the sole basis for scale-down recommendations.
3. **Exclude kube-system by default** — system workloads have fixed sizing requirements; analyzing them produces misleading oversized-request signals and wastes analysis capacity.
4. **Calibrate requests before configuring HPA** — HPA utilization targets are percentages of the pod's request; if requests are oversized, measured utilization stays artificially low and HPA scales out late or not at all. Fix requests first, then set HPA targets.
5. **Use environment variables for credentials** — prefer `HW_ACCESS_KEY` / `HW_SECRET_KEY` over passing AK/SK as parameters to avoid credential leakage in command history and logs.
6. **Review HPA preview before confirming** — always call `huawei_configure_cce_hpa` without `confirm=true` first; inspect the manifest YAML and risk warning with the user before applying.
7. **Include rollback strategy in every recommendation** — cost optimization changes can impact availability; every suggestion must specify how to revert and how to verify the change was safe.
8. **Flag data gaps explicitly** — if metrics are missing, requests are absent, or HPA status is invisible, report these as data gaps; do not infer optimization decisions from incomplete data.
9. **Set `top_n` appropriately** — use `top_n=50` for large clusters (100+ pods) to capture all significant outliers; reduce to `top_n=10` for focused analysis of specific namespaces.
10. **Save outputs to a persistent directory** — use `output_dir` to write the summary JSON and report markdown to a known location; this enables later review and comparison across multiple analysis runs.
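
The oversized-request labels from the Quick Summary, combined with the two-window rule in practice 2, can be sketched as below. This is one possible reading, not the skill's actual logic: the function name, the `ok` and `data_gap` labels, and the interpretation of `observe` as "below 50% in the 24h window only" are assumptions. The authoritative thresholds live in `references/workflow.md`.

```python
from typing import Optional


def classify_request(
    request: Optional[float],
    p95_24h: Optional[float],
    p95_7d: Optional[float],
) -> str:
    """Label a workload's request sizing from p95 usage in both windows.

    All three values must share one unit (e.g. CPU millicores).
    Hypothetical sketch of the documented thresholds.
    """
    if not request or p95_24h is None:
        # Missing requests or metrics are reported, not guessed at.
        return "data_gap"
    short_ratio = p95_24h / request
    if p95_7d is None:
        # Only short-window data: never enough for a firm recommendation.
        return "observe" if short_ratio < 0.50 else "ok"
    long_ratio = p95_7d / request
    if long_ratio < 0.33:
        return "high"
    if long_ratio < 0.50:
        return "optimize"
    # 7d usage looks healthy; a low 24h value alone is only worth watching.
    return "observe" if short_ratio < 0.50 else "ok"


for args in [(1000, 200, 250), (1000, 300, 400), (1000, 300, 700), (None, 100, 100)]:
    print(args, "->", classify_request(*args))
```

Basing `high` and `optimize` on the 7-day ratio keeps a single quiet day from producing a resize recommendation.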
---

## Common Pitfalls

| Pitfall | Symptom | Quick Fix |
|---------|---------|-----------|
| Missing AK/SK credentials | All tools return `"success": false` with credential error | Set `HW_ACCESS_KEY` and `HW_SECRET_KEY` environment variables before running |
| Wrong cluster ID | Empty or error results from cluster-specific tools | Run `huawei_list_cce_clusters` first to confirm the correct `cluster_id` for your region |
| Analyzing kube-system workloads | False oversized-request alerts on system DaemonSets | Set `exclude_namespaces=kube-system` (default) or add other system namespaces |
| Single-window scale-down decision | Node marked low-utilization in 24h only but stable in 7d | Always require both `short_hours=24` and `long_hours=168` before recommending scale-down |
| HPA on oversized requests | HPA scales out late or not at all because inflated requests keep measured utilization low | First reduce CPU/memory requests to realistic values, then configure HPA with `target_cpu_utilization=60` |
| Missing AOM metrics | Empty utilization data, `data_gaps` flagged in report | Verify IAM has `aom:*:get` permission and AOM is enabled on the cluster |
| Applying HPA without preview | `huawei_configure_cce_hpa` called with `confirm=true` without review | Always call without `confirm=true` first, review manifest, then re-run with `confirm=true` |
| Kubernetes SDK not installed | HPA tools fail with `"Kubernetes SDK not installed"` | Install with `pip install kubernetes` before using HPA listing or configuration tools |
| Large cluster with small `top_n` | Oversized-request pods missing from report | Increase `top_n` to 50 or higher for clusters with 100+ business pods |
| No output directory specified | Report files written to temporary location, may be lost | Set `output_dir` to a persistent path like `./cost-reports` |

---

## Output Format

All tools return JSON with `status`, `success`, `data`, `message`, `warning`, and `files` fields. See [references/output-schema.md](references/output-schema.md) for the complete report structure.

## Verification

See [Verification Method](references/verification-method.md) for step-by-step verification.

## Cross-Skill References

| Skill | When to Use |
|-------|-------------|
| `huawei-cloud-cce-cluster-management` | Create/delete/hibernate clusters, manage node pools, manage addons, cordon/uncordon/drain nodes, create/delete individual nodes |

---

## Reference Documents

| Document | Path | Description |
|----------|------|-------------|
| Workflow | [references/workflow.md](references/workflow.md) | Detailed analysis workflow, thresholds, and decision logic |
| Risk Rules | [references/risk-rules.md](references/risk-rules.md) | Safety boundaries, prohibited actions, and confirmation requirements |
| Output Schema | [references/output-schema.md](references/output-schema.md) | Cost optimization report JSON structure |

---

## Notes

- Ensure AK/SK has correct IAM permissions (CCE read + AOM read)
- Default analysis excludes `kube-system` namespace
- HPA recommendations require request sizing to be reasonable first
- Node scale-down suggestions require both 24h and 7d data confirmation
- Cost optimization reports must include rollback strategy
- Data gaps must be explicitly flagged
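
The requirement that node scale-down suggestions be confirmed in both windows can be illustrated with the low-utilization thresholds from the Quick Summary (20 percentage points below the cluster average, or below 60% of it). The sketch below is an assumption-laden illustration: the function names and sample utilization figures are made up, and it handles one metric at a time, whereas the skill reviews both CPU and memory.

```python
from statistics import mean
from typing import Dict, List


def low_utilization_nodes(util_pct: Dict[str, float]) -> List[str]:
    """Nodes at least 20 points below the cluster average, or below 60% of it."""
    avg = mean(util_pct.values())
    return sorted(
        node for node, pct in util_pct.items()
        if avg - pct >= 20 or pct < 0.6 * avg
    )


def scale_down_candidates(util_24h: Dict[str, float], util_7d: Dict[str, float]) -> List[str]:
    """Only nodes flagged in BOTH windows are eligible for a suggestion."""
    return sorted(set(low_utilization_nodes(util_24h)) & set(low_utilization_nodes(util_7d)))


def cluster_overprovisioned(util_pct: Dict[str, float]) -> bool:
    """A cluster average below 30% signals overall over-provisioning."""
    return mean(util_pct.values()) < 30


# Made-up utilization percentages keyed by node IP.
util_24h = {"10.0.0.1": 62.0, "10.0.0.2": 15.0, "10.0.0.3": 12.0}
util_7d = {"10.0.0.1": 58.0, "10.0.0.2": 41.0, "10.0.0.3": 15.0}

print(low_utilization_nodes(util_24h))
print(scale_down_candidates(util_24h, util_7d))
```

In this sample, `10.0.0.2` looks idle over 24 hours but not over 7 days, so it is excluded from the candidates. The output is still only a suggestion: the skill never shrinks node pools on its own.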