---
name: server-log-analysis
description: Connect to remote servers over SSH, read sibling config.yaml to understand service metadata and log locations, download only required log snippets to local temp for analysis, and diagnose issues from evidence. Use when users ask to troubleshoot remote service logs, investigate backend exceptions, or perform SSH-based log diagnostics.
---

# Server Log Analysis

## Purpose

Use this Skill to investigate service issues when logs are stored on remote servers.

This Skill assumes:

- The agent can connect to servers via SSH or equivalent remote execution tooling.
- `config.yaml` in this Skill directory defines service metadata, log paths, and business context.
- Relevant log snippets should be copied to the local `temp/` directory before deep analysis.

## Required Reading

- Read `config.yaml` first.
- Read `reference.md` when field details or command patterns are needed.

## Core Workflow

1. Read `config.yaml`.
2. Map the user's issue to one or more configured services.
3. Define the smallest necessary investigation scope:
   - target service
   - target host
   - relevant time window
   - candidate log files
4. Connect to the target server via SSH or the available remote tools.
5. Perform remote checks before downloading:
   - file existence and file size
   - last modified time
   - whether keyword filtering or tail output is sufficient
6. Download only the minimal required log snippets to the configured local `temp/` directory.
7. Analyze the local copies for errors, timing correlation, repeated failures, and likely root cause.
8. Output a concise diagnosis with conclusions, evidence, uncertainty, and follow-up actions.

## Investigation Rules

- Prioritize the service definitions and business context in `config.yaml`; do not guess.
- Prefer remote filtering over full downloads:
  - narrow the time window first
  - then filter by keywords
  - use tail first for recent incidents
- Download full logs only when snippets are insufficient.
- Local filenames should clearly include the service, host, and time range.
- Unless explicitly requested, do not fetch sensitive files, binaries, or unrelated large archives.
- For cross-service issues, analyze the primary service first, then expand to its dependencies.

## Service Selection

When user intent is ambiguous:

1. Use the service `aliases`, `keywords`, and `description` in `config.yaml`.
2. Pick the service with the highest semantic match.
3. If it is still unclear, ask the user which service to inspect before connecting remotely.

## Remote Pre-Check Checklist

Before downloading logs, confirm:

- the host configuration matches the target service
- the configured log files exist
- which log file was updated most recently
- whether rolling logs must be included
- whether the issue is recent or historical

Common remote checks include:

- file metadata checks
- recent log tail checks
- quick keyword search
- time-window extraction
- process/service status when needed

## Local Download Rules

Store downloaded logs under the configured `local_temp_dir`.

Recommended filename format:

`<service>__<host>__<log_name>__<time_hint>.log`

Priority order:

1. recent tail logs
2. keyword-filtered snippets
3. explicit time-window snippets
4. full file as a last resort

## Analysis Focus

Focus on:

- startup failures
- repeated exceptions
- timeout and connection issues
- resource pressure signals
- failures in DB/cache/message queue/DNS/HTTP upstream dependencies
- config errors exposed by stack traces or startup logs
- timestamp alignment across related services

The response should include:

- issue summary
- key evidence
- preliminary cause
- confidence level
- next verification steps

## Security Constraints

- Treat `config.yaml` as operations metadata; do not store plaintext secrets in it.
- Prefer environment variables, key files, or external secret managers for SSH credentials.
- Unless explicitly requested, do not modify remote files or restart services.
- Unless requested, do not auto-delete downloaded logs.
## Exception Handling

If remote access fails:

1. Clearly state which step failed.
2. State the target host and service.
3. Ask the user for the correct SSH access method, network path, or credentials.

If a configured log path does not exist:

1. Clearly identify the missing path.
2. Check whether alternate paths are configured for the same service.
3. Ask the user whether deployment paths changed.

## Quick Execution Order

Always follow this order:

1. Read `config.yaml`.
2. Identify the service and host.
3. Perform remote log pre-checks.
4. Copy the minimal required logs to `temp/`.
5. Analyze locally.
6. Summarize conclusions with evidence.
Diagnose service issues by connecting to remote servers via SSH, reading the local `config.yaml` to map services to hosts and log paths, downloading only the required log snippets to a local temp directory, and then analyzing them for errors, timing misalignment, and root cause. Use this skill when users ask to troubleshoot backend exceptions, investigate remote service failures, or perform SSH-based log diagnostics without direct server access.
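External connections: SSH to the hosts defined in `config.yaml`.
Local files: `config.yaml` (required), `reference.md` (when field details or command patterns are needed), and the local temp directory for downloaded snippets.
Context: the user's issue description, plus the business context recorded in `config.yaml`.
Environment variables: `SSH_KEY_PATH` (SSH key location) and `SSH_TIMEOUT`.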
Read `config.yaml` and parse the service definitions. Extract all services with their aliases, keywords, descriptions, host details (hostname, port, SSH user), log file paths, time windows, and business context. Note any time zone offsets or log rotation patterns.
Input: `config.yaml` file path.
Output: parsed service registry (a dict keyed by service name; each entry holds the host, log paths, aliases, keywords, description, and timezone).
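The following is a minimal Python sketch of this step. It assumes `config.yaml` has already been loaded into a dict, for example with `yaml.safe_load` from the third-party PyYAML package. It also assumes a hypothetical schema: a top-level `services` mapping whose entries carry `host` (`hostname`, `port`, `user`), `logs`, `aliases`, `keywords`, `description`, and `timezone`. Rename the keys to match the real file. `build_registry` and `load_registry` are illustrative names, not part of the Skill.

```python
from typing import Any


def build_registry(config: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Normalize a loaded config dict into a registry keyed by service name.

    Assumes a hypothetical schema: config["services"][name] holds host
    (hostname/port/user), logs, aliases, keywords, description, timezone.
    """
    services = config.get("services")
    if not isinstance(services, dict) or not services:
        raise ValueError("config.yaml defines no services")

    registry: dict[str, dict[str, Any]] = {}
    for name, spec in services.items():
        host = spec.get("host") or {}
        if not host.get("hostname"):
            raise ValueError(f"service {name!r} has no host.hostname")
        registry[name] = {
            "hostname": host["hostname"],
            "port": int(host.get("port", 22)),  # fall back to the standard SSH port
            "user": host.get("user"),
            "logs": list(spec.get("logs", [])),
            # Lowercase aliases/keywords so later matching is case-insensitive.
            "aliases": [a.lower() for a in spec.get("aliases", [])],
            "keywords": [k.lower() for k in spec.get("keywords", [])],
            "description": spec.get("description", ""),
            "timezone": spec.get("timezone", "UTC"),  # assumed default
        }
    return registry


def load_registry(path: str) -> dict[str, dict[str, Any]]:
    """Read config.yaml from disk. Requires the third-party PyYAML package."""
    import yaml  # imported lazily so build_registry works without PyYAML

    with open(path, encoding="utf-8") as fh:
        return build_registry(yaml.safe_load(fh) or {})
```

Failing fast on a service without a hostname keeps later steps from guessing, which matches the Skill's "do not guess" rule.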
Map the user's intent to a configured service. Compare the user's issue description (service name, symptom keywords) against the service aliases, keywords, and descriptions in `config.yaml`, and pick the service with the highest semantic match.
Input: user issue description, parsed service registry.
Output: target service name (string); if the match is ambiguous, ask the user to clarify before proceeding.
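The step calls for the highest semantic match. The sketch below substitutes plain token overlap as a simple stand-in, with arbitrary weights: 3 for a name or alias hit, 2 for a keyword, and 1 for a description word. `match_service` is a hypothetical helper. It returns `None` on a tie or when nothing matches, which is the signal to ask the user.

```python
import re
from typing import Any, Optional


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; hyphens and underscores stay inside a token."""
    return set(re.findall(r"[a-z0-9][a-z0-9_-]*", text.lower()))


def match_service(issue: str, registry: dict[str, dict[str, Any]]) -> Optional[str]:
    """Return the best-scoring service, or None when the user should be asked."""
    words = tokenize(issue)
    scores: dict[str, int] = {}
    for name, svc in registry.items():
        names = {name.lower(), *(a.lower() for a in svc.get("aliases", []))}
        keywords = {k.lower() for k in svc.get("keywords", [])}
        # Token overlap is a stand-in for semantic similarity; weights are arbitrary.
        scores[name] = (3 * len(words & names)
                        + 2 * len(words & keywords)
                        + len(words & tokenize(svc.get("description", ""))))

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # nothing matched: ask the user
    if len(ranked) > 1 and ranked[1][1] == ranked[0][1]:
        return None  # tie between services: ask the user
    return ranked[0][0]
```

An embedding or LLM-based comparison could replace the scoring loop. The tie and no-match handling would stay the same.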
Extract the target host, log files, and time window. From the selected service in `config.yaml`, get the hostname, SSH port, SSH username, candidate log file paths, and relevant time window (e.g., last 2 hours, last 24 hours, or explicit start/end times).
Input: target service name, parsed service registry.
Output: target host (string), SSH port (int), SSH user (string), log file paths (list), time window (tuple of datetimes or a relative offset).
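One way to resolve the time window is sketched below. It accepts two hypothetical input forms:

- a relative offset such as `last 2 hours` or `last 30m`
- an explicit `START/END` pair in ISO 8601

Naive timestamps are interpreted in `assume_tz`, which defaults to UTC. A fuller version would pass in the service's `timezone` from `config.yaml`. `resolve_time_window` is an illustrative name.

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional

# Hypothetical relative syntax: "last 2 hours", "last 24h", "last 30m", "last 1 day".
_RELATIVE = re.compile(r"^last\s+(\d+)\s*(minutes?|mins?|m|hours?|h|days?|d)$", re.IGNORECASE)
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def _to_utc(ts: datetime, assume_tz: tzinfo) -> datetime:
    """Attach assume_tz to naive timestamps, then convert to UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=assume_tz)
    return ts.astimezone(timezone.utc)


def resolve_time_window(spec: str, now: Optional[datetime] = None,
                        assume_tz: tzinfo = timezone.utc) -> tuple[datetime, datetime]:
    """Turn a relative offset or an ISO 8601 'START/END' pair into a (start, end) tuple."""
    now = now or datetime.now(timezone.utc)
    m = _RELATIVE.match(spec.strip())
    if m:
        unit = _UNITS[m.group(2)[0].lower()]  # first letter picks minutes/hours/days
        return now - timedelta(**{unit: int(m.group(1))}), now
    start_s, sep, end_s = spec.partition("/")
    if not sep:
        raise ValueError(f"unrecognized time window: {spec!r}")
    start = _to_utc(datetime.fromisoformat(start_s.strip()), assume_tz)
    end = _to_utc(datetime.fromisoformat(end_s.strip()), assume_tz)
    if end <= start:
        raise ValueError("time window end must be after its start")
    return start, end
```

Normalizing both ends to UTC lets windows from services in different time zones be compared directly.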
Establish an SSH connection to the target host. Connect using the SSH key named by the `SSH_KEY_PATH` environment variable, or through the SSH agent, and apply `SSH_TIMEOUT`. Catch connection errors and surface them clearly.
Input: target host, SSH port, SSH user, SSH key path.
Output: open SSH session (connection object); if the connection fails, return an error naming the host and user attempted.
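The sketch below drives the OpenSSH client through `subprocess` instead of keeping a persistent session object. Each call opens a connection, runs one command, and exits. The OpenSSH options used:

- `BatchMode=yes` makes `ssh` fail instead of prompting for a password.
- `ConnectTimeout` applies `SSH_TIMEOUT`, assumed here to be in seconds.
- Without `SSH_KEY_PATH`, the client falls back to the agent or `~/.ssh/config`.

`ssh_argv` and `run_remote` are hypothetical names.

```python
import os
import subprocess
from typing import Optional


def ssh_argv(host: str, port: int, user: Optional[str], remote_cmd: str,
             key_path: Optional[str] = None, timeout_s: int = 10) -> list[str]:
    """Build an OpenSSH command line that runs one remote command non-interactively."""
    argv = [
        "ssh",
        "-p", str(port),
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # seconds to wait for the connection
    ]
    if key_path:
        argv += ["-i", key_path]  # otherwise ssh uses the agent or ~/.ssh/config
    argv += [f"{user}@{host}" if user else host, remote_cmd]
    return argv


def run_remote(host: str, port: int, user: Optional[str], remote_cmd: str,
               run_timeout_s: int = 300) -> subprocess.CompletedProcess:
    """Run remote_cmd on the host; raise with host and user on connection failure."""
    argv = ssh_argv(host, port, user, remote_cmd,
                    key_path=os.environ.get("SSH_KEY_PATH"),
                    timeout_s=int(os.environ.get("SSH_TIMEOUT", "10")))
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=run_timeout_s)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"ssh {user}@{host}:{port} timed out after {run_timeout_s}s") from exc
    # OpenSSH exits with 255 for its own errors (connection, auth); other codes
    # come from the remote command, e.g. grep returns 1 when nothing matches.
    if proc.returncode == 255:
        raise RuntimeError(f"ssh {user}@{host}:{port} failed: {proc.stderr.strip()}")
    return proc
```

Because each call is self-contained, there is no session to leak. Opening a new connection per command is slower, and OpenSSH connection multiplexing (`ControlMaster`) is one way to reduce that cost.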
Perform a remote pre-check on the log files. For each candidate log file, run remote commands to check:

- file existence (`ls`)
- file size (`du` or `stat`)
- last modified time (`stat` or `ls -l`)
- whether log rotation is active (`ls -la` for numbered or dated variants)

Input: SSH session, list of log file paths.
Output: file metadata dict (path, exists: bool, size_bytes: int, mtime: timestamp, rotation_variants: list); if no files exist, state clearly which paths were missing.
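The pre-check can run in a single round trip. This sketch assumes GNU coreutils `stat` on the remote host, using `-c` with these format sequences:

- `%s`: size in bytes
- `%Y`: mtime as epoch seconds
- `%n`: file name

It treats `<path>.*` siblings such as `app.log.1` or `app.log.2024-05-01.gz` as rotation variants. Schemes that rename the base file (for example `app-2024-05-01.log`) would need another glob. The function names are hypothetical.

```python
import shlex
from typing import Any


def precheck_command(paths: list[str]) -> str:
    """Build one remote command that stats each log file and its rotated siblings.

    Assumes GNU coreutils stat: %s = size in bytes, %Y = mtime in epoch seconds,
    %n = file name. The trailing .* stays unquoted so the remote shell expands it;
    errors for missing files or unmatched globs are discarded.
    """
    targets = " ".join(f"{shlex.quote(p)} {shlex.quote(p)}.*" for p in paths)
    return f"stat -c '%s %Y %n' -- {targets} 2>/dev/null || true"


def parse_precheck(paths: list[str], output: str) -> dict[str, dict[str, Any]]:
    """Turn 'size mtime name' lines into the per-file metadata dict."""
    seen: dict[str, tuple[int, int]] = {}
    for line in output.splitlines():
        parts = line.split(" ", 2)  # the name may itself contain spaces
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            seen[parts[2]] = (int(parts[0]), int(parts[1]))

    meta: dict[str, dict[str, Any]] = {}
    for p in paths:
        size, mtime = seen.get(p, (0, None))
        meta[p] = {
            "path": p,
            "exists": p in seen,
            "size_bytes": size,
            "mtime": mtime,
            # Rotated copies such as app.log.1 or app.log.2024-05-01.gz
            "rotation_variants": sorted(n for n in seen if n.startswith(p + ".")),
        }
    return meta


def missing_paths(meta: dict[str, dict[str, Any]]) -> list[str]:
    """Paths to report back to the user because they do not exist on the host."""
    return [p for p, m in meta.items() if not m["exists"]]
```

The command string would be run with whatever SSH helper is in use, and its stdout passed to `parse_precheck`.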
Decide which log files to fetch based on the pre-check results. Prioritize by recency (newest mtime first) and size (smallest first, to minimize download). Then choose an extraction method for each file:

- If the recent tail is sufficient (the last N lines contain the error), mark the file for tail extraction.
- If time-window filtering is needed, mark it for time-based extraction.
- Mark a file for full download only when the other methods are insufficient.

Input: file metadata dict, time window, user issue description.
Output: download strategy, a list of (log_path, extraction_method, extraction_params) tuples. extraction_method is one of:

- tail (N lines)
- time_window (start_time, end_time)
- keyword_filter (keyword list, optional time_window)
- full_file
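The following is a sketch of these selection rules. `tail_sufficient` is a hypothetical flag. The caller sets it after a quick remote check, for instance whether the last N lines already contain the error. `TAIL_LINES` is an arbitrary default. The method order follows the Skill's priority list: tail, then keyword filter, then time window, with the full file last and only when `allow_full` is set.

```python
from typing import Any, Optional

TAIL_LINES = 2000  # hypothetical default for tail extraction

Strategy = list[tuple[str, str, dict[str, Any]]]


def plan_downloads(meta: dict[str, dict[str, Any]], keywords: list[str],
                   time_window: Optional[tuple[Any, Any]], tail_sufficient: bool,
                   allow_full: bool = False) -> Strategy:
    """Pick one extraction method per existing file, cheapest method first."""
    existing = [m for m in meta.values() if m["exists"]]
    # Newest first; among equally recent files, smallest first.
    existing.sort(key=lambda m: (-m["mtime"], m["size_bytes"]))

    plan: Strategy = []
    for m in existing:
        if tail_sufficient:
            plan.append((m["path"], "tail", {"lines": TAIL_LINES}))
        elif keywords:
            plan.append((m["path"], "keyword_filter",
                         {"keywords": list(keywords), "time_window": time_window}))
        elif time_window:
            start, end = time_window
            plan.append((m["path"], "time_window", {"start": start, "end": end}))
        elif allow_full:
            plan.append((m["path"], "full_file", {}))
        # Otherwise skip the file: a full download needs explicit approval.
    return plan
```

An empty plan means nothing cheap applies. In that case, widen the inputs or ask the user before falling back to full files.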
Run the remote extraction and download the logs to local temp. For each file in the download strategy:

1. Run a remote command (`tail`, `grep` with a time filter, `sed`, `awk`, etc.) that extracts only the required snippet.
2. Pipe the output to a local temp file with a clear name: `<service>__<host>__<log_name>__<time_hint>.log`.

Handle remote command failures and partial downloads gracefully.

Input: SSH session, download strategy, `LOCAL_TEMP_DIR`.
Output: list of local file paths (strings). Log each download with the bytes transferred; if a download fails, return an error with the remote path and the command attempted.
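The extraction commands and local naming might look like the sketch below. The `time_window` filter assumes every relevant line starts with a sortable `YYYY-MM-DD HH:MM:SS` timestamp, so `awk` can compare plain strings. That filter drops untimestamped continuation lines such as stack-trace frames, so a tail or keyword extraction may suit exceptions better. The function names are hypothetical. Each command string would be run with whatever SSH helper is in use, and its output written with `save_snippet`.

```python
import re
import shlex
from datetime import datetime
from pathlib import Path, PurePosixPath
from typing import Any

_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def local_name(service: str, host: str, remote_path: str, time_hint: str) -> str:
    """Build <service>__<host>__<log_name>__<time_hint>.log with safe characters only."""
    log_name = PurePosixPath(remote_path).name
    if log_name.endswith(".log"):
        log_name = log_name[: -len(".log")]  # avoid a doubled .log suffix
    parts = (service, host, log_name, time_hint)
    return "__".join(_UNSAFE.sub("-", part) for part in parts) + ".log"


def _stamp(value: Any) -> str:
    """Format datetimes as 'YYYY-MM-DD HH:MM:SS' (assumed log timestamp layout)."""
    return value.strftime("%Y-%m-%d %H:%M:%S") if isinstance(value, datetime) else str(value)


def _awk_window(quoted_path: str, start: Any, end: Any) -> str:
    # String comparison of the first 19 characters only works if lines begin
    # with a 'YYYY-MM-DD HH:MM:SS' timestamp.
    s, e = shlex.quote(_stamp(start)), shlex.quote(_stamp(end))
    return f"awk -v s={s} -v e={e} 'substr($0, 1, 19) >= s && substr($0, 1, 19) <= e' {quoted_path}"


def extraction_command(path: str, method: str, params: dict[str, Any]) -> str:
    """Map one download-strategy entry to a remote shell command."""
    q = shlex.quote(path)
    if method == "tail":
        return f"tail -n {int(params['lines'])} -- {q}"
    if method == "keyword_filter":
        pats = " ".join(f"-e {shlex.quote(k)}" for k in params["keywords"])
        window = params.get("time_window")
        if window:  # narrow by time first, then filter by keyword
            return f"{_awk_window(q, *window)} | grep -F {pats}"
        return f"grep -F {pats} -- {q}"
    if method == "time_window":
        return _awk_window(q, params["start"], params["end"])
    if method == "full_file":
        return f"cat -- {q}"
    raise ValueError(f"unknown extraction method: {method!r}")


def save_snippet(local_dir: str, name: str, text: str) -> tuple[Path, int]:
    """Write a snippet under local_dir; return its path and the bytes written."""
    out_dir = Path(local_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    data = text.encode("utf-8")
    path = out_dir / name
    path.write_bytes(data)
    return path, len(data)
```

The byte count returned by `save_snippet` is what the step asks to log per download.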
Close the SSH connection cleanly.
Input: SSH session.
Output: connection closed (success, or a logged warning if closing fails).
Analyze the downloaded log files locally. Scan each local log file for:

- exception stack traces
- repeated error messages
- timeout or connection errors
- resource pressure signals (out of memory, disk full, high CPU)
- failures in external dependencies (DB, cache, DNS, HTTP upstream)
- config errors
- timestamp gaps
- timing correlation across related services

Input: list of local log file paths.
Output: findings dict (exceptions: list, errors: list, warnings: list, resource_signals: list, dependency_failures: list, timestamps: list, correlations: list).
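A minimal local scan is sketched below. It assumes lines start with a `YYYY-MM-DD HH:MM:SS` timestamp and carry `ERROR`/`WARN`-style level tokens. The regular expressions are illustrative, not exhaustive. In this sketch, `timestamps` keeps only error-line timestamps, and an extra `repeated` key groups recurring errors after masking numbers. `correlate` pairs error timestamps from different services that fall within a few seconds of each other, assuming all timestamps share one time zone.

```python
import re
from collections import Counter
from datetime import datetime
from typing import Any

# Illustrative patterns only; extend them for the log formats actually in use.
PATTERNS = {
    "exceptions": re.compile(r"Traceback \(most recent call last\)|\b\w+(?:Exception|Error)\b"),
    "resource_signals": re.compile(
        r"OutOfMemoryError|out of memory|No space left on device|Too many open files", re.I),
    "dependency_failures": re.compile(
        r"Connection refused|Connection reset|timed? ?out|Name or service not known|"
        r"Could not resolve host", re.I),
}
LEVEL = re.compile(r"\b(ERROR|FATAL|WARN(?:ING)?)\b")
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def _normalize(line: str) -> str:
    """Drop the leading timestamp and mask numbers so repeats group together."""
    return re.sub(r"\d+", "#", TIMESTAMP.sub("", line)).strip()


def analyze_lines(lines: list[str]) -> dict[str, Any]:
    findings: dict[str, Any] = {key: [] for key in (
        "exceptions", "errors", "warnings", "resource_signals",
        "dependency_failures", "timestamps")}
    repeats = Counter()
    for raw in lines:
        line = raw.rstrip("\n")
        for key, pattern in PATTERNS.items():
            if pattern.search(line):
                findings[key].append(line)
        level = LEVEL.search(line)
        if not level:
            continue
        if level.group(1).startswith("WARN"):
            findings["warnings"].append(line)
            continue
        findings["errors"].append(line)
        repeats[_normalize(line)] += 1
        ts = TIMESTAMP.match(line)
        if ts:  # keep error timestamps for cross-service correlation
            findings["timestamps"].append(ts.group(0))
    findings["repeated"] = [(msg, n) for msg, n in repeats.most_common() if n > 1]
    findings["correlations"] = []  # filled across services by correlate()
    return findings


def analyze_file(path: str) -> dict[str, Any]:
    with open(path, encoding="utf-8", errors="replace") as fh:
        return analyze_lines(fh.readlines())


def correlate(error_times: dict[str, list[str]],
              tolerance_s: float = 5.0) -> list[tuple[str, str, str, str]]:
    """Pair error timestamps from different services within tolerance_s seconds."""
    parsed = {svc: [datetime.fromisoformat(t) for t in ts] for svc, ts in error_times.items()}
    services = sorted(parsed)
    pairs = []
    for i, a in enumerate(services):
        for b in services[i + 1:]:
            for ta in parsed[a]:
                for tb in parsed[b]:
                    if abs((ta - tb).total_seconds()) <= tolerance_s:
                        pairs.append((a, ta.isoformat(sep=" "), b, tb.isoformat(sep=" ")))
    return pairs
```

The pairwise loop is quadratic, which is fine for snippet-sized inputs. Larger inputs would call for sorting the timestamps and sweeping through them.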
Synthesize the diagnosis and output a concise summary. Combine the findings into:

- an issue summary (1-2 sentences)
- key evidence (the top 3-5 log snippets supporting the diagnosis)
- a preliminary root cause, with a confidence level (high/medium/low)
- next verification steps (tests or commands to confirm, or an escalation path)

Input: findings dict.
Output: diagnosis summary (markdown or plain text).
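Rendering the summary can be a small formatting step, as sketched below. The confidence rule is a hypothetical heuristic, not something the Skill defines: one candidate cause gives high confidence, two give medium, and anything else gives low. `render_diagnosis` is an illustrative name.

```python
def confidence(candidate_causes: list[str]) -> str:
    """Hypothetical heuristic: fewer competing causes means higher confidence."""
    return {1: "high", 2: "medium"}.get(len(candidate_causes), "low")


def render_diagnosis(summary: str, evidence: list[str], causes: list[str],
                     next_steps: list[str], max_evidence: int = 5) -> str:
    """Render the four parts of the diagnosis as markdown."""
    lines = ["## Issue summary", summary.strip(), "", "## Key evidence"]
    # Keep at most max_evidence snippets, in the order they were ranked.
    lines += [f"- `{e.strip()}`" for e in evidence[:max_evidence]] or ["- none collected"]
    lines += ["", f"## Preliminary root cause (confidence: {confidence(causes)})"]
    lines += [f"- {c}" for c in causes] or ["- undetermined"]
    lines += ["", "## Next verification steps"]
    lines += [f"- {s}" for s in next_steps] or ["- none identified"]
    return "\n".join(lines) + "\n"
```

Listing every candidate cause, rather than picking one, keeps low-confidence diagnoses honest about the uncertainty.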
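If user intent is ambiguous (multiple services match the keywords): ask the user which service to inspect before making any remote connection.
If `config.yaml` is missing or unreadable:
If the SSH connection fails: state which step failed and the target host and service, then ask the user for the correct SSH access method, network path, or credentials.
If the remote pre-check finds no matching log files: identify the missing paths, check whether alternate paths are configured for the same service, and ask the user whether deployment paths changed.
If the remote pre-check finds very large log files (> 1 GB): do not download them whole; extract a tail, keyword-filtered, or time-window snippet remotely instead.
If extraction returns an empty result (no logs in the time window, no keyword matches):
If analysis finds multiple potential root causes (low confidence):
If downloaded logs contain sensitive data (visible passwords, tokens, or API keys):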
Success output format:
`<service>__<host>__<log_name>__<time_hint>.log`
Error output format:
The user knows the skill worked when:
If the skill fails: