server-log-analysis-en

Item: server-log-analysis-en
Rating: 7.8
Author: Implexa

Connect to remote servers over SSH, read sibling config.yaml to understand service metadata and log locations, download only required log snippets to local t...

SkillRank score: 7.8 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

server-log-analysis-en diagnoses remote service issues by connecting via ssh, reading service metadata from config.yaml, downloading minimal log snippets to local temp, and analyzing for root cause with evidence and confidence levels.

structure: 9.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 7.0
documentation: 8.0


---
name: server-log-analysis
description: Connect to remote servers over SSH, read sibling config.yaml to understand service metadata and log locations, download only required log snippets to local temp for analysis, and diagnose issues from evidence. Use when users ask to troubleshoot remote service logs, investigate backend exceptions, or perform SSH-based log diagnostics.
---

# Server Log Analysis

## Purpose

Use this Skill to investigate service issues when logs are stored on remote servers.

This Skill assumes:

- The agent can connect to servers via SSH or equivalent remote execution tooling.
- `config.yaml` in this Skill directory defines service metadata, log paths, and business context.
- Relevant log snippets are copied to local `temp/` before any deep analysis.

## Required Reading

- Read `config.yaml` first.
- Read `reference.md` when field details or command patterns are needed.

## Core Workflow

1. Read `config.yaml`.
2. Map the user issue to one or more configured services.
3. Define the smallest necessary investigation scope:
   - target service
   - target host
   - relevant time window
   - candidate log files
4. Connect to the target server via SSH or available remote tools.
5. Perform remote checks before downloading:
   - file existence and file size
   - last modified time
   - whether keyword filtering or tail output is sufficient
6. Download only minimal required log snippets to configured local `temp/`.
7. Analyze local copies for errors, timing correlation, repeated failures, and likely root cause.
8. Output concise diagnosis with conclusions, evidence, uncertainty, and follow-up actions.

## Investigation Rules

- Prioritize service definitions and business context in `config.yaml`; do not guess.
- Prefer remote filtering before full download:
  - narrow time window first
  - then filter by keywords
  - use tail first for recent incidents
- Download full logs only when snippets are insufficient.
- Local filenames should clearly include service, host, and time range.
- Unless explicitly requested, do not fetch sensitive files, binaries, or unrelated large archives.
- For cross-service issues, analyze primary service first, then expand to dependencies.

## Service Selection

When user intent is ambiguous:

1. Use service `aliases`, `keywords`, and `description` in `config.yaml`.
2. Pick the service with the highest semantic match.
3. If still unclear, ask the user which service to inspect before remote connection.
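
The selection steps above can be sketched as a simple scoring function. This is a minimal illustration, not the Skill's actual matcher: it uses plain token overlap as a stand-in for "semantic match", and `select_service`, `REGISTRY`, and the sample service names are hypothetical. Returning `None` signals that the user should be asked before any remote connection is made.

```python
import re


def select_service(issue_text, registry):
    """Return the best-matching service name, or None if the match is unclear."""
    words = set(re.findall(r"[a-z0-9_-]+", issue_text.lower()))
    scores = {}
    for name, svc in registry.items():
        # Candidate terms: the service name, its aliases, and its keywords.
        terms = [name, *svc.get("aliases", []), *svc.get("keywords", [])]
        scores[name] = sum(1 for term in terms if term.lower() in words)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] == 0:
        return None  # nothing matched: ask the user
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between services: ask the user instead of guessing
    return ranked[0][0]


# Hypothetical registry, shaped like a parsed config.yaml.
REGISTRY = {
    "order-api": {"aliases": ["orders"], "keywords": ["checkout", "payment"]},
    "auth-service": {"aliases": ["login"], "keywords": ["token", "session"]},
}
```

A tie or a zero score deliberately yields `None` rather than a best guess, which mirrors the "ask the user" rule in step 3.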

## Remote Pre-Check Checklist

Before downloading logs, confirm:

- host configuration matches target service
- configured log files exist
- which log file was updated most recently
- whether rolling logs must be included
- whether issue is recent or historical

Common remote checks include:

- file metadata checks
- recent log tail checks
- quick keyword search
- time-window extraction
- process/service status when needed
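
The common remote checks listed above could be expressed as shell commands built in Python. This is a sketch under assumptions: the remote host is assumed to have GNU coreutils (`stat -c` is GNU syntax; BSD and macOS use `stat -f` instead), and `build_precheck_commands` and its parameters are hypothetical names. The function only builds command strings; running them over SSH is left to whatever remote tooling is available.

```python
import shlex


def build_precheck_commands(log_path, keyword=None, tail_lines=200):
    """Return shell command strings to run on the remote host before any download."""
    q = shlex.quote(log_path)
    commands = {
        # Size in bytes and mtime as epoch seconds (GNU stat format string).
        "metadata": f"stat -c '%s %Y' {q}",
        # Rotated siblings such as app.log.1 share the same prefix.
        "rotation": f"ls -la {q}*",
        # Recent lines only, to judge whether a tail is sufficient.
        "tail": f"tail -n {int(tail_lines)} {q}",
    }
    if keyword:
        # Count matching lines without transferring them.
        commands["keyword_count"] = f"grep -c -- {shlex.quote(keyword)} {q}"
    return commands
```

Quoting every path and keyword with `shlex.quote` keeps a value from `config.yaml` or from the user from being interpreted by the remote shell.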

## Local Download Rules

Store downloaded logs under configured `local_temp_dir`.

Recommended filename format:

`<service>__<host>__<log_name>__<time_hint>.log`

Priority order:

1. recent tail logs
2. keyword-filtered snippets
3. explicit time-window snippets
4. full file as last resort
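
A small helper can produce names in the recommended format. This is a sketch: `snippet_filename` and the sanitizing rules are assumptions, chosen so that the `__` separator stays unambiguous and the result is safe on common filesystems.

```python
import re


def snippet_filename(service, host, log_name, time_hint):
    """Build <service>__<host>__<log_name>__<time_hint>.log from sanitized parts."""
    def clean(part):
        # Replace anything outside a conservative character set with "-".
        part = re.sub(r"[^A-Za-z0-9._-]+", "-", part)
        # Collapse runs of "_" so "__" only ever appears as the separator.
        return re.sub(r"_{2,}", "_", part).strip("-")

    return "__".join(clean(p) for p in (service, host, log_name, time_hint)) + ".log"
```

For example, `snippet_filename("order-api", "web-01", "app", "last 2h")` yields `order-api__web-01__app__last-2h.log`.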

## Analysis Focus

Focus on:

- startup failures
- repeated exceptions
- timeout and connection issues
- resource pressure signals
- failures in DB/cache/message queue/DNS/HTTP upstream dependencies
- config errors exposed by stack traces or startup logs
- timestamp alignment across related services
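
A first pass over a downloaded snippet might count exception types and collect lines that hint at dependency trouble. The sketch below is illustrative only: the two regular expressions, the sample lines, and the `scan_lines` name are assumptions, and real log formats will need their own patterns.

```python
import re
from collections import Counter

# Names ending in "Exception" or "Error", e.g. java.sql.SQLException (case-sensitive).
EXCEPTION_RE = re.compile(r"\b([A-Za-z_][\w.]*(?:Exception|Error))\b")
# A few common phrases that point at network or upstream dependencies.
DEPENDENCY_RE = re.compile(
    r"connection refused|timed out|timeout|name or service not known", re.I
)


def scan_lines(lines):
    """Count exception types and collect (line_number, text) dependency hints."""
    exceptions = Counter()
    dependency_hits = []
    for number, line in enumerate(lines, start=1):
        exceptions.update(EXCEPTION_RE.findall(line))
        if DEPENDENCY_RE.search(line):
            dependency_hits.append((number, line.strip()))
    return exceptions, dependency_hits


# Hypothetical log lines for illustration.
SAMPLE = [
    "2026-05-26 10:00:01 ERROR java.sql.SQLException: connection refused",
    "2026-05-26 10:00:02 INFO request ok",
    "2026-05-26 10:00:03 ERROR java.sql.SQLException: connection refused",
    "2026-05-26 10:00:04 WARN upstream call timed out after 30s",
]
```

Keeping line numbers alongside each hit makes it easier to quote evidence in the final diagnosis.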

The response should include:

- issue summary
- key evidence
- preliminary cause
- confidence level
- next verification steps
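
The response fields above map naturally onto a small record type. This is one possible shape, not a required one: the `Diagnosis` class, its field names, and the markdown layout are assumptions, and the three confidence values are taken from the high/medium/low scale used elsewhere on this page.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    summary: str
    evidence: list
    cause: str
    confidence: str  # "high", "medium", or "low"
    next_steps: list = field(default_factory=list)

    def to_markdown(self):
        """Render the five response sections as a short markdown report."""
        if self.confidence not in {"high", "medium", "low"}:
            raise ValueError(f"unknown confidence level: {self.confidence!r}")
        lines = ["**Issue summary:** " + self.summary, "", "**Key evidence:**"]
        lines += [f"- `{item}`" for item in self.evidence]
        lines += [
            "",
            "**Preliminary cause:** " + self.cause,
            "**Confidence:** " + self.confidence,
            "",
            "**Next verification steps:**",
        ]
        lines += [f"{i}. {step}" for i, step in enumerate(self.next_steps, start=1)]
        return "\n".join(lines)
```

Rejecting unknown confidence values keeps the report from silently carrying an unstated level of certainty.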

## Security Constraints

- Treat `config.yaml` as operations metadata; do not store plaintext secrets.
- Prefer environment variables, key files, or external secret managers for SSH credentials.
- Unless explicitly requested, do not modify remote files or restart services.
- Unless requested, do not auto-delete downloaded logs.

## Exception Handling

If remote access fails:

1. Clearly state which step failed.
2. State target host and service.
3. Ask user for correct SSH access method, network path, or credentials.

If configured log path does not exist:

1. Clearly identify missing path.
2. Check whether alternate paths are configured for the same service.
3. Ask user whether deployment paths changed.

## Quick Execution Order

Always follow this order:

1. Read `config.yaml`.
2. Identify service and host.
3. Perform remote log pre-checks.
4. Copy minimal required logs to `temp/`.
5. Analyze locally.
6. Summarize conclusions with evidence.


Server Log Analysis

intent

diagnose service issues by connecting to remote servers via ssh, reading local config.yaml to map services to hosts and log paths, downloading only required log snippets to a local temp directory, then analyzing them for errors, timing misalignment, and root cause. use this skill when users ask to troubleshoot backend exceptions, investigate remote service failures, or perform ssh-based log diagnostics without direct server access.

inputs

external connections:

ssh access to target servers (configure host, port, username in config.yaml; provide the ssh key via the SSH_KEY_PATH environment variable or a running ssh agent)
network connectivity to target hosts (verify firewall rules allow outbound ssh on port 22 or configured port)

local files:

config.yaml in the skill directory: defines service names, aliases, keywords, host details (hostname, port, ssh user), log file paths, time zones, and business context
reference.md (optional): field definitions, command patterns, and log format details for specific services

context:

user's description of the issue (service name, symptom, approximate time window)
any known deployment or configuration changes (new version, infrastructure migration, etc.)

environment variables:

SSH_KEY_PATH: full path to ssh private key file (e.g., ~/.ssh/id_rsa)
SSH_TIMEOUT: connection timeout in seconds (default 30)
LOCAL_TEMP_DIR: directory for downloaded logs (default ./temp/)
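
the three variables and their defaults could be read as follows. this is a minimal sketch: `load_settings` and the returned keys are hypothetical names, and treating a missing SSH_KEY_PATH as "use the ssh agent" is an assumption based on the inputs listed above.

```python
import os
from pathlib import Path


def load_settings(env=None):
    """Read SSH_KEY_PATH, SSH_TIMEOUT and LOCAL_TEMP_DIR, applying the stated defaults."""
    env = os.environ if env is None else env
    key_path = env.get("SSH_KEY_PATH")
    return {
        # None means no key file was given; fall back to the ssh agent.
        "ssh_key_path": Path(key_path).expanduser() if key_path else None,
        "ssh_timeout": int(env.get("SSH_TIMEOUT", "30")),
        "local_temp_dir": Path(env.get("LOCAL_TEMP_DIR", "./temp/")),
    }
```

passing a plain dict as `env` makes the function easy to exercise without touching the real environment.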

procedure

1. read config.yaml and parse service definitions. extract all services, their aliases, keywords, descriptions, host details (hostname, port, ssh user), log file paths, time windows, and business context. note any time zone offsets or log rotation patterns.

   input: config.yaml file path
   output: parsed service registry (dict with service names as keys, each containing host, log paths, aliases, keywords, description, timezone)

2. map user intent to a configured service. compare user's issue description (service name, symptom keywords) against config.yaml service aliases, keywords, and descriptions. pick the service with highest semantic match.

   input: user issue description, parsed service registry
   output: target service name (string); if ambiguous, ask user to clarify before proceeding

3. extract target host, log files, and time window. from the selected service in config.yaml, get hostname, ssh port, ssh username, candidate log file paths, and relevant time window (e.g., last 2 hours, last 24 hours, or explicit start/end times).

   input: target service name, parsed service registry
   output: target host (string), ssh port (int), ssh user (string), log file paths (list), time window (tuple of datetime or relative offset)

4. establish ssh connection to target host. connect using ssh key from SSH_KEY_PATH environment variable or ssh agent. apply SSH_TIMEOUT. catch connection errors and surface them clearly.

   input: target host, ssh port, ssh user, ssh key path
   output: open ssh session (connection object); if connection fails, return error with host and user attempted

5. perform remote pre-check on log files. for each candidate log file, run remote commands to check: file existence (ls), file size (du or stat), last modified time (stat or ls -l), whether log rotation is active (ls -la for numbered or dated variants).

   input: ssh session, log file paths list
   output: file metadata dict (path, exists: bool, size_bytes: int, mtime: timestamp, rotation_variants: list); if no files exist, surface clearly which paths were missing

6. decide which log files to fetch based on pre-check results. prioritize by recency (newest mtime first) and size (smallest first to minimize download). if recent tail is sufficient (last N lines contain the error), mark for tail extraction. if time-window filtering is needed, mark for time-based extraction. only mark full file download if other methods are insufficient.

   input: file metadata dict, time window, user issue description
   output: download strategy (list of (log_path, extraction_method, extraction_params)); extraction_method is one of: tail (N lines), time_window (start_time, end_time), keyword_filter (keyword list, optional time_window), full_file

7. execute remote extraction and download logs to local temp. for each file in download strategy: run remote command (tail, grep with time filter, sed, awk, etc.) to extract only required snippet. pipe output to local temp file with clear naming: <service>__<host>__<log_name>__<time_hint>.log. handle remote command failures and partial downloads gracefully.

   input: ssh session, download strategy, LOCAL_TEMP_DIR
   output: list of local file paths (strings); log each download with bytes transferred; if download fails, return error with remote path and command attempted

8. close ssh connection. cleanly close the ssh session.

   input: ssh session
   output: connection closed (success or log warning if close fails)

9. analyze downloaded log files locally. scan each local log file for: exception stack traces, repeated error messages, timeout or connection errors, resource pressure signals (out of memory, disk full, high cpu), failures in external dependencies (db, cache, dns, http upstream), config errors, timestamp gaps, and timing correlation across related services.

   input: local log file paths list
   output: findings dict (exceptions: list, errors: list, warnings: list, resource_signals: list, dependency_failures: list, timestamps: list, correlations: list)

10. synthesize diagnosis and output concise summary. combine findings into: issue summary (1-2 sentences), key evidence (top 3-5 log snippets supporting the diagnosis), preliminary root cause (with confidence level: high/medium/low), and next verification steps (tests or commands to confirm, or escalation path).

    input: findings dict
    output: diagnosis summary (markdown or plain text)
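
the remote extraction step in the procedure above can be sketched as a mapping from each extraction method to a remote shell command, plus an argument list for the OpenSSH client. this is an illustration under assumptions: `remote_extract_command`, `ssh_argv`, and the `params` keys are hypothetical, and the time_window branch assumes every log line starts with a timestamp that sorts lexicographically (for example `2026-05-26 10:00:01`), so continuation lines such as stack-trace frames are not captured by it.

```python
import shlex


def remote_extract_command(log_path, method, params):
    """Return the remote shell command for one download-strategy entry."""
    q = shlex.quote(log_path)
    if method == "tail":
        return f"tail -n {int(params['lines'])} {q}"
    if method == "keyword_filter":
        # -F treats keywords as fixed strings; a line matching any -e pattern is kept.
        patterns = " ".join(f"-e {shlex.quote(k)}" for k in params["keywords"])
        return f"grep -F {patterns} {q}"
    if method == "time_window":
        # String comparison on the whole line; relies on a sortable leading timestamp.
        start, end = shlex.quote(params["start"]), shlex.quote(params["end"])
        return f"awk -v s={start} -v e={end} '$0 >= s && $0 < e' {q}"
    if method == "full_file":
        return f"cat {q}"
    raise ValueError(f"unknown extraction method: {method}")


def ssh_argv(user, host, port, timeout, remote_command):
    """Build an OpenSSH client argument list; BatchMode avoids interactive prompts."""
    return [
        "ssh", "-p", str(port),
        "-o", f"ConnectTimeout={int(timeout)}",
        "-o", "BatchMode=yes",
        f"{user}@{host}", remote_command,
    ]
```

the argument list can be handed to `subprocess.run` with stdout redirected to the local snippet file, so only the extracted lines cross the network.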

decision points

if user intent is ambiguous (multiple services match keywords):

ask user which service to inspect by name or description from config.yaml
do not assume or guess
do not proceed to remote connection until service is confirmed

if config.yaml is missing or unreadable:

return error immediately
do not attempt ssh connection
ask user to verify config.yaml exists and is valid yaml

if ssh connection fails:

state clearly: target host, ssh port, ssh user attempted
offer diagnostic steps: check hostname dns resolution, verify ssh key exists and permissions, check firewall rules, verify target host is online
ask user for correct ssh access method or alternate host

if remote pre-check finds no matching log files:

list all paths checked
ask user whether deployment paths changed, log rotation strategy differs, or service is deployed elsewhere
offer to check alternate paths if any are configured for the same service

if remote pre-check finds very large log files (> 1 GB):

do not download full file automatically
require explicit user approval or keyword filter before download
prefer tail or time-window extraction instead
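
the size rule above could be enforced with a small guard. this is a sketch: `allowed_methods` is a hypothetical helper, and reading "1 GB" as 1024³ bytes is an assumption (the text does not say whether it means 10⁹ or 2³⁰ bytes).

```python
ONE_GB = 1024 ** 3  # assumption: binary gigabyte


def allowed_methods(size_bytes, user_approved_full=False):
    """Return the extraction methods permitted for a file of the given size."""
    # Listed in preference order; snippet methods are always available.
    methods = ["tail", "keyword_filter", "time_window"]
    if size_bytes <= ONE_GB or user_approved_full:
        methods.append("full_file")  # still the last resort
    return methods
```

the `size_bytes` value is expected to come from the remote pre-check, so the decision is made before any bytes are transferred.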

if extraction returns empty result (no logs in time window, no keyword matches):

state clearly: extraction method used, time window or keyword applied, log file path
ask user whether time window is correct, whether error has occurred yet, or whether service is logging at all
offer to expand time window or try different keyword
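
when a time-window extraction comes back empty, a widened window can be computed and proposed to the user. the sketch below is illustrative: the timestamp layout in `TS_FORMAT`, the helper names, and the doubling factor are all assumptions, and the widened window is meant as a suggestion to confirm with the user, not something to apply silently.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed layout of the leading timestamp


def lines_in_window(lines, start, end):
    """Keep lines whose leading timestamp falls in [start, end)."""
    kept = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TS_FORMAT)
        except ValueError:
            continue  # no leading timestamp, e.g. a stack-trace continuation line
        if start <= stamp < end:
            kept.append(line)
    return kept


def widen(start, end, factor=2):
    """Grow the window symmetrically around its midpoint."""
    half = (end - start) * factor / 2
    mid = start + (end - start) / 2
    return mid - half, mid + half
```

for example, a 20-minute window of 09:50 to 10:10 widens to 09:40 to 10:20 with the default factor.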

if analysis finds multiple potential root causes (low confidence):

list each hypothesis with supporting evidence
ask user for additional context (recent deployments, upstream changes, etc.)
recommend next verification steps to narrow down cause

if downloaded logs contain sensitive data (passwords, tokens, api keys visible):

do not include sensitive data in final output
redact or summarize instead
alert user that raw logs contain sensitive data and should be handled carefully
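
redaction before quoting evidence might look like the sketch below. the pattern is an assumption and deliberately narrow: it only covers `key=value` and `key: value` pairs whose key ends in a few common secret names, so other shapes (for example bearer tokens in an Authorization header) would need additional patterns.

```python
import re

# Keys such as password, access_token, api_key followed by "=" or ":" and a value.
SECRET_RE = re.compile(
    r"([\w-]*(?:password|passwd|secret|token|api[_-]?key))(\s*[=:]\s*)(\S+)",
    re.I,
)


def redact(line):
    """Replace the value of secret-looking key/value pairs with [REDACTED]."""
    return SECRET_RE.sub(lambda m: m.group(1) + m.group(2) + "[REDACTED]", line)
```

keeping the key name and separator in place preserves the shape of the log line, so the excerpt still reads as evidence.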

output contract

success output format:

diagnosis summary (markdown or plain text) containing: issue summary (1-2 sentences), key evidence (top log excerpts with line numbers or timestamps), preliminary root cause, confidence level (high/medium/low), next verification steps
all downloaded log snippets stored in LOCAL_TEMP_DIR with naming format <service>__<host>__<log_name>__<time_hint>.log
no plaintext secrets in any output file
ssh connection cleanly closed

error output format:

error message stating: which step failed, target service/host/log path involved, remediation step (e.g., "provide ssh key", "confirm log path changed")
no partial or corrupted log files left in temp directory without user awareness
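
one way to avoid leaving half-written files behind is to write each snippet to a temporary sibling file and rename it only after the write succeeds. this is a sketch of that general technique, not something the skill prescribes; `save_snippet` and the `.part` suffix are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def save_snippet(data: bytes, dest: Path) -> Path:
    """Write to a temporary sibling file, then rename, so dest is never half-written."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_name, dest)  # same directory, so the rename stays on one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # drop the partial file on any failure
        raise
    return dest
```

if the download is interrupted, the destination name never appears and the `.part` file is removed, so anything present in the temp directory is a complete snippet.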

outcome signal

user knows the skill worked when:

diagnosis summary is provided with clear evidence (log excerpts) and confidence level stated
user can verify the preliminary root cause by checking log snippets in the output
next verification steps are actionable (specific commands, file paths, or tests)
downloaded log files are available in temp directory for user to inspect independently
ssh connection was cleanly closed and no hanging processes remain

if skill fails:

clear error message states which step failed and why
user can take corrective action (provide missing config, verify ssh access, confirm log paths, etc.)
no partial data or corrupted files left without explicit user awareness