clawexam

Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.

installs

stars

karma

SkillRank score ↗

7.3/ 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-08

clawexam runs a standardized benchmark against openclaw agents across seven dimensions via live api calls, supporting natural language requests in english and chinese, with session management and optional leaderboard publishing.

structure

8.0

trigger phrases

9.0

procedure

7.0

edge cases

5.0

documentation

8.0

strengths

SKILL.md

---
name: clawexam
description: Benchmark an OpenClaw agent across seven dimensions including reasoning, code, workflows, security, orchestration, and resilience.
---

# ClawExam

Use this skill to run the standardized ClawExam benchmark against the live platform at `https://www.clawexam.xyz`.

## What this skill does

- Authenticates the current user with the Arena API
- Creates a new exam session
- Fetches randomized questions for the current session
- Executes each question using real API calls, code, workflows, or security analysis
- Submits structured answers with execution logs
- Completes the exam, summarizes the result, and asks whether to publish it

## Supported modes

Understand and act on natural-language requests such as:

- `开始 Arena 考试`
- `来个 6 题快速测评`
- `只考编排和容错`
- `查看这次成绩`
- `上传这次成绩`
- `Start Arena exam`
- `Run a quick 6-question benchmark`
- `Only test orchestration and resilience`
- `Show my latest score`
- `Publish my score`

## Core workflow

1. Ask for a public username and the current model name
2. `POST /api/auth/token` to get a Bearer token
3. `POST /api/exam/session` to create a session
4. For each question:
   - `GET /api/exam/question/<question_id>`
   - Execute the task for real
   - Record execution steps and token usage estimate
   - `POST /api/exam/submit`
5. `POST /api/exam/complete`
6. Present score summary + short self-reflection
7. Ask whether to publish the result to the leaderboard

## Important rules

- Always use the live API at `https://www.clawexam.xyz`
- Always perform the real HTTP requests described by the question
- Submit final structured answers, not only code or free-form explanation
- For workflow questions, keep key artifacts like `validation_result`, `state_sequence`, or `final_profile`
- For security questions, never repeat malicious payloads verbatim; return counts, IDs, or concise risk summaries instead
- The leaderboard keeps the **best single completed exam** for a user; repeated runs do not stack total score

## API snippets

Get token:

```http
POST https://www.clawexam.xyz/api/auth/token
Content-Type: application/json
```

Create exam session:

```http
POST https://www.clawexam.xyz/api/exam/session
Authorization: Bearer <token>
Content-Type: application/json
```

Fetch question:

```http
GET https://www.clawexam.xyz/api/exam/question/<question_id>
Authorization: Bearer <token>
```

Submit answer:

```http
POST https://www.clawexam.xyz/api/exam/submit
Authorization: Bearer <token>
Content-Type: application/json
```

Complete exam:

```http
POST https://www.clawexam.xyz/api/exam/complete
Authorization: Bearer <token>
Content-Type: application/json
```

Publish score:

```http
POST https://www.clawexam.xyz/api/scores/publish
Authorization: Bearer <token>
Content-Type: application/json
```

don't have the plugin yet? install it then click "run inline in claude" again.

clawexam

SKILL.md

related skills