Open Alex

Use OpenAlex to find and cite scholarly works, authors, institutions, and trends via metadata queries without needing an API key.


SkillRank score: 8.1 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-01

open-alex-al teaches agents to query scholarly metadata (works, authors, institutions, citations) via openalex without api keys, resolve entities to ids, construct filters, read paginated results, and cite with doi plus openalex url.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 7.0
documentation: 8.0

strengths


# OpenAlex Skill

> **FEATURED** — Teaches an agent how to use OpenAlex (the open scholarly graph) correctly: discover entities, query works with filters, read results, and cite accurately. **No API key required.** Set `OPENALEX_MAILTO` for the polite pool.

This skill pairs with the **OpenAlex MCP server** (see `../mcp/`), which provides the 6 callable tools. The skill provides the *know-how*. Use imperative voice; do what each step says.

---

## 1. Name

`openalex` — Open scholarly metadata: works, authors, institutions, sources, topics, concepts, publishers, funders.

## 2. Purpose

Answer research questions using authoritative bibliographic data: find papers, authors, citations, open-access status, and bibliometric trends — and cite them precisely. OpenAlex is **free** and **open**.

## 3. When to use OpenAlex

Use OpenAlex when the task involves:

- Scholarly **works** (papers, preprints, datasets, books) and their metadata.
- **Authors** and their output, affiliations, and citation counts.
- **Citations** and impact (cited_by_count, FWCI).
- **Open-access** status and finding free full-text links.
- **Bibliometrics / trends**: counts by year, institution, topic, OA status.
- Institutions, journals/sources, topics, concepts, publishers, funders.

It is free — prefer it for any academic-metadata need.

## 4. When NOT to use OpenAlex

- **Full-text PDFs / reading the paper body** → OpenAlex gives metadata + `open_access.oa_url`; follow that URL to the file. OpenAlex does not serve full text.
- **General/non-academic web information** → use a web search API, not OpenAlex.
- **Paywalled full text** → OpenAlex can tell you if/where an OA copy exists, but cannot bypass paywalls.

## 5. Environment

- **No API key. No required environment variables.**
- **Recommended:** set `OPENALEX_MAILTO=you@example.com` to join the **polite pool** (faster, fewer `429`s). Not a secret.
- Optional: `OPENALEX_API_BASE_URL`, `OPENALEX_TIMEOUT_MS` (30000), `OPENALEX_MAX_RETRIES` (3), `LOG_LEVEL`.
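
A minimal sketch of how a client might read these settings. The variable names and the defaults for timeout and retries come from the list above, and the default base URL is the public OpenAlex API host; the `OpenAlexConfig` class and `load_config` function are hypothetical names used only for illustration, not part of the MCP server.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class OpenAlexConfig:
    """Hypothetical container for the optional settings."""
    base_url: str
    mailto: Optional[str]
    timeout_ms: int
    max_retries: int


def load_config(env: Mapping[str, str] = os.environ) -> OpenAlexConfig:
    """Read optional settings from the environment, falling back to defaults."""
    return OpenAlexConfig(
        base_url=env.get("OPENALEX_API_BASE_URL", "https://api.openalex.org").rstrip("/"),
        # An unset or empty mailto means the caller stays outside the polite pool.
        mailto=env.get("OPENALEX_MAILTO") or None,
        timeout_ms=int(env.get("OPENALEX_TIMEOUT_MS", "30000")),
        max_retries=int(env.get("OPENALEX_MAX_RETRIES", "3")),
    )
```

Passing the environment as a mapping keeps the function easy to exercise without touching real process state.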

## 6. Operations (the 6 tools + generic)

| Tool | Use it to |
|------|-----------|
| `openalex_search` | Resolve a name/title/keyword to entities (and IDs). |
| `openalex_works` | Query works with `filter`, `sort`, paging — the main tool. |
| `openalex_get` | Fetch one entity by OpenAlex ID / DOI / ORCID / ROR. |
| `openalex_authors` | Search/filter authors. |
| `openalex_group_by` | Counts grouped by a field (analytics). |
| `openalex_request` | Generic passthrough to **any** endpoint (sources, topics, autocomplete, …). |

## 7. Discovery workflow

1. Start from human input (a name, title, keyword).
2. Resolve to an **entity ID** with `openalex_search` or `openalex_request` → `autocomplete/{entity}`.
3. Verify you picked the right entity (check `display_name`, affiliation, works_count).
4. Note the ID prefix → entity type:

| Prefix | Entity | Prefix | Entity |
|--------|--------|--------|--------|
| `W` | Works | `T` | Topics |
| `A` | Authors | `C` | Concepts |
| `I` | Institutions | `P` | Publishers |
| `S` | Sources | `F` | Funders |

Entity types: `works`, `authors`, `sources`, `institutions`, `topics`, `concepts`, `publishers`, `funders`, `keywords`.
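
The prefix table can be sketched as a small lookup. The function name `entity_type_from_id` is hypothetical, and the sketch assumes IDs are either bare (`W2741809807`) or full `https://openalex.org/...` URLs. Keywords are left out because the table above gives them no single-letter prefix.

```python
import re

# Prefix-to-entity mapping taken from the table above.
PREFIX_TO_ENTITY = {
    "W": "works", "A": "authors", "I": "institutions", "S": "sources",
    "T": "topics", "C": "concepts", "P": "publishers", "F": "funders",
}

# Accepts a bare ID or the full openalex.org URL form.
_ID_PATTERN = re.compile(r"^(?:https://openalex\.org/)?([A-Za-z])(\d+)$")


def entity_type_from_id(openalex_id: str) -> str:
    """Return the entity type implied by an OpenAlex ID prefix."""
    match = _ID_PATTERN.match(openalex_id.strip())
    if match is None:
        raise ValueError(f"not an OpenAlex ID: {openalex_id!r}")
    prefix = match.group(1).upper()
    if prefix not in PREFIX_TO_ENTITY:
        raise ValueError(f"unknown OpenAlex ID prefix: {prefix!r}")
    return PREFIX_TO_ENTITY[prefix]
```

Rejecting unknown prefixes early gives a clearer failure than a `404` from the API.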

## 8. Query workflow

Build a `filter` (comma-separated, ANDed) and pick a `sort`:

| Need | Filter |
|------|--------|
| Year | `publication_year:2024` |
| Date range | `from_publication_date:…,to_publication_date:…` |
| Open access | `is_oa:true` |
| By author | `authorships.author.id:A…` |
| By institution | `authorships.institutions.id:I…` |
| By topic | `primary_topic.id:T…` |
| Highly cited | `cited_by_count:>100` |
| Type | `type:article` |

- Sort by impact: `cited_by_count:desc`. Sort by recency: `publication_date:desc`.
- `per-page` ≤ **200**.
- For deep traversal, use **cursor** (`cursor=*` then `meta.next_cursor`), not high `page` numbers.
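
As an illustration of the rules above, the following sketch assembles a `/works` request URL without sending it. `build_works_url` is a hypothetical helper; passing the contact address as a `mailto` query parameter is an assumption about how the polite pool is joined on the wire, so check it against the OpenAlex documentation.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

MAX_PER_PAGE = 200  # upper bound on per-page stated above


def build_works_url(
    filters: Mapping[str, object],
    sort: Optional[str] = None,
    per_page: int = 25,
    cursor: Optional[str] = None,
    mailto: Optional[str] = None,
    base_url: str = "https://api.openalex.org",
) -> str:
    """Build a works query URL from key:value filters (ANDed with commas)."""
    if not 1 <= per_page <= MAX_PER_PAGE:
        raise ValueError(f"per-page must be between 1 and {MAX_PER_PAGE}")
    params = {}
    if filters:
        params["filter"] = ",".join(f"{key}:{value}" for key, value in filters.items())
    if sort:
        params["sort"] = sort
    params["per-page"] = str(per_page)  # hyphenated name on the wire
    if cursor:
        params["cursor"] = cursor  # "*" starts a cursor traversal
    if mailto:
        params["mailto"] = mailto  # assumption: polite-pool contact parameter
    # Keep ':', ',' and '*' readable; everything else is percent-encoded.
    return f"{base_url}/works?{urlencode(params, safe=':,*')}"
```

For example, `build_works_url({"publication_year": 2024, "is_oa": "true"}, sort="cited_by_count:desc", cursor="*")` yields a URL whose `filter` parameter is `publication_year:2024,is_oa:true`.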

## 9. Reading results

- `meta.count` = total matches (not the number returned).
- `results` = the current page only.
- `group_by` = `[{key, key_display_name, count}]` for aggregations.
- **Abstract:** works carry `abstract_inverted_index` (a `{word: [positions]}` map), not plain text. Reconstruct by placing each word at its positions and joining in order.
- **Full text:** follow `open_access.oa_url` for the free PDF/HTML.
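
To keep totals and page contents apart, a response can be summarized before anything is reported. This is a sketch that assumes only the `meta.count` and `results` fields described above; `summarize_page` is a hypothetical name.

```python
from typing import Any, Dict, Mapping


def summarize_page(response: Mapping[str, Any]) -> Dict[str, Any]:
    """Separate the total match count from the number of items on this page."""
    meta = response.get("meta") or {}
    results = response.get("results") or []
    total = int(meta.get("count") or 0)
    return {
        "total_matches": total,          # everything that matched the query
        "returned": len(results),        # only what this page carries
        "is_partial": len(results) < total,
    }
```

A caller might then phrase its answer as "showing `returned` of `total_matches`" instead of quoting either number alone.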

## 10. Citation rules

Cite every claim with: **title, authors, year, DOI, and the OpenAlex ID + URL** `https://openalex.org/<ID>`.

```
<Authors> (<year>). <Title>. <Source>. DOI: <doi>. OpenAlex: https://openalex.org/<WID>
```

The OpenAlex URL is mandatory for traceability, in addition to the DOI.
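
A sketch of the citation template as a function. It assumes the work record exposes `authorships[].author.display_name`, `publication_year`, `title`, `primary_location.source.display_name`, `doi`, and `id`; confirm those field names against the API response before relying on it. `format_citation` is a hypothetical helper, and it omits any part the record does not supply instead of guessing.

```python
from typing import Any, Mapping


def format_citation(work: Mapping[str, Any]) -> str:
    """Render '<Authors> (<year>). <Title>. <Source>. DOI: ... OpenAlex: ...'."""
    names = [
        (entry.get("author") or {}).get("display_name")
        for entry in work.get("authorships") or []
    ]
    names = [name for name in names if name]
    if not names:
        authors = "[no authors listed]"
    elif len(names) == 1:
        authors = names[0]
    else:
        authors = f"{names[0]} et al."

    year = work.get("publication_year") or "n.d."
    title = work.get("title") or work.get("display_name") or "[untitled]"
    source = ((work.get("primary_location") or {}).get("source") or {}).get("display_name")
    # DOIs are assumed to arrive as full https://doi.org/ URLs; keep the bare form.
    doi = (work.get("doi") or "").removeprefix("https://doi.org/")
    work_id = str(work["id"]).rsplit("/", 1)[-1]

    parts = [f"{authors} ({year}).", f"{title}."]
    if source:
        parts.append(f"{source}.")
    if doi:
        parts.append(f"DOI: {doi}.")
    parts.append(f"OpenAlex: https://openalex.org/{work_id}")  # always present
    return " ".join(parts)
```

The OpenAlex URL is appended unconditionally, which mirrors the rule that it is mandatory even when a DOI exists.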

## 11. Freshness

OpenAlex data updates **frequently** (new works, citation counts, affiliations). Counts you report are point-in-time. When precision matters, note the access date and that figures may change.

## 12. Integrity

- Report **only** what the API returns. **Never invent** papers, authors, DOIs, or citation counts.
- If results are empty, say so and broaden — do not fabricate to satisfy a requested count.
- Keep totals (`meta.count`) distinct from listed `results`.

## 13. Error handling

| Error | Cause | Reaction |
|-------|-------|----------|
| `404` | Bad/typo ID | Fix the ID prefix/value; re-resolve via search/autocomplete. |
| `429` | Not in polite pool / too fast | Set `OPENALEX_MAILTO`; back off; reduce volume. |
| Empty results | Filter too narrow | Broaden filter; check key spelling; try `search`. |
| `400` | Bad filter syntax | Comma-separate; use `key:value`; verify keys. |
| Timeout | Query too broad | Add a filter; lower `per-page`. |
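
The `429` reaction ("back off") can be sketched as exponential backoff around any request function. The `send` callable, the delay values, and the jitter amount are assumptions chosen for illustration; the MCP server's own retry behavior may differ.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_retries: int = 3,
    base_delay: float = 1.0,
    jitter: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call send() and retry on HTTP 429, doubling the wait each time."""
    attempt = 0
    while True:
        status, body = send()
        # Anything other than 429 (success or a different error) is returned as-is.
        if status != 429 or attempt >= max_retries:
            return status, body
        sleep(base_delay * 2 ** attempt + random.uniform(0.0, jitter))
        attempt += 1
```

Other errors in the table, such as `400` or `404`, are returned immediately because retrying an unchanged request would not fix them.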

## 14. Cost / etiquette

- **Free.** Be polite: set `OPENALEX_MAILTO`.
- **Cache** resolved IDs and stable records.
- **Avoid huge unfiltered scans.** Always filter first.
- **Use cursor**, not high page numbers (`page` is capped ~10000 results).

## 15. Security

- No secrets to manage. `OPENALEX_MAILTO` is not sensitive but keep configs clean.
- Read-only API; outbound HTTPS only. Keep logs on stderr; protocol on stdout.

## 16. Agent checklist

- [ ] Resolved names to IDs (and verified the right entity)?
- [ ] Built a `filter` instead of scanning everything?
- [ ] Chose an appropriate `sort`?
- [ ] Used cursor for deep paging?
- [ ] Read `meta.count` vs `results` correctly?
- [ ] Reconstructed abstracts from the inverted index if needed?
- [ ] Cited title + authors + year + DOI + OpenAlex ID/URL?
- [ ] Set `OPENALEX_MAILTO` to avoid `429`?
- [ ] Reported only real, returned data?

## 17. Example workflows

- **Literature review:** resolve topic → `openalex_works` (topic + year + `is_oa`, sort by citations) → `openalex_get` top work → author/institution profiles → cited summary. See `recipes/literature-search.md`.
- **Author profile:** resolve author → `openalex_get` author → `openalex_works` filtered by `authorships.author.id` → top works + metrics. See `recipes/author-profile.md`.
- **Trend by year:** `openalex_group_by` on `publication_year` with a topic/OA filter. See `recipes/citation-trends.md`.

## 18. Common mistakes

- Using `per_page` on the wire instead of **`per-page`** (hyphen) in `openalex_request`.
- Deep-paging with high `page` numbers (capped ~**10000** results) instead of **cursor**.
- Treating `abstract_inverted_index` as plain text.
- Reporting `meta.count` as the number of items returned.
- Forgetting the OpenAlex ID/URL in citations.
- Skipping `OPENALEX_MAILTO` and hitting `429`.

## 19. Maintenance

- Re-resolve IDs periodically; entities can merge/change.
- Re-check filter keys and limits against <https://docs.openalex.org> when behavior changes.
- Update cached records given frequent data refreshes.

> Verification needed: confirm filter keys, limits, and field names with <https://docs.openalex.org>.


restructured original 19 sections into implexa's 6-component format, added explicit env var guidance (openalex_mailto for polite pool, timeout, retries), documented mcp server tool signatures, expanded decision logic for empty results, rate limits, pagination, abstract reconstruction, and paywall detection, standardized citation format with mandatory openalex url, clarified error handling with http codes, and added rate limit mitigation strategies.

OpenAlex Skill

Item: Open Alex
Rating: 8.1
Author: Implexa

intent

OpenAlex is a free, open scholarly graph. use it to find papers, authors, institutions, citations, and bibliometric trends without an API key. this skill resolves names to entity IDs, queries works with filters and sorts, reconstructs abstracts, and generates citations with DOI + OpenAlex URL. use it any time you need academic metadata: publication history, citation counts, open-access status, institutional affiliation, or research trends by year, topic, or concept.

inputs

external connection: OpenAlex API

Base URL: https://api.openalex.org (overridable via OPENALEX_API_BASE_URL)
No authentication required; read-only HTTPS
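Rate limit: 10 requests per second (standard pool); 100 requests per second is the figure listed for the polite pool. confirm current limits at https://docs.openalex.org before relying on either number.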

environment variables (optional but recommended)

OPENALEX_MAILTO: your email (e.g., you@example.com). set this to join the polite pool and reduce 429 errors. not a secret.
OPENALEX_TIMEOUT_MS: request timeout in milliseconds (default: 30000)
OPENALEX_MAX_RETRIES: exponential backoff retries (default: 3)
LOG_LEVEL: debug output verbosity (default: warn)

MCP server tools

This skill pairs with the OpenAlex MCP server, which exposes 6 callable functions:

openalex_search: keyword/name resolution to entity IDs
openalex_works: query works with filters, sorts, paging
openalex_authors: search/filter authors
openalex_get: fetch single entity by OpenAlex ID, DOI, ORCID, or ROR
openalex_group_by: count/aggregate by field (e.g., publication year, topic)
openalex_request: generic passthrough to any OpenAlex endpoint

context from user

Research question or entity name (author, paper title, institution, topic keyword)
Desired filters (year range, open-access status, citation threshold, etc.)
Sort preference (by recency, impact, relevance)
Citation format needed (APA, Chicago, full OpenAlex URL)

procedure

1. resolve entity name to ID
- input: human-readable name (author, title, institution, topic keyword)
- call openalex_search with the name string OR openalex_request to /autocomplete/{entity_type} endpoint (e.g., /autocomplete/authors)
- output: list of candidates with id (e.g., A1234567), display_name, and entity metadata (works_count, cited_by_count, affiliation)
- verify you matched the correct entity by cross-checking display name, affiliation, and work count

2. identify entity type by ID prefix
- input: OpenAlex ID from step 1 (e.g., W2741809807, A1234567, I12345678)
- map prefix to entity type using the table below
- output: confirmed entity type for next step

| Prefix | Entity Type |
|--------|-------------|
| W | works (papers, preprints, books, datasets) |
| A | authors (researchers) |
| I | institutions (universities, labs) |
| S | sources (journals, preprint servers) |
| T | topics (research domains) |
| C | concepts (scholarly concepts/keywords) |
| P | publishers |
| F | funders (grants, funding bodies) |

3. build filters (comma-separated, ANDed together)
- input: research goal (e.g., "papers on AI by Jane Doe in 2024, open access")
- construct filter string using key:value pairs, comma-delimited
- common filters:
  - publication_year:2024 (single year)
  - from_publication_date:2020-01-01,to_publication_date:2024-12-31 (date range)
  - is_oa:true (open access only)
  - authorships.author.id:A1234567 (by specific author)
  - authorships.institutions.id:I12345678 (by institution)
  - primary_topic.id:T12345 (by primary research topic)
  - cited_by_count:>100 (highly cited works only)
  - type:article (filter by work type: article, preprint, book, etc.)
- output: filter string, e.g., publication_year:2024,is_oa:true,authorships.author.id:A1234567
4. choose sort order
- input: priority (impact, recency, relevance)
- select sort:
  - cited_by_count:desc (most cited first)
  - publication_date:desc (newest first)
  - works_count:desc (most prolific; applies to author, institution, or source queries, not to works)
  - relevance_score:desc (keyword match strength, if using search)
- output: sort parameter string
5. query with openalex_works or domain-specific tool
- input: filter string (step 3), sort (step 4), per-page (≤ 200, default 25)
- call openalex_works(filter=..., sort=..., per_page=..., page=1) for first page
- output: JSON response with:
  - meta.count (total matches across all pages)
  - meta.next_cursor (cursor for next page if using cursor-based paging)
  - results (array of current page results, each with id, title, authors, year, DOI, cited_by_count, open_access, abstract_inverted_index, etc.)
6. paginate using cursor for large result sets
- input: cursor=* for the first request, then meta.next_cursor from each response
- if meta.count > 200 and deep traversal needed, call openalex_works(..., cursor="*") first, then openalex_works(..., cursor=meta.next_cursor), instead of incrementing page number
- avoid high page numbers (capped around 10000 results); cursor is reliable for arbitrary depth
- output: next page of results with new meta.next_cursor
- repeat until meta.next_cursor is null or desired results reached
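
The cursor loop can be sketched as a generator. The transport is injected as a hypothetical `fetch_page(cursor)` callable that returns one parsed response, so the sketch makes no claim about the `openalex_works` tool signature; it only assumes the `meta.next_cursor` and `results` fields named above.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_works(
    fetch_page: Callable[[str], Dict[str, Any]],
    max_items: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield works across pages, starting at cursor '*' and following next_cursor."""
    cursor: Optional[str] = "*"
    remaining = max_items
    while cursor:
        page = fetch_page(cursor)
        results = page.get("results") or []
        if not results:
            return  # an empty page means there is nothing left to read
        for work in results:
            if remaining is not None:
                if remaining <= 0:
                    return
                remaining -= 1
            yield work
        if remaining is not None and remaining <= 0:
            return  # stop before requesting a page that is not needed
        cursor = (page.get("meta") or {}).get("next_cursor")
```

The `max_items` cap stops the traversal once enough results have been collected, which avoids needless requests on large result sets.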
7. reconstruct abstract from inverted index (if needed)
- input: abstract_inverted_index field from a work result (dict mapping word to list of positions)
- example: {"AI": [0, 15], "learning": [1], "systems": [2, 18]}
- algorithm: create array of length = max position + 1, place each word at each of its positions, skip any position left unfilled, join with spaces
- output: plain-text abstract string
- note: some works have no abstract; skip if field is null or empty
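
A minimal sketch of the reconstruction. Instead of preallocating an array, it sorts (position, word) pairs, which gives the same word order and tolerates positions missing from the index; `reconstruct_abstract` is a hypothetical name.

```python
from typing import Dict, List, Optional


def reconstruct_abstract(inverted_index: Optional[Dict[str, List[int]]]) -> Optional[str]:
    """Rebuild plain text from a {word: [positions]} map; None if no abstract."""
    if not inverted_index:
        return None
    # Expand every word to each position it occupies, then order by position.
    positioned = sorted(
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    )
    return " ".join(word for _, word in positioned)
```

A word that occurs several times appears once in the index with several positions, so it is emitted once per position.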
8. fetch full-text URL (if user needs PDF)
- input: result from step 5 with open_access field
- check open_access.oa_url (string URL to free PDF/HTML)
- if null, no free copy is known to OpenAlex; the work is likely paywalled
- output: URL string or null
- note: OpenAlex does not serve PDFs; oa_url points to the publisher or preprint host
9. generate citations
- input: work result with title, authors (array), publication_year, DOI, id
- construct citation: <first author: last name, initials>, <et al. if multiple>. (<publication_year>). <title>. <source>. DOI: <DOI>. OpenAlex: https://openalex.org/<work_id>
- example: Smith, J., et al. (2024). Deep Learning Trends. Nature ML. DOI: 10.1234/nml.2024.001. OpenAlex: https://openalex.org/W3141592653
- output: formatted citation string with mandatory OpenAlex URL
- enforce: every claim reported must include title, authors, year, DOI (if available), and OpenAlex ID + URL
10. aggregate results (if using openalex_group_by)
- input: grouping field (e.g., publication_year, primary_topic.id, is_oa) and filters
- call openalex_group_by(field=..., filter=...)
- output: array of [{key, key_display_name, count}, ...] showing counts per group
- example: papers per year 2020-2024, OA papers vs closed, papers per topic
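- note: meta.count is total matches. for a single-valued field such as publication_year, group counts should sum to that total; for multi-valued fields a work can fall into several groups, so verify before reporting a sum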
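
A small sketch for working with aggregation output, assuming each group has the `{key, key_display_name, count}` shape given above. Both function names are hypothetical.

```python
from typing import Any, Dict, List, Mapping


def tabulate_groups(groups: List[Mapping[str, Any]]) -> Dict[str, int]:
    """Map each group's label to its count, largest groups first."""
    ordered = sorted(groups, key=lambda group: group["count"], reverse=True)
    return {
        str(group.get("key_display_name") or group["key"]): int(group["count"])
        for group in ordered
    }


def groups_cover_total(groups: List[Mapping[str, Any]], meta_count: int) -> bool:
    """Check whether the group counts add up to the reported total."""
    return sum(int(group["count"]) for group in groups) == meta_count
```

When `groups_cover_total` returns `False`, report the per-group counts and the total separately instead of presenting the groups as a complete breakdown.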

decision points

if user provides a name/keyword, not an OpenAlex ID: resolve first via openalex_search or openalex_request autocomplete. verify match before querying.
if user wants a single entity profile (one author, one institution, one work): use openalex_get(openalex_id=...) instead of filter-based query. faster, one call.
if results are empty: broaden the filter (remove year restriction, loosen topic specificity, drop citation threshold). do not invent results. report "no matches" and note that the search was probably too specific.
if per-page > 200 requested: cap at 200 (API hard limit).
if paging beyond ~10000 results needed: use cursor-based paging (step 6), not page numbers. page offset is capped.
if OPENALEX_MAILTO is not set: proceed anyway, but expect 429 rate-limit errors if burst volume exceeds the standard-pool limit (10 req/sec). set the variable to join the polite pool.
if timeout occurs on broad query: add filters (year, is_oa, topic) to reduce result cardinality. lower per-page to 50 or 25.
if abstract_inverted_index is present: reconstruct it (step 7) rather than treating raw dict as prose. if null or sparse, note "abstract not available."
if open_access.oa_url is null: work is behind paywall with no free copy known. report this fact; do not claim OA status.
if filter syntax error (400 response): check key spelling against docs.openalex.org, ensure comma separation, use key:value format. retry.
if user asks for full-text content, not metadata: read open_access.oa_url and point user to it. OpenAlex does not serve full text; the URL leads to the publisher or preprint host.
if data freshness is critical: note access date and that OpenAlex updates frequently. counts are point-in-time; re-query if precision matters (e.g., after weeks/months).

output contract

response format: JSON + markdown citation text

metadata list: array of work/author/institution objects, each containing:
- id (OpenAlex ID)
- display_name (title, author name, institution name)
- publication_year / cited_by_count / is_oa / relevant metrics
- DOI (if present)
- open_access.oa_url (if available)
aggregation results (if group_by used): array of {key, key_display_name, count}
citation block: each reported work cited as: <Authors> (<year>). <Title>. <Source>. DOI: <DOI>. OpenAlex: https://openalex.org/<ID>
totals: meta.count (total matches) and results.length (items in current page) clearly distinguished
file location: none; output is stdout JSON + markdown text
error responses:
- 404: "Entity ID not found or invalid prefix."
- 429: "Rate limited; set OPENALEX_MAILTO or reduce request volume."
- 400: "Filter syntax error; check OpenAlex documentation."
- Timeout: "Query too broad; add filters or reduce per-page."

outcome signal

user receives a list of works/authors/institutions with metadata + OpenAlex URLs
user can verify results by checking display names, year, citation counts, institution affiliations
user can click OpenAlex URL and confirm data on the website
if user cites the work, the citation includes DOI + OpenAlex ID for full traceability
if user asks for open-access papers, is_oa:true filter was applied and results include only OA works
if user requested trends (e.g., "papers per year"), aggregation counts are reported per group and, for single-valued grouping fields, sum to meta.count
no invented papers, authors, or DOIs appear in results; only real API-returned data
rate-limit errors are absent (OPENALEX_MAILTO is set)
