Alibabacloud Dlf Manage

Query Catalog, database, and table metadata resources in Alibaba Cloud Data Lake Formation (DLF). Provides read-only queries via the DLF OpenAPI Python SDK,...

SkillRank score: 8.3 / 10

evaluated by implexa, claude-haiku-4-5 · 2026-06-13

alibabacloud-dlf-manage provides read-only metadata queries across catalogs, databases, and tables in alibaba cloud data lake formation via python sdk. supports listing, fuzzy search, and schema inspection with deliberate api selection guidance to minimize unnecessary payload.

structure: 9.0
trigger phrases: 8.0
procedure: 9.0
edge cases: 7.0
documentation: 8.0

---
name: alibabacloud-dlf-manage
description: |
  Query Catalog, database, and table metadata resources in Alibaba Cloud Data Lake Formation (DLF).
  Provides read-only queries via the DLF OpenAPI Python SDK, supporting listing and viewing
  Catalogs, databases, tables with their detailed information and Schema definitions.
  Use cases: "list available Catalogs", "list databases", "view table schema",
  "search tables", "search tables by name", "fuzzy search", "view DLF metadata",
  "what databases are in the data lake", "what columns does a table have",
  "find tables whose name contains xxx".
  This Skill only contains read-only operations — no create, modify, or delete operations.
---

# DLF Data Lake Metadata Query

Query Catalog, Database, and Table metadata resources in Alibaba Cloud Data Lake Formation (DLF).

> **CRITICAL: Use only the Python SDK script provided by this Skill.**
> All operations go through the DLF Python SDK (`alibabacloud-dlfnext20250310`) via `scripts/dlf_metadata_query.py`.
> This Skill does not invoke any shell-based command-line client and does not require AI-Mode configuration.
>
> - **DO NOT** attempt access via any shell-based command-line client — DLF is not exposed through one in this Skill
> - **DO NOT** use curl, wget, or other HTTP clients to call the DLF API directly
> - **MUST** use the `scripts/dlf_metadata_query.py` script provided by this Skill, which wraps the DLF Python SDK
> - All query operations are executed via `python3 scripts/dlf_metadata_query.py <action> [options]`

## Architecture

```
Catalog (Data Catalog)
  └── Database
        └── Table
              ├── Schema (column definitions)
              ├── PartitionKeys (partition keys)
              ├── PrimaryKeys (primary keys)
              └── Options (table properties)
```

## Installation

```bash
pip install -r requirements.txt
```

`requirements.txt` pins the full transitive dependency closure (including
`alibabacloud-dlfnext20250310==3.0.0`) for reproducible installs.

> **Pre-check: Python SDK dependency**
>
> ```bash
> python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"
> ```
> If not installed, run `pip install -r requirements.txt`.

## Authentication

> **Pre-check: Alibaba Cloud Credentials Required**
>
> Use the default credential chain (CredentialClient) to obtain credentials automatically. Supported sources (in priority order):
> 1. Environment variables (ALIBABA_CLOUD_ACCESS_KEY_ID / ALIBABA_CLOUD_ACCESS_KEY_SECRET)
> 2. Configuration file (~/.alibabacloud/credentials)
> 3. ECS Instance RAM Role
> 4. OIDC Role ARN
>
> **Security Rules:**
> - **NEVER** read, echo, or print AK/SK values
> - **NEVER** ask the user to input AK/SK directly in the conversation or command line
> - **NEVER** explicitly handle or pass AK/SK in code — rely on the default credential chain
>
> See https://help.aliyun.com/document_detail/378659.html for credential configuration details.
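
A quick way to see which of the first two credential sources looks configured, without touching the secrets themselves, can be sketched as below. This is a minimal illustration, not part of the Skill: the function name `credential_sources_present` is hypothetical, it tests only for the presence of the variable names and the file, and it does not probe the ECS role or OIDC sources.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_sources_present(
    env: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> dict:
    """Report which credential sources look configured, as booleans only."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home
    return {
        # Only membership is tested; the values are never read or printed.
        "env_vars": (
            "ALIBABA_CLOUD_ACCESS_KEY_ID" in env
            and "ALIBABA_CLOUD_ACCESS_KEY_SECRET" in env
        ),
        # Existence check only; the file content is never opened.
        "credentials_file": (home / ".alibabacloud" / "credentials").is_file(),
    }
```

If both values are `False`, the chain may still succeed through an ECS instance RAM role or an OIDC role, so treat the result as a hint rather than a verdict.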

## RAM Permissions

This Skill only involves read-only operations (List / Get). See [references/ram-policies.md](references/ram-policies.md) for the full permission list.

> **[MUST] Permission Failure Handling:** When any command or API call fails due to permission errors at any point during execution, follow this process:
> 1. Read `references/ram-policies.md` to get the full list of permissions required by this SKILL
> 2. Pause and wait until the user confirms that the required permissions have been granted

## Parameter Confirmation

> **IMPORTANT: Parameter Confirmation** — Before invoking the API,
> the following user-specific parameters must be confirmed with the user; do not assume them.
> Region defaults to cn-hangzhou; if the user does not specify one, use the default without asking.

| Parameter | Required | Description | Default |
|------|------|------|--------|
| `region` | No | Region ID | cn-hangzhou |
| `catalog_name` | Conditional | Catalog name (`--catalog`, required for GetCatalog) | - |
| `catalog_id` | Conditional | Catalog ID (`--catalog-id`, required when querying databases/tables, e.g. clg-paimon-xxxx) | - |
| `database` | Conditional | Database name (`--database`) | - |
| `table` | Conditional | Table name (`--table`) | - |

## Core Workflow

> The script obtains credentials through the default credential chain described under Authentication and reports a clear error if none are found.
> Region defaults to cn-hangzhou; use the default if the user does not specify one.

**You MUST use** `scripts/dlf_metadata_query.py` to query metadata. Do not use shell-based command-line clients or curl. Actions are in **kebab-case**.

> **CRITICAL — list vs. list-*-details: pick the lightest action that satisfies the request.**
> - For listing names / IDs (including fuzzy search): use `list-databases` / `list-tables`. These call the `ListDatabases` / `ListTables` API.
> - For full attributes / Schema / properties: use `list-database-details` / `list-table-details` / `get-database` / `get-table`. These call the heavier `*-details` / `Get*` APIs.
> - **Default to the lightweight `list-*` action** unless the user explicitly asks for full configuration, Schema, or properties. Calling `list-*-details` when only names are needed is incorrect.
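
The selection rule above can be sketched as a small function. This is an illustration under stated assumptions: `choose_action` and its parameters are hypothetical names, and the fallback for a multi-catalog details request is a guess, since the Skill documents no `list-catalog-details` action.

```python
def choose_action(resource: str, wants_details: bool = False, single: bool = False) -> str:
    """Pick the lightest script action that satisfies a request.

    resource: "catalog", "database", or "table".
    wants_details: True only if the user explicitly asked for Schema,
        properties, or full configuration.
    single: True if the request names ONE specific resource.
    """
    if resource not in ("catalog", "database", "table"):
        raise ValueError(f"unknown resource: {resource!r}")
    if not wants_details:
        # Names / IDs / fuzzy search: list-catalogs, list-databases, list-tables.
        return f"list-{resource}s"
    if single:
        # One named resource: get-catalog, get-database, get-table.
        return f"get-{resource}"
    if resource == "catalog":
        # Assumption: no list-catalog-details action exists, so stay on the list.
        return "list-catalogs"
    # Details for every match: list-database-details, list-table-details.
    return f"list-{resource}-details"
```

For example, `choose_action("table")` yields `list-tables`, while `choose_action("table", wants_details=True, single=True)` yields `get-table`.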

### Query Operations

```bash
# ---- Catalog ----

# 1. List all Catalogs (names + minimal info — preferred for listing/searching)
python3 scripts/dlf_metadata_query.py list-catalogs

# 2. Fuzzy-search Catalogs by name (uses ListCatalogs)
python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

# 3. Get Catalog details (by name) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name>

# 4. Get Catalog details (by ID) — use only when full Catalog config is needed
python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id>

# ---- Database ----

# 5. List databases (NAMES only — DEFAULT for "list / show / which databases", calls ListDatabases)
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id>

# 6. List database details (full attributes, calls ListDatabaseDetails) — use ONLY when the user asks for properties / configs / location / owner
python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id>

# 7. Get a single database's details (calls GetDatabase) — use when the user asks for ONE specific database's full info
python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name>

# ---- Table ----

# 8. List tables (NAMES only — DEFAULT for "list / show / which tables", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name>

# 9. Fuzzy-search tables by name (DEFAULT for "search / find tables matching ...", calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%

# 10. List table details with Schema (calls ListTableDetails) — use ONLY when the user explicitly asks for Schema / columns / properties of all tables
python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name>

# 11. Get a single table's details with Schema (calls GetTable) — use when the user asks for ONE specific table's Schema
python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name>
```
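
When the script is driven from Python rather than typed by hand, the commands above can be assembled as an argument list. The sketch below assumes only the flags shown in this document; `build_command` and `run_query` are hypothetical helper names, not part of the Skill.

```python
import subprocess
from typing import List

SCRIPT = "scripts/dlf_metadata_query.py"


def build_command(action: str, **options) -> List[str]:
    """Turn keyword options into flags, e.g. catalog_id="x" -> --catalog-id x."""
    cmd = ["python3", SCRIPT, action]
    for name, value in options.items():
        if value is None:
            continue  # unset options are simply omitted
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd


def run_query(action: str, **options) -> str:
    """Run the script and return its stdout, which is expected to be JSON."""
    done = subprocess.run(
        build_command(action, **options), capture_output=True, text=True
    )
    return done.stdout
```

Passing a list to `subprocess.run` bypasses the shell, so a pattern such as `user%` reaches the script unchanged and needs no quoting.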

To query a region other than the default cn-hangzhou, append `--region <region_id>` to any command, for example `--region cn-shanghai`.

### Typical Query Flow

```
1. list-catalogs          → get catalog_name and catalog_id (names only)
2. list-databases         → use catalog_id to view available database names
3. list-tables            → use catalog_id + database to view available table names
4. get-table              → use catalog_id + database + table to view ONE table's Schema
```

> Only step 4 (`get-table`) is a "details" call, because Schema is what the user actually asked for. Steps 1–3 stay on the lightweight `list-*` actions.

### Fuzzy Search

All list operations support the `--pattern` argument for fuzzy name matching, using `%` as the wildcard. **Use the lightweight `list-*` action for pattern search unless the user explicitly asks for the full Schema / properties of every match.**

```bash
# Search Catalogs whose name contains "test"
python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test%

# Search databases whose name starts with "prod_"
python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> --pattern prod_%

# Search tables whose name starts with "user" (DEFAULT — calls ListTables)
python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --pattern user%
```
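
To reason about which names a pattern would select, the `%` wildcard can be emulated locally. This is only an illustration: the real matching happens on the service side, and the sketch assumes that `%` matches any run of characters, that every other character (including `_`) is literal, and that matching is case-sensitive. `like_match` is a hypothetical name.

```python
import re


def like_match(pattern: str, name: str) -> bool:
    """Return True if name matches pattern, where % matches any run of characters."""
    # Escape the literal pieces and join them with ".*" where each % stood.
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, name, flags=re.DOTALL) is not None
```

Under these assumptions, `user%` selects names that start with `user`, and `%test%` selects names that contain `test` anywhere.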

> **Anti-pattern**: do not use `list-table-details --pattern ...` to search by name. That calls `ListTableDetails` and is heavier than required. Reach for `list-table-details` only when the user has explicitly asked for the Schema / columns of every matching table.

### Output Format

- **List operations**: `{"count": N, "items": [...]}`
- **Get operations**: a single JSON object
- **Errors**: `{"error": "...", "hint": "..."}`
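
A caller can tell the three shapes apart by their keys. The sketch below assumes only the formats listed above; `parse_output` and `QueryError` are hypothetical names introduced for illustration.

```python
import json


class QueryError(Exception):
    """Raised when the script output carries an "error" key."""

    def __init__(self, error: str, hint: str = ""):
        super().__init__(f"{error} ({hint})" if hint else error)
        self.error = error
        self.hint = hint


def parse_output(text: str):
    """Return the item list for list output, or the object for get output."""
    data = json.loads(text)
    if isinstance(data, dict) and "error" in data:
        raise QueryError(str(data["error"]), str(data.get("hint", "")))
    if isinstance(data, dict) and "count" in data and "items" in data:
        return data["items"]  # list operation
    return data  # get operation: a single JSON object
```

An empty list from a list operation is a valid result, not a failure, so it is returned as `[]` rather than raised.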

## Verification

If `list-catalogs` returns the Catalog list, the connection and permissions are working:

```bash
python3 scripts/dlf_metadata_query.py list-catalogs --region cn-hangzhou
```

See [references/verification-method.md](references/verification-method.md) for detailed verification steps.

## Best Practices

1. **Prefer the lightweight `list-*` action over `list-*-details` / `get-*`.** When the task only requires listing resource **names**, **IDs**, or **fuzzy matching**, you MUST use `list-catalogs` / `list-databases` / `list-tables` (which call `ListCatalogs` / `ListDatabases` / `ListTables`). Only use `list-*-details` or `get-*` when the user explicitly asks for full configuration, Schema, columns, properties, owner, or location. Reaching for the heavier API when the lighter one suffices is incorrect.
2. **List before Get**: use list-catalogs to obtain catalog_id first, then use catalog_id to query databases and tables.
3. **Use fuzzy search with the lightweight action**: the `--pattern` argument supports fuzzy matching; use it on `list-tables` (not `list-table-details`) unless full Schema is also requested.
4. **Pagination**: use `--max-results` and `--page-token` for paginated queries when there is a lot of data.
5. **Catalog ID vs Name**: when querying Database/Table, use `catalog_id` (e.g. clg-paimon-xxxx), not the catalog name.

## References

| Reference | Description |
|---------|------|
| [references/related-apis.md](references/related-apis.md) | Full API list and parameter descriptions |
| [references/ram-policies.md](references/ram-policies.md) | RAM permission policy |
| [references/acceptance-criteria.md](references/acceptance-criteria.md) | Acceptance criteria |
| [references/verification-method.md](references/verification-method.md) | Verification method |
| [DLF API overview](https://help.aliyun.com/zh/dlf/dlf-2-0/developer-reference/api-dlfnext-2025-03-10-overview) | Official API documentation |
| [DLF product documentation](https://help.aliyun.com/zh/dlf/dlf-2-0) | Product documentation |
| [Python SDK PyPI](https://pypi.org/project/alibabacloud-dlfnext20250310/) | SDK version info |

extracted 6 mandatory components (intent, inputs, procedure with explicit step I/O, decision points covering action selection and error cases, output contract with json formats, outcome signal with success and failure criteria), clarified auth and pagination edge cases, added component depth while preserving original workflow and best practices.

intent

query catalog, database, and table metadata resources in alibaba cloud data lake formation (dlf). this skill provides read-only access to dlf metadata via the official python sdk, supporting listing and viewing catalogs, databases, tables, schemas, and properties. use it when you need to explore what catalogs exist, which databases live in a catalog, which tables exist in a database, or what columns and schema a specific table has. common scenarios: "list available catalogs", "which databases are in this catalog", "find tables matching a name pattern", "what columns does this table have", "show me the schema for table X".

inputs

| parameter | required | description | type | default |
|------|------|------|------|------|
| `region` | no | alibaba cloud region id | string | `cn-hangzhou` |
| `catalog_name` | conditional | catalog name, required only for `get-catalog` | string | - |
| `catalog_id` | conditional | catalog id (format: `clg-paimon-xxxx`), required for database and table queries | string | - |
| `database` | conditional | database name, required for table queries | string | - |
| `table` | conditional | table name, required for single table schema queries | string | - |
| `pattern` | optional | fuzzy search pattern using `%` as wildcard (e.g. `user%`, `%test%`) | string | - |
| `max_results` | optional | max results per page for paginated queries | integer | - |
| `page_token` | optional | pagination token for fetching next page | string | - |

external connections

alibaba cloud credentials (required): the skill uses alibaba cloud's default credential chain, which tries these sources in priority order:

1. environment variables: ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET
2. credentials file at ~/.alibabacloud/credentials
3. ecs instance ram role (if running on ecs)
4. oidc role arn

see https://help.aliyun.com/document_detail/378659.html for setup.

python sdk dependency: alibabacloud-dlfnext20250310==3.0.0 (pinned in requirements.txt)

ram permissions required: read-only (list / get) operations only. references/ram-policies.md holds the authoritative permission list; the action names given here (dlf:DescribeCatalog, dlf:DescribeDatabase, dlf:DescribeTable, dlf:ListCatalogs, dlf:ListDatabases, dlf:ListTables) are indicative and should be checked against that file.

procedure

all operations use the script scripts/dlf_metadata_query.py. do not use curl, wget, or shell-based clients.

pre-check: verify python sdk is installed

python3 -c "from alibabacloud_dlfnext20250310.client import Client; print('SDK OK')"

if not installed, run:

pip install -r requirements.txt

step 1: list all catalogs

invoke action list-catalogs to get catalog names and ids. this is the lightweight list operation (not details).

python3 scripts/dlf_metadata_query.py list-catalogs [--region <region>]

input: optional region (defaults to cn-hangzhou).
output: json object {"count": N, "items": [{"catalog_name": "...", "catalog_id": "clg-paimon-..."}, ...]}.
use when: user asks "list catalogs", "show catalogs", "what catalogs exist".

step 2: search catalogs by name pattern (optional)

if user wants to find catalogs matching a name, add --pattern to step 1.

python3 scripts/dlf_metadata_query.py list-catalogs --pattern %test% [--region <region>]

input: pattern string with % wildcard.
output: filtered list of catalogs matching the pattern.
use when: user asks "find catalogs with X in the name", "search catalogs".

step 3: get full details of a single catalog (conditional)

if user asks for full catalog configuration, owner, properties, or location, invoke get-catalog (heavier operation).

python3 scripts/dlf_metadata_query.py get-catalog --catalog <catalog_name> [--region <region>]

or by id:

python3 scripts/dlf_metadata_query.py get-catalog-by-id --id <catalog_id> [--region <region>]

input: catalog_name or catalog_id.
output: single json object with full catalog details (properties, owner, location, created_time, etc).
use when: user explicitly asks "show catalog config", "what is the location of catalog X", "catalog properties".

step 4: list databases in a catalog

invoke action list-databases to get database names. this is the lightweight list operation.

python3 scripts/dlf_metadata_query.py list-databases --catalog-id <catalog_id> [--region <region>] [--pattern <pattern>]

input: catalog_id (from step 1); optional pattern for fuzzy search.
output: json object {"count": N, "items": [{"database_name": "...", ...}, ...]}.
use when: user asks "list databases", "which databases are in this catalog", "show databases".

step 5: get full details of databases (conditional)

if user asks for database owner, location, creation time, or full properties, invoke list-database-details or get-database.

python3 scripts/dlf_metadata_query.py list-database-details --catalog-id <catalog_id> [--region <region>]

or for a single database:

python3 scripts/dlf_metadata_query.py get-database --catalog-id <catalog_id> --database <db_name> [--region <region>]

input: catalog_id; database name (required for get-database, not used by list-database-details).
output: json object(s) with full database properties (owner, location, parameters, created_time).
use when: user explicitly asks "show database config", "database properties", "who owns this database".

step 6: list tables in a database

invoke action list-tables to get table names. this is the lightweight list operation.

python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> [--region <region>] [--pattern <pattern>]

input: catalog_id, database name; optional pattern for fuzzy search.
output: json object {"count": N, "items": [{"table_name": "...", ...}, ...]}.
use when: user asks "list tables", "which tables are in database X", "show tables", "find tables matching Y".

step 7: get full schema and details of a single table

if user asks for columns, schema, partition keys, or full table properties, invoke get-table.

python3 scripts/dlf_metadata_query.py get-table --catalog-id <catalog_id> --database <db_name> --table <table_name> [--region <region>]

input: catalog_id, database name, table name.
output: single json object with full table details (schema with column names/types, partition_keys, primary_keys, properties, location, created_time, owner, etc).
use when: user asks "show table schema", "what columns does table X have", "table definition", "table properties".

step 8: get full schema of all tables in a database (conditional)

if user asks for schema of all tables in a database, invoke list-table-details.

python3 scripts/dlf_metadata_query.py list-table-details --catalog-id <catalog_id> --database <db_name> [--region <region>]

input: catalog_id, database name.
output: json object {"count": N, "items": [table1_with_schema, table2_with_schema, ...]}.
use when: user asks "show schema of all tables", "list all tables with columns".

step 9: handle pagination for large result sets (conditional)

if output indicates more results available, use pagination parameters.

python3 scripts/dlf_metadata_query.py list-tables --catalog-id <catalog_id> --database <db_name> --max-results 100 --page-token <token_from_previous_response>

input: max_results (no default is documented), page_token from the previous response.
output: next batch of results; the response includes a new page_token if more data exists.
use when: a list operation returns more than one page of data.
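
the paging loop in this step can be sketched as a generator. this is a minimal illustration that assumes the response key is named "page_token", as in the output contract on this page; `iter_all_items` and the `fetch_page` callable are hypothetical names, and `fetch_page` stands in for whatever runs the script with `--page-token`.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield items from every page until no page_token is returned.

    fetch_page(token) must return one parsed response; token is None
    for the first page.
    """
    token = None
    while True:
        page = fetch_page(token)
        if "error" in page:
            # Stop on the first failed page instead of silently truncating.
            raise RuntimeError(page["error"])
        yield from page.get("items", [])
        token = page.get("page_token")
        if not token:
            break
```

because the function is a generator, a caller can stop early (for example after the first match) without fetching the remaining pages.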

decision points

which action to use: lightweight vs. details?

if user asks for names only (list, show, which, find by pattern): use lightweight list-catalogs, list-databases, list-tables. these call list apis.
if user asks for schema, columns, full config, properties, owner, or location: use heavier get-catalog, get-database, get-table, or list-*-details apis.
default to lightweight unless user explicitly requests full details.

where is catalog_id?

obtain it from step 1 (list-catalogs) before querying databases or tables.
if user provides only catalog name, first run list-catalogs --pattern <name> to find the id.
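
the name-to-id lookup can be sketched as follows. it assumes that list-catalogs items carry "catalog_name" and "catalog_id" keys, as shown in step 1 on this page; `find_catalog_id` is a hypothetical helper name.

```python
from typing import Iterable, Mapping, Optional


def find_catalog_id(items: Iterable[Mapping], name: str) -> Optional[str]:
    """Return the catalog_id whose catalog_name equals name exactly, else None."""
    for item in items:
        # Exact comparison: a fuzzy --pattern search may return near matches too.
        if item.get("catalog_name") == name:
            return item.get("catalog_id")
    return None
```

a `None` result means the name was not among the returned items, in which case asking the user to confirm the catalog name is safer than guessing an id.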

should we search by pattern?

if user says "find tables matching X", "tables starting with Y", "search for Z": add --pattern to the list action.
pattern uses sql LIKE-style matching with % as the wildcard (e.g. user%, %test%, %foo%).
use pattern on the lightweight action, not the details action, unless user also asks for schema of every match.

auth fails or permission denied?

read references/ram-policies.md to identify missing permissions.
pause and ask user to confirm permissions have been granted.
do not retry immediately.

no results returned?

if list returns empty, the resource (catalog, database, or table) does not exist or the pattern matched nothing. inform the user.
do not assume the operation failed; it may be a legitimate empty result.

network timeout or connection error?

check that alibaba cloud credentials are set (env vars or credentials file).
verify region is correct (defaults to cn-hangzhou).
retry once; if it persists, ask user to check network and credential setup.

output contract

all operations return json. success responses are structured as follows:

list operations (list-catalogs, list-databases, list-tables, list-database-details, list-table-details):

{
  "count": integer,
  "items": [
    {
      "catalog_name": "string",
      "catalog_id": "string",
      ... (additional fields per resource type)
    }
  ],
  "page_token": "string (optional, if more results exist)"
}

get operations (get-catalog, get-catalog-by-id, get-database, get-table):

{
  "catalog_name": "string",
  "catalog_id": "string",
  "owner": "string",
  "location": "string",
  "created_time": "timestamp",
  "properties": {...},
  ... (additional fields per resource type)
}

table schema (get-table only):

{
  "table_name": "string",
  "database_name": "string",
  "schema": [
    {
      "column_name": "string",
      "data_type": "string",
      "comment": "string (optional)"
    }
  ],
  "partition_keys": [...],
  "primary_keys": [...],
  "properties": {...},
  "location": "string",
  "owner": "string",
  "created_time": "timestamp"
}
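
a get-table response of this shape can be reduced to a column listing as sketched below. the field names follow the structure shown above; treating "partition_keys" as a list of column-name strings is an assumption, since its element format is not spelled out here, and `list_columns` is a hypothetical name.

```python
def list_columns(table: dict) -> list:
    """Return (column_name, data_type, is_partition_key) tuples in schema order."""
    # Assumption: partition_keys holds plain column names.
    partition_keys = set(table.get("partition_keys", []))
    return [
        (col["column_name"], col["data_type"], col["column_name"] in partition_keys)
        for col in table.get("schema", [])
    ]
```

this answers "what columns does table X have" directly from one get-table call, without needing list-table-details.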

error responses:

{
  "error": "error code or message",
  "hint": "diagnostic hint or next steps"
}

file locations: no files are written by default. if you need to persist results, save the json output to a file explicitly (e.g. python3 scripts/dlf_metadata_query.py list-catalogs > catalogs.json).

outcome signal

the skill worked if:

list operations return a non-error json response with "count" and "items": even if count is 0, this means the query succeeded and the resource(s) do not exist or do not match the pattern.
get operations return a single json object with resource details: the presence of "catalog_id", "table_name", "schema", or other expected fields confirms the operation succeeded.
schema is visible in get-table response: the "schema" array contains column definitions with "column_name" and "data_type".
no "error" key in the response: responses containing an "error" key indicate failure (e.g. permission denied, auth failure, resource not found).
pattern matching returned filtered results: if you used --pattern, the results should match the pattern.
pagination token is present (optional): if the response includes "page_token", more data is available; fetch the next page using that token.

the skill did not work if:

response contains "error" and "hint" fields.
credentials are missing or invalid (error mentions "Unauthorized" or "AccessKeyId not found").
region does not exist or is misspelled (error mentions "Invalid region").
catalog_id or database name is incorrect (error mentions "CatalogNotFound" or "DatabaseNotFound").
user lacks ram permissions (error mentions "AccessDenied" or "Forbidden").
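
the failure signals above can be turned into a rough triage step. the sketch matches only the substrings quoted in this list; real error text may differ, so anything unrecognized is reported as "unknown". `classify_error` and the category labels are hypothetical.

```python
from typing import Optional

# Substrings taken from the failure criteria above, checked in this order.
_MARKERS = [
    ("credentials", ("Unauthorized", "AccessKeyId not found")),
    ("region", ("Invalid region",)),
    ("not_found", ("CatalogNotFound", "DatabaseNotFound")),
    ("permission", ("AccessDenied", "Forbidden")),
]


def classify_error(response: dict) -> Optional[str]:
    """Return a coarse failure category, or None if the response is not an error."""
    if "error" not in response:
        return None
    text = f"{response.get('error', '')} {response.get('hint', '')}"
    for label, markers in _MARKERS:
        if any(marker in text for marker in markers):
            return label
    return "unknown"
```

a "permission" result is the cue to read references/ram-policies.md and pause for the user, rather than retrying.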
