Parse local PDFs and document images with PaddleOCR-VL or PaddleOCR-VL-1.5 on OpenVINO, then route the structured parse into downstream document-to-data or d...
name: local-document-ai-openvino
description: Private document AI for Intel hardware. Parse PDFs, invoices, screenshots, and diagrams locally with OpenVINO, then turn them into structured data or executable notebook/code scaffolds with clear quick-start commands and example prompts.
---
# Private Document AI with OpenVINO
Turn local PDFs, invoices, screenshots, and diagrams into one of two useful outcomes:
1. `to-data`: classify the document and extract structured fields, tables, and JSON.
2. `to-code`: turn screenshots, forms, and architecture diagrams into code or Jupyter notebook scaffolds.
Everything runs locally and is built for Intel CPU/GPU acceleration with OpenVINO.
## Why install this skill
Install this when you want one local workflow for:
- invoice and receipt extraction
- private PDF understanding
- table and key-value extraction
- architecture diagram to notebook generation
- screenshot to HTML/React scaffold generation
This skill is especially good for demos because it already includes:
- medical invoice `to-data` flows
- restaurant invoice `to-data` flows
- architecture diagram `to-code -> jupyter-notebook` flows
- local HTML reports for easy review and screenshots
## 30-second start
Check the environment:
```bash
python "{baseDir}/scripts/check_env.py"
```
Then run the pipeline directly from the CLI:
```bash
python "{baseDir}/scripts/run_skill.py" --mode to-data --file "/absolute/path/to/invoice.pdf" --out "/absolute/path/to/artifacts/invoice_data" --extract "tables,entities,kv_pairs"
```
## Example prompts
Use prompts like these in OpenClaw:
```text
Use $local-document-ai-openvino to parse this local PDF and give me a structured report.
```
```text
Use $local-document-ai-openvino to extract invoice fields, tables, and key-value pairs from this medical invoice.
```
```text
Use $local-document-ai-openvino to classify this receipt and return normalized JSON.
```
```text
Use $local-document-ai-openvino to turn this architecture diagram into a Jupyter notebook scaffold.
```
```text
Use $local-document-ai-openvino to convert this UI screenshot into an HTML scaffold.
```
## What you get
Typical outputs include:
- `parsed.json`
- `parsed.md`
- `result_report.html`
- `task_output/structured_record.json`
- `task_output/normalized.json`
- `task_output/notebook.ipynb`
- `code_preview.html`
## Best demo paths
If you are evaluating the skill for the first time, start here:
1. `to-data` on an invoice PDF
2. review `result_report.html`
3. inspect `structured_record.json`
4. then try `to-code` with a diagram image and target `jupyter-notebook`
## Core pipeline
Use this skill as a local document-to-action pipeline:
1. Parse the document into a canonical structured representation.
2. Optionally continue into `to-data` or `to-code`.
3. Save outputs into a predictable artifact folder with traceability.
## Read only if needed
Load these references when you need the schema or output contracts:
- `{baseDir}/references/schema.md`
- `{baseDir}/references/mode_guide.md`
- `{baseDir}/references/output_contracts.md`
## Primary entrypoint
Use this published entrypoint:
- CLI orchestrator: `{baseDir}/scripts/run_skill.py`
Do not call these implementation scripts directly from the skill:
- `parse_document.py`
- `transform_doc_to_data.py`
- `transform_doc_to_code.py`
## Local readiness
Check the environment before processing real documents:
```bash
python "{baseDir}/scripts/check_env.py"
```
Install the base dependencies in a virtual environment:
```bash
python -m pip install -r "{baseDir}/requirements.txt"
```
Install the third-party `paddleocr_vl_openvino` package only after reviewing the source or wheel and only when you intend to run the real OCR pipeline. Prefer installing from a reviewed local wheel path inside a virtual environment.
Run a quick orchestration smoke test:
```bash
python "{baseDir}/scripts/smoke_test.py"
```
Model assets are discovered from these environment variables and bundled directories:
- `PADDLEOCR_VL_OPENVINO_MODEL_DIR`
- `PADDLEOCR_VL_LAYOUT_MODEL_DIR` plus `PADDLEOCR_VL_VLM_MODEL_DIR`
- `{baseDir}/models/paddleocr-vl-1.5-openvino/`
- `{baseDir}/models/paddleocr-vl-openvino/`
Allow model auto-download only when the user explicitly approves it.
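The discovery locations above can be probed with a few lines of Python. This is a minimal sketch, not the skill's own resolver: the function name `find_model_assets`, the returned dictionary keys, and the probing order (which simply mirrors the list) are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_model_assets(base_dir: Path, env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Return the first model location that exists on disk, or None.

    Hypothetical helper: the order below mirrors the documented list and
    is an assumption, not a confirmed precedence rule.
    """
    env = os.environ if env is None else env

    # 1. One directory that holds all model assets.
    combined = env.get("PADDLEOCR_VL_OPENVINO_MODEL_DIR")
    if combined and Path(combined).is_dir():
        return {"source": "PADDLEOCR_VL_OPENVINO_MODEL_DIR", "model_dir": Path(combined)}

    # 2. Separate layout and VLM directories; both must be set and present.
    layout = env.get("PADDLEOCR_VL_LAYOUT_MODEL_DIR")
    vlm = env.get("PADDLEOCR_VL_VLM_MODEL_DIR")
    if layout and vlm and Path(layout).is_dir() and Path(vlm).is_dir():
        return {"source": "split-env", "layout_dir": Path(layout), "vlm_dir": Path(vlm)}

    # 3. Bundled folders under the skill's base directory.
    for name in ("paddleocr-vl-1.5-openvino", "paddleocr-vl-openvino"):
        candidate = base_dir / "models" / name
        if candidate.is_dir():
            return {"source": "bundled", "model_dir": candidate}

    # Nothing found: report it to the user instead of downloading anything.
    return None
```

Returning `None` rather than fetching models keeps the download decision with the user, in line with the approval rule above.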
## Supported modes
### `parse`
Use when the user wants the structured parse only.
Outputs:
- `parsed.json`
- `parsed.md`
- `result_report.html`
- extracted layout, tables, or figures when available
### `to-data`
Use when the user wants structured extraction, normalization, or document classification.
Typical outputs under `task_output/`:
- `entities.json`
- `kv_pairs.json`
- `table_index.json`
- `normalized.json`
- `structured_record.json`
- `traceability.json`
### `to-code`
Use when the user wants implementation-oriented output from the parse result.
Supported targets:
- `react`
- `html-css`
- `json-schema`
- `jupyter-notebook`
Typical outputs under `task_output/`:
- `component_map.json`
- `field_schema.json`
- `ui_blueprint.json`
- `notes.md`
- `traceability.json`
- target-specific artifacts such as `app.jsx`, `index.html`, `styles.css`, `schema.json`, `notebook.ipynb`, or `notebook_plan.json`
Treat all generated code and notebooks as drafts. Review them before running, publishing, or connecting them to real systems.
## Published package scope
The published ClawHub bundle is intentionally CLI-first.
- main workflow: `scripts/run_skill.py`
- diagnostics: `scripts/check_env.py`
- smoke verification: `scripts/smoke_test.py`
Developer-only local UI helpers are kept out of the public release bundle.
## Pipeline rules
Always follow these rules:
1. Prefer local execution.
2. Always parse first into `parsed.json`.
3. Generate downstream artifacts from `parsed.json`, not raw OCR text alone.
4. Preserve page numbers, reading order, block types, and source anchors when possible.
5. Write traceability for downstream outputs.
6. Mark low-confidence regions or assumptions explicitly.
7. Do not silently drop tables, figures, formulas, charts, or key-value regions.
8. Save outputs into one artifact folder per run.
9. For confidential documents, prefer an explicit private `--out` directory and remove artifacts after review.
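Rules 2 through 6 can be illustrated with a small guard and a traceability builder. This is a sketch under stated assumptions: the real `parsed.json` layout is defined in `references/schema.md`, so the field names used here (`blocks`, `page`, `type`, `bbox`, `confidence`), the helper names, and the `LOW_CONFIDENCE` threshold are all hypothetical.

```python
import json
from pathlib import Path

LOW_CONFIDENCE = 0.6  # hypothetical threshold, not taken from the skill


def load_parsed(out_dir) -> dict:
    """Refuse to continue downstream unless parsed.json already exists."""
    parsed_path = Path(out_dir) / "parsed.json"
    if not parsed_path.is_file():
        raise FileNotFoundError(f"run parse first: {parsed_path} is missing")
    return json.loads(parsed_path.read_text(encoding="utf-8"))


def build_traceability(parsed: dict) -> list:
    """Emit one trace entry per block, keeping page, order, type, and anchor.

    All field names are assumed for illustration; check the schema
    reference for the real ones.
    """
    entries = []
    for order, block in enumerate(parsed.get("blocks", [])):
        confidence = block.get("confidence")
        entries.append({
            "reading_order": order,
            "page": block.get("page"),
            "block_type": block.get("type"),
            "source_anchor": block.get("bbox"),
            # Flag low-confidence regions explicitly instead of dropping them.
            "low_confidence": confidence is not None and confidence < LOW_CONFIDENCE,
        })
    return entries
```

Every block is kept in the trace, including tables and figures, so nothing is dropped silently and low-confidence regions stay visible to the reviewer.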
## Output contract
Default output folder:
`./artifacts/<document_stem>/`
Expected top-level outputs:
- `effective_config.json`
- `run_report.json`
- `parsed.json`
- `parsed.md`
- `result_report.html`
- `task_output/`
`to-code` runs may also emit:
- `code_preview.html`
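After a run, the contract above can be checked mechanically. The sketch below assumes nothing about file contents; it only derives the default folder from the document stem and lists the expected top-level outputs that are absent. The helper names `default_out_dir` and `missing_outputs` are made up for this example.

```python
from pathlib import Path

# Top-level outputs named in the output contract.
TOP_LEVEL_OUTPUTS = (
    "effective_config.json",
    "run_report.json",
    "parsed.json",
    "parsed.md",
    "result_report.html",
    "task_output",
)


def default_out_dir(document: str, root: str = "./artifacts") -> Path:
    """Map a document path to ./artifacts/<document_stem>/."""
    return Path(root) / Path(document).stem


def missing_outputs(out_dir) -> list:
    """List the expected top-level outputs that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in TOP_LEVEL_OUTPUTS if not (out_dir / name).exists()]
```

An empty list from `missing_outputs` means the run produced every expected top-level output; `code_preview.html` is left out of the check because only `to-code` runs may emit it.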
## CLI examples
### Parse
```bash
python "{baseDir}/scripts/run_skill.py" \
--mode parse \
--file "/absolute/path/to/report.pdf" \
--out "/absolute/path/to/artifacts/report_parse"
```
### To-data
```bash
python "{baseDir}/scripts/run_skill.py" \
--mode to-data \
--file "/absolute/path/to/invoice.pdf" \
--out "/absolute/path/to/artifacts/invoice_data" \
--extract "tables,entities,kv_pairs"
```
### To-code
```bash
python "{baseDir}/scripts/run_skill.py" \
--mode to-code \
--file "/absolute/path/to/ui_mockup.png" \
--out "/absolute/path/to/artifacts/ui_code" \
--target "react" \
--title "Generated App"
```
### To-code notebook target
```bash
python "{baseDir}/scripts/run_skill.py" \
--mode to-code \
--file "/absolute/path/to/architecture_diagram.png" \
--out "/absolute/path/to/artifacts/notebook_code" \
--target "jupyter-notebook" \
--title "OpenVINO Notebook"
```
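When the CLI is driven from another Python program, the same commands can be assembled as an argument list. The following is a minimal sketch: `build_command` is a hypothetical helper, it uses only the flags shown in the examples above, and rejecting unknown modes or targets before launch is an assumption about what is useful, not documented CLI behavior.

```python
import sys
from pathlib import Path
from typing import List, Optional, Sequence

MODES = {"parse", "to-data", "to-code"}
TARGETS = {"react", "html-css", "json-schema", "jupyter-notebook"}


def build_command(
    base_dir: str,
    mode: str,
    file: str,
    out: str,
    extract: Optional[Sequence[str]] = None,
    target: Optional[str] = None,
    title: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for scripts/run_skill.py (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unsupported mode: {mode}")
    if target is not None and target not in TARGETS:
        raise ValueError(f"unsupported target: {target}")

    # Use the current interpreter and the published entrypoint only.
    cmd = [
        sys.executable,
        str(Path(base_dir) / "scripts" / "run_skill.py"),
        "--mode", mode,
        "--file", str(file),
        "--out", str(out),
    ]
    if extract:
        cmd += ["--extract", ",".join(extract)]
    if target:
        cmd += ["--target", target]
    if title:
        cmd += ["--title", title]
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`; building it as a list rather than a shell string avoids quoting problems with paths that contain spaces.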
## Slash-command examples
```text
/skill local-document-ai-openvino parse file=./docs/report.pdf
```
```text
/skill local-document-ai-openvino to-data file=./docs/invoice.pdf extract=tables,entities,kv_pairs
```
```text
/skill local-document-ai-openvino to-code file=./mockups/architecture.png target=jupyter-notebook
```
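The slash-command form reads like a compact version of the CLI flags. The sketch below shows one way to translate it, assuming each `key=value` pair maps directly to a `--key value` flag; that mapping and the helper name `slash_to_cli_args` are assumptions for illustration, not documented behavior of the host application.

```python
import shlex
from typing import List


def slash_to_cli_args(command: str) -> List[str]:
    """Translate '/skill <name> <mode> key=value ...' into CLI-style args.

    Hypothetical helper: assumes key=value pairs map 1:1 to --key value.
    """
    tokens = shlex.split(command)
    if len(tokens) < 3 or tokens[0] != "/skill":
        raise ValueError("expected '/skill <name> <mode> key=value ...'")

    mode, pairs = tokens[2], tokens[3:]
    args = ["--mode", mode]
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"malformed argument: {pair}")
        args += [f"--{key}", value]
    return args
```

Note that the slash-command examples omit an output folder, so a translation like this would rely on the default artifact folder unless an `out=` pair is supplied.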
## Optional local demo UI
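Start the local UI when the user wants an interactive demo page. The UI helper is developer-only and is not part of the published bundle, so this applies to development checkouts only: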
```bash
python "{baseDir}/scripts/serve_skill_ui.py"
```
The UI lets the user:
- preview a local file
- choose `parse`, `to-data`, or `to-code`
- choose the `to-code` target
- run the pipeline and inspect the generated local HTML reports
The bundled UI only allows preview/run access for local files under the skill directory and common user content folders such as Downloads, Documents, Desktop, and Pictures.
## Failure behavior
If a run fails:
- state which stage failed
- do not claim outputs were created if they were not
- prefer writing `error.json` with failure details
- recommend `parse` first when the downstream request is ambiguous
- surface stderr or a concise failure summary when available
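The failure rules above can be sketched as a small writer for `error.json`. The skill's actual error format is not specified here, so the payload keys (`status`, `stage`, `message`, `stderr_tail`, `outputs_present`) and the function name `write_error` are assumptions.

```python
import json
from pathlib import Path
from typing import Optional


def write_error(out_dir, stage: str, message: str, stderr: Optional[str] = None) -> Path:
    """Write error.json describing a failed run (hypothetical format)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    payload = {
        "status": "failed",
        "stage": stage,  # which stage failed, e.g. "parse"
        "message": message,
        # Keep only the tail of stderr so the summary stays concise.
        "stderr_tail": stderr[-2000:] if stderr else None,
        # Report only files that really exist; never claim missing outputs.
        "outputs_present": sorted(p.name for p in out_dir.iterdir()),
    }
    error_path = out_dir / "error.json"
    error_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return error_path
```

Listing the directory contents instead of the expected outputs means the report can only describe artifacts that were actually written.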
## Safety notes
- Use a virtual environment for dependency installation.
- Approve model downloads only after reviewing them, and only when you explicitly intend to download models.
- Keep outputs in a private local folder when documents are sensitive.
- Review generated code and notebooks before execution.
- Delete artifacts when they are no longer needed.
- The wrapper always uses the bundled local scripts and the current Python interpreter. It does not allow custom interpreter or script-directory overrides.
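For sensitive documents, the private-folder and cleanup notes above can be combined in one context manager. This is a sketch using only the Python standard library; the name `private_artifact_dir` and the `docai_` prefix are invented for the example, and the resulting path would be passed to the CLI as `--out`.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def private_artifact_dir(prefix: str = "docai_"):
    """Yield a private artifact folder and delete it afterwards."""
    # mkdtemp creates a directory that is readable, writable, and
    # searchable only by the creating user.
    path = Path(tempfile.mkdtemp(prefix=prefix))
    try:
        yield path
    finally:
        # Remove every artifact once review is finished, even on errors.
        shutil.rmtree(path, ignore_errors=True)
```

Review `result_report.html` and copy out anything worth keeping inside the `with` block, because the folder is gone once the block exits.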
## Short reminder
Present this skill as a local document-understanding workflow with downstream actions, not as a plain OCR wrapper.