AI-powered photo pre-screening using MobileCLIP2-S0 model. 18x faster than ViT-L/14 with 80% selection consistency (Top-10 overlap 8/10). Use when the user w...
---
name: photo-screener
version: 1.0.0
description: |
AI-powered photo pre-screening using MobileCLIP2-S0 model.
18x faster than ViT-L/14 with 80% selection consistency (Top-10 overlap 8/10).
Use when the user wants to:
- Filter/screen a large batch of photos before sending to LLM
- Score photos by aesthetic quality
- Remove near-duplicate photos (burst shots)
- Classify photos by scene type
- Prepare photos for multimodal LLM processing
Triggers: User mentions filtering photos, screening images, aesthetic scoring,
removing duplicates, classifying scenes, or preparing photos for LLM.
Auto-skipped when photo count β€ user's requested output count OR β€ 20 (batch_size).
Only triggered when photo count exceeds both thresholds.
Dependencies:
Python: torch, open-clip-torch, pillow, numpy, pillow-heif (optional, for HEIC/HEIF)
Model: MobileCLIP2-S0 (~300MB, downloaded on demand with user confirmation)
Check: bash scripts/setup_deps.sh
Model Download:
The model is NOT pre-downloaded. On first run:
- Interactive mode: prompts user for confirmation
- Non-interactive mode: exits with manual download instructions
- Uses HuggingFace mirror (hf-mirror.com) for China acceleration
- Add --auto-download to skip confirmation
metadata:
openclaw:
homepage: https://github.com/konanok/photo-skills
emoji: "π"
requires:
bins:
- python3
---
# Photo Screener β MobileCLIP-powered Smart Pre-screening
Intelligently filter, deduplicate, and classify photos using Apple MobileCLIP2-S0, preparing them for efficient multimodal LLM processing.
## Why MobileCLIP2-S0?
Based on 4-model comparison test:
| Metric | MobileCLIP2-S0 | ViT-L/14 (baseline) |
| ----------------------- | :---------------: | :-----------------: |
| **Encoding Speed** | **26.7ms/img** β‘ | 483.7ms/img |
| **Speed Ratio** | **18.1x faster** | 1x |
| **Pearson Correlation** | **0.78** | 1.0 (baseline) |
| **Top-10 Overlap** | **8/10** | 10/10 |
| **Model Size** | **74.8M** | 427.6M |
| **Embed Dim** | 512 | 768 |
> π‘ 1/18 of the time, 80% selection consistency β best speed/quality tradeoff.
## Dependencies
**Declaration file:** `requirements.txt`
**Prefer venv**: Before running scripts, activate the project-root virtual environment (e.g. `.venv/`). If it doesn't exist, create one first:
```bash
# Create venv and install dependencies (recommended)
python3 -m venv .venv
source .venv/bin/activate
pip install -r photo-screener/requirements.txt
# Or use the skill's setup script (checks + installs)
bash photo-screener/scripts/setup_deps.sh
# Before each session, activate venv
source .venv/bin/activate
```
Alternatively, install globally:
```bash
pip3 install -r photo-screener/requirements.txt
```
## Model Download Policy
**The model is NOT pre-downloaded.** This is by design to avoid:
- Unexpected large downloads (~300MB)
- Wasted bandwidth if the user doesn't need this skill
### Download Behavior
| Mode | Behavior |
| --------------------------------- | --------------------------------------- |
| **Interactive** (terminal) | Prompts user: "ζ―ε¦δΈθ½½ζ¨‘εοΌ[Y/n]" |
| **Non-interactive** (piped/agent) | Exits with manual download instructions |
| **--auto-download flag** | Downloads without confirmation |
### Manual Download
```bash
# Using China mirror (recommended)
HF_ENDPOINT=https://hf-mirror.com python3 -c \
"import open_clip; open_clip.create_model_and_transforms('MobileCLIP2-S0', pretrained='dfndr2b')"
# Or run setup script
bash photo-screener/scripts/setup_deps.sh
```
## Configuration
Copy `config.example.toml` to `config.toml` and edit. See `config.example.toml` for all available options.
## Usage
```bash
# Basic screening
python3 scripts/screen.py ~/data/output/thumbnails
# Custom thresholds
python3 scripts/screen.py ~/data/output/thumbnails \
--min-score 5.0 --sim-threshold 0.95
# Keep top 50
python3 scripts/screen.py ~/data/output/thumbnails --top-k 50
# Auto-download model (skip confirmation)
python3 scripts/screen.py ~/data/output/thumbnails --auto-download
# Pass specific file paths instead of a directory
python3 scripts/screen.py \
--paths ~/data/RAW/001/thumbnails/DSC_0001.jpg \
~/data/RAW/001/thumbnails/DSC_0002.jpg
# Dry run
python3 scripts/screen.py ~/data/output/thumbnails --dry-run
```
| Option | Description | Default |
| ----------------- | ----------------------------------------------- | -------- |
| `input_dir` | Directory with photos (optional with --paths) | required |
| `--paths` | Specific image paths (alternative to input_dir) | β |
| `--output`, `-o` | Output JSON path | auto |
| `--min-score` | Min aesthetic score (1-10) | 4.0 |
| `--sim-threshold` | Dedup threshold (0-1) | 0.97 |
| `--batch-size` | Max photos per LLM batch | 20 |
| `--top-k` | Keep only top K | all |
| `--recursive` | Search subdirectories | off |
| `--auto-download` | Skip model download prompt | off |
| `--dry-run` | Preview only | off |
## Pipeline
```
Photos (thumbnails)
β
βΌ Stage 1: MobileCLIP Encoding (~27ms/image)
β β 512-dim normalized embeddings
β
βββ Stage 2: Aesthetic Scoring
β βββ LAION MLP (zero-padded 512β768 dim)
β βββ Remove below threshold (default: 4.0)
β
βββ Stage 3: Similarity Dedup
β βββ Cosine similarity + greedy dedup
β βββ Higher score = higher priority
β
βββ Stage 4: Scene Classification
β βββ Zero-shot text matching (14 categories)
β
βββ Output: filter_report.json
```
## Agent Integration
When using this skill from an agent:
1. **Check dependencies**: `bash scripts/setup_deps.sh`
2. **Run with --auto-download**: in agent context, use `--auto-download` to skip interactive prompt
3. **Or pre-download**: run setup_deps.sh first which handles model download with user confirmation
```bash
# Agent-friendly command (auto-download)
python3 photo-screener/scripts/screen.py \
~/data/output/{session-id}/thumbnails \
--output ~/data/output/{session-id}/filter_report.json \
--auto-download
```
don't have the plugin yet? install it then click "run inline in claude" again.