AI Image to Code

SkillRank score: 7.4/10

evaluated by implexa, claude-haiku-4-5 · 2026-05-26

skill-factory-image-to-code converts UI screenshots and mockups into runnable HTML, CSS, or React component code via visual analysis and structured code generation. covers three modes: default HTML, React with Tailwind, and layout description only.

structure: 9.0
trigger phrases: 8.0
procedure: 8.0
edge cases: 6.0
documentation: 7.0

original SKILL.md from clawhub

---
name: ai-image-to-code
description: >
  Use when (1) user provides a UI screenshot or image and asks to convert it into HTML, CSS, or component code. 
  (2) user says "turn this into code", "rebuild this UI", "code this design", or "generate HTML from screenshot". 
  (3) user pastes an image and says "write the React component for this". 
license: MIT
metadata:
  version: "1.0"
  category: design
  author: wangjipeng
  sources:
    - https://github.com/MiniMax-AI/skills
---

# AI Image to Code

Use when (1) user provides a UI screenshot or image and asks to convert it into HTML, CSS, or component code. (2) user says "turn this into code", "rebuild this UI", "code this design", or "generate HTML from screenshot". (3) user pastes an image and says "write the React component for this".

## Core Position

This skill solves the specific problem of: *a visual UI mockup needs to become actual runnable frontend code — not just a description, but a working implementation.*

This skill IS NOT:
- An image generation tool: it converts existing images to code; it does not create images
- A design tool: it interprets and codes an existing design; it does not create the design
- A backend integration tool: it outputs frontend code (HTML, CSS, JSX), not server code

This skill IS activated ONLY when: image (screenshot/mockup) + code generation intent are both present.

## Modes

### `/ai-image-to-code`

**Default mode.** Converts a UI image into a complete HTML/CSS implementation.

When to use: User provides a screenshot and wants a working HTML page that resembles it.

### `/ai-image-to-code/react`

Outputs a React functional component using Tailwind CSS.

When to use: User explicitly asks for React or a component, not a plain HTML page.

### `/ai-image-to-code/describe`

Provides a detailed text description of the layout without writing code.

When to use: User only wants to understand the layout before committing to code generation.
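
The activation rule and the three modes can be read as a small routing function. The sketch below is illustrative only: `select_mode` and its keyword checks are hypothetical, not part of the skill, and a real agent would judge intent from the whole conversation instead of matching substrings. The trigger phrases are the ones quoted in the description above.

```python
from typing import Optional

# Trigger phrases quoted in the skill description.
CODE_INTENT_PHRASES = (
    "turn this into code",
    "rebuild this ui",
    "code this design",
    "generate html from screenshot",
    "write the react component for this",
)


def select_mode(has_image: bool, message: str) -> Optional[str]:
    """Return the mode to run, or None when the skill should stay inactive."""
    text = message.lower()
    if not has_image:
        return None  # an image is mandatory for every mode
    if "describe" in text:
        return "/ai-image-to-code/describe"
    if "react" in text or "component" in text:
        return "/ai-image-to-code/react"
    # Hypothetical fallback keywords ("html", "css") beyond the quoted phrases.
    if any(p in text for p in CODE_INTENT_PHRASES) or "html" in text or "css" in text:
        return "/ai-image-to-code"
    return None  # image present but no code or description intent
```

For example, `select_mode(True, "Write the React component for this")` selects the React mode, while an image with no request attached returns `None`.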

## Execution Steps

### Step 1 — Analyze the Image

1. Receive image (pasted, file attachment, or URL)
2. Use vision model to inspect the image and extract:
   - Layout structure (header, sidebar, main content, footer)
   - Color palette (primary, secondary, background, text, accent)
   - Typography (headings, body, labels — size and weight hierarchy)
   - Spacing system (padding, margins, gaps)
   - Component types (buttons, inputs, cards, lists, navigation)
   - Visual hierarchy (what stands out, what recedes)
3. If the image is complex (>10 distinct UI sections), focus on the main content area

### Step 2 — Plan the Code Structure

| Image Content | Recommended Output |
|---|---|
| Landing page | Single HTML with embedded CSS |
| Dashboard | HTML + CSS grid layout |
| Mobile app screen | Mobile-first responsive HTML |
| Form / login page | Semantic HTML form with proper inputs |
| Card / list UI | Component-based HTML with classes |
| Chart / data visualization | SVG or canvas-based rendering |

### Step 3 — Generate Code

**HTML/CSS output** (default):
```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>UI</title>
  <style>
    /* Extracted colors, typography, spacing from image */
  </style>
</head>
<body>
  <!-- Structure matching the image layout -->
</body>
</html>
```

**React + Tailwind** (react mode):
```jsx
export function UICard() {
  return (
    <div className="p-6 bg-white rounded-xl shadow-sm">
      {/* Component matching image */}
    </div>
  );
}
```

### Step 4 — Validate

- Key layout sections (header, main, sidebar) are present
- Colors are a close visual match to the original image (roughly within ±10%, judged by eye)
- No invented content — placeholder text is generic ("Card title", not specific brand names)
- HTML is valid (proper tag nesting, no unclosed tags)
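
The color check can be made less subjective with a simple numeric comparison. The sketch below assumes one possible reading of "±10%": each RGB channel may differ by at most 10% of the 0-255 range. That interpretation, and the names `hex_to_rgb` and `within_tolerance`, are assumptions for illustration; the skill itself does not define how the tolerance is measured.

```python
from typing import Tuple


def hex_to_rgb(value: str) -> Tuple[int, int, int]:
    """Parse '#rgb' or '#rrggbb' into an (r, g, b) tuple."""
    value = value.lstrip("#")
    if len(value) == 3:  # expand shorthand such as #fff
        value = "".join(ch * 2 for ch in value)
    if len(value) != 6:
        raise ValueError("expected a 3- or 6-digit hex color")
    return int(value[0:2], 16), int(value[2:4], 16), int(value[4:6], 16)


def within_tolerance(original: str, generated: str, tolerance: float = 0.10) -> bool:
    """True when every channel differs by at most `tolerance` of the 0-255 range."""
    limit = 255 * tolerance
    pairs = zip(hex_to_rgb(original), hex_to_rgb(generated))
    return all(abs(a - b) <= limit for a, b in pairs)
```

A per-channel RGB comparison is not perceptually uniform, so a result near the limit still deserves a visual check.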

## Mandatory Rules

### Do not

- Do not invent brand names, specific product names, or proprietary text not visible in the image
- Do not claim the output is pixel-perfect — it is an interpretation
- Do not generate backend code, JavaScript logic, or API calls
- Do not reproduce copyrighted UI elements (logos, icons) — use generic equivalents

### Do

- Use placeholder text that fits the context (e.g., "Search..." for a search bar)
- Preserve the visual hierarchy (primary > secondary > tertiary)
- Use realistic placeholder images (e.g., via placeholder.com or picsum.photos)
- State explicitly: "This is an approximation; fine-tune colors and spacing as needed"

## Quality Bar

**A good output:**
- All major layout regions are present and positioned correctly
- Color palette is recognizably derived from the image
- Typography hierarchy matches (heading size > body size)
- Code is valid, runnable HTML/CSS without external dependencies beyond a CDN

**A bad output:**
- Layout is scrambled or missing major sections
- Output includes broken or unclosed HTML tags
- Fabricated text content not appropriate to the UI context
- Output requires non-free dependencies or local asset files

## Good vs. Bad Examples

| Scenario | Bad Output | Good Output |
|---|---|---|
| E-commerce product card | Generic lorem ipsum text | Contextually appropriate text, e.g. "Price: $49.99" and "Add to Cart" |
| Dark mode UI | Ignores dark theme | Uses dark background, light text, correct contrast |
| Mobile screenshot | Desktop-only output | `max-width: 375px` container, mobile-first |
| Complex dashboard | One undifferentiated div | Grid layout with sidebar, header, main panels |

## References

- `references/` — Color extraction heuristics, layout structure patterns, Tailwind class mapping guide

added explicit inputs section with vision model and external service dependencies, restructured procedure into granular steps with clear in-out contracts, expanded decision points to cover ambiguity, low quality images, copyright issues, scope boundaries, and mobile-responsive intent, and defined output contract with file formats and quality bars plus outcome signals for validation.

AI Image to Code

intent

convert a UI screenshot, wireframe, or design mockup into runnable frontend code (HTML, CSS, React, or framework-agnostic components). use this when a user provides a visual image and asks to "turn this into code", "rebuild this UI", "code this design", "generate HTML from this screenshot", or "write the React component for this". the output is actual working code that approximates the visual layout and styling, not a description or design artifact.

inputs

image source: screenshot, wireframe, mockup, or design file (PNG, JPG, GIF, WebP, or URL)
vision model access: required to analyze image content; any LLM with vision capability (Claude, GPT-4V, etc.)
code generation intent: user must explicitly ask for code (not just a description)
output format preference (optional): HTML/CSS (default), React + Tailwind, Vue, or framework-agnostic JSX
target viewport (optional): mobile-first, desktop, responsive, or a specific breakpoint

optional external inputs:

design system documentation (if user provides brand guidelines or design tokens)
placeholder image service access (placeholder.com, picsum.photos) for realistic mock images
Tailwind CSS CDN or build toolchain (if React mode selected)

procedure

step 1: receive and validate image

user provides image via paste, file upload, or URL
validate image is readable (not corrupted, not text-only, dimensions >200x200px)
if image is unavailable or corrupted, request user re-upload
note image dimensions, aspect ratio, and apparent device type (mobile, tablet, desktop, print)

output: confirmed image file or URL; metadata (dimensions, device type)
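
the checks in this step can be sketched in a few lines. this is a minimal illustration, assuming a PNG input and standard-library parsing only; `png_dimensions` and `describe_image` are hypothetical helpers, and the aspect-ratio thresholds used to guess the device type are invented for the example, not taken from the skill.

```python
import struct
from typing import Dict, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE = 200  # from the step above: dimensions must exceed 200x200px


def png_dimensions(data: bytes) -> Tuple[int, int]:
    """Read width and height from the IHDR chunk at the start of a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a readable PNG; ask the user to re-upload")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def describe_image(width: int, height: int) -> Dict[str, object]:
    """Collect the metadata this step outputs."""
    ratio = width / height if height else 0.0
    # Hypothetical thresholds: tall images are treated as mobile screens.
    if ratio < 0.7:
        device = "mobile"
    elif ratio < 1.2:
        device = "tablet"
    else:
        device = "desktop"
    return {
        "usable": width > MIN_SIDE and height > MIN_SIDE,
        "width": width,
        "height": height,
        "aspect_ratio": round(ratio, 2),
        "device_type": device,
    }
```

other formats (JPG, GIF, WebP) would need their own header parsing or an imaging library.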

step 2: analyze visual structure

use vision model to inspect image and extract:
- layout grid (header, sidebar, main content, footer, modal overlays)
- color palette (background, primary text, secondary text, accent, borders, hover states)
- typography (identify heading levels, body text, labels; estimate font size hierarchy)
- spacing system (padding, margins, gaps between components; note if grid-based or flexible)
- interactive elements (buttons, inputs, checkboxes, dropdowns, toggles, tabs)
- visual hierarchy (what draws attention, what recedes, use of contrast/size/position)
- any images, icons, or decorative elements
if image is complex (>10 distinct UI sections or multiple nested layouts):
- prioritize main content area over chrome
- note secondary elements but focus code generation on primary flow
flag any ambiguities:
- color is subjective from screenshot (may differ on user's display)
- spacing is approximate from pixel measurements
- exact font family cannot be determined from image alone

output: detailed visual inventory (structured list of sections, colors as hex/rgb, typography scale, spacing notes, component list)
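
one way to hold the visual inventory is a small record type. the sketch below is an assumption about shape only: the `VisualInventory` class and its field names are hypothetical, and the example values are made up for illustration, not extracted from any real image.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VisualInventory:
    """Structured result of the visual analysis step."""
    sections: List[str] = field(default_factory=list)         # header, sidebar, ...
    colors: Dict[str, str] = field(default_factory=dict)      # role -> hex value
    typography: Dict[str, str] = field(default_factory=dict)  # level -> size/weight
    spacing_notes: List[str] = field(default_factory=list)
    components: List[str] = field(default_factory=list)
    ambiguities: List[str] = field(default_factory=list)      # flagged for the user

    def is_complex(self) -> bool:
        """More than 10 distinct sections: prioritize the main content area."""
        return len(self.sections) > 10


# Illustrative values only.
inventory = VisualInventory(
    sections=["header", "sidebar", "main", "footer"],
    colors={"background": "#111827", "accent": "#3b82f6"},
    typography={"h1": "32px bold", "body": "16px regular"},
    components=["button", "input", "card"],
    ambiguities=["exact font family cannot be determined from image alone"],
)
```

keeping the ambiguities in the same record makes it easier to surface them again when the code is delivered.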

step 3: select code strategy

based on image content and user intent, choose output format:

| image content | recommended output | rationale |
|---|---|---|
| landing page hero, marketing site | single HTML file with embedded CSS | self-contained, minimal dependencies |
| dashboard, admin panel, data-heavy UI | HTML + CSS with grid/flexbox layout | card-based, reusable component structure |
| mobile app screen, small viewport | mobile-first responsive HTML, max-width 375-425px | constraint-driven design |
| form, login, checkout | semantic HTML form with proper inputs and labels | accessibility, progressive enhancement |
| card, list item, component preview | component-based JSX or HTML with CSS classes | reusable, composable |
| chart, graph, data viz | SVG or canvas rendering (if complex) | vector-based, scalable |

if user requested React mode, use JSX + Tailwind CSS utility classes. if no preference, default to HTML + CSS.

output: chosen format; justification
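
the selection rule above amounts to a lookup with two overrides. the sketch below assumes the image has already been classified into one of six hypothetical category keys; the keys and the `choose_output` function are illustrative names, while the output strings are copied from the table.

```python
from typing import Dict

# Category keys are hypothetical labels; values come from the table above.
STRATEGIES: Dict[str, str] = {
    "landing_page": "single HTML file with embedded CSS",
    "dashboard": "HTML + CSS with grid/flexbox layout",
    "mobile_screen": "mobile-first responsive HTML, max-width 375-425px",
    "form": "semantic HTML form with proper inputs and labels",
    "card": "component-based JSX or HTML with CSS classes",
    "chart": "SVG or canvas rendering (if complex)",
}


def choose_output(content_type: str, wants_react: bool = False) -> str:
    """Pick an output format; an explicit React request overrides the table."""
    if wants_react:
        return "JSX + Tailwind CSS utility classes"
    # Unknown content falls back to the default HTML + CSS mode.
    return STRATEGIES.get(content_type, "HTML + CSS")
```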

step 4: generate code

step 4a: HTML/CSS default mode

write valid HTML5 document structure (DOCTYPE, html, head, body)
include meta tags: charset UTF-8, viewport for responsiveness
extract and embed CSS colors, typography, spacing from step 2 analysis
structure HTML to match layout regions identified in step 2
use semantic tags (header, nav, main, section, article, footer, aside) where appropriate
use meaningful class names (.card, .header-nav, .cta-button, not .box1, .div2)
avoid external dependencies; use CSS Grid or Flexbox for layout
placeholder text: use context-appropriate filler ("Search...", "Enter email", "Learn more", not random lorem)
placeholder images: link to placeholder.com or similar service with alt text
include comments marking major layout sections

example scaffold:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>UI Component</title>
  <style>
    /* extracted colors, typography, spacing */
    :root {
      --color-primary: #...; /* from image */
      --color-bg: #...;
      --font-size-heading: ...;
    }
    /* layout and component styles */
  </style>
</head>
<body>
  <!-- major sections with semantic tags -->
</body>
</html>
```

step 4b: React + Tailwind mode

write functional React component with JSX syntax
use Tailwind utility classes (no custom CSS); reference Tailwind docs for utility mapping
extract colors from image and map to Tailwind scale (if image color is not Tailwind standard, use closest match and note in comment)
use component composition (break into smaller sub-components if layout is complex)
placeholder data: inline or as prop defaults, not fetched from API
no JavaScript logic, state, or event handlers beyond simple className toggles
export as named export (e.g., export function ProductCard() { ... })
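
mapping an extracted color to the closest Tailwind utility can be sketched as a nearest-neighbor search. the palette below is a small hand-copied subset of what are believed to be Tailwind v3 default hex values; treat it as an assumption and verify against the Tailwind docs before relying on it. `nearest_tailwind` is a hypothetical helper, and squared RGB distance is a simple stand-in for a perceptual metric.

```python
from typing import Dict, Tuple

# Assumed subset of the Tailwind v3 default palette (verify before use).
TAILWIND_SUBSET: Dict[str, str] = {
    "white": "#ffffff",
    "gray-50": "#f9fafb",
    "gray-900": "#111827",
    "red-500": "#ef4444",
    "green-500": "#22c55e",
    "blue-500": "#3b82f6",
}


def _rgb(hex_color: str) -> Tuple[int, int, int]:
    """Parse a 6-digit hex color into (r, g, b)."""
    h = hex_color.lstrip("#")
    return int(h[0:2], 16), int(h[2:4], 16), int(h[4:6], 16)


def nearest_tailwind(hex_color: str) -> Tuple[str, bool]:
    """Return (closest palette name, whether the match is exact)."""
    target = _rgb(hex_color)

    def distance(name: str) -> int:
        # Squared Euclidean distance in RGB space.
        return sum((a - b) ** 2 for a, b in zip(target, _rgb(TAILWIND_SUBSET[name])))

    best = min(TAILWIND_SUBSET, key=distance)
    return best, distance(best) == 0
```

when the second value is `False`, the generated JSX should carry a comment noting that the class is the closest match, as the rule above requires.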

example scaffold:

```jsx
export function UIComponent() {
  return (
    <div className="min-h-screen bg-gray-50 p-6">
      <header className="bg-white rounded-lg shadow-sm p-4">
        {/* header content */}
      </header>
      <main className="mt-6 grid grid-cols-3 gap-4">
        {/* main content */}
      </main>
    </div>
  );
}
```

output: runnable code snippet (HTML file or JSX component); code is valid, no unclosed tags, proper nesting

step 5: validate output

check HTML validity:
- all opening tags have closing tags (or are self-closing void elements)
- no nesting violations (e.g., a block-level element placed directly inside an inline element)
- proper use of semantic tags
- alt text on images
check visual alignment:
- all major layout regions from step 2 are present in code (header, sidebar, main, footer if they existed in image)
- color palette is visibly derived from image (colors within ±10% perceptual match; exact match is impossible from screenshot)
- typography hierarchy preserved (h1 > h2 > body size, appropriate weight)
- spacing proportions approximately match image
check content appropriateness:
- no brand names, product names, or proprietary text invented (unless explicitly in image)
- placeholder text is generic but contextually sensible ("Add to Cart", "Search", not fabricated product names)
- no copyrighted logos or icons reproduced; use generic substitutes or SVG symbols
check code quality:
- no external dependencies required (or only CDN-hosted libraries clearly stated)
- code is readable and maintainable (indentation, comments on sections)
- no unused CSS or HTML cruft

if validation fails on any point, regenerate or note limitation to user.

output: validation checklist completed; any deviations flagged
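
part of the HTML validity check can be automated with the standard library. the sketch below covers only two items from the checklist, unclosed or mismatched tags and missing alt text; `StructureChecker` and `check_html` are hypothetical names. it is deliberately stricter than HTML5, which lets some end tags (such as `</p>` or `</li>`) be omitted, because the skill asks for explicitly closed tags.

```python
from html.parser import HTMLParser
from typing import List

# Void elements never take a closing tag.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class StructureChecker(HTMLParser):
    """Tracks open tags on a stack and records structural problems."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.problems: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img" and not any(name == "alt" for name, _ in attrs):
            self.problems.append("img without alt text")
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"unexpected </{tag}>")


def check_html(markup: str) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    checker = StructureChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]
```

visual alignment and content appropriateness still need a human or model review; this only catches mechanical errors.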

step 6: deliver and set expectations

present code with clear note: "this is an approximation based on the image. colors, fonts, and exact spacing may need fine-tuning."
provide code as:
- inline HTML (if short)
- code block with language tag (html or jsx)
- downloadable file link (if platform supports)
offer next steps: "need to adjust colors?", "want responsive tweaks?", "ready to add interactivity?"

output: user-ready code with caveats stated

decision points

if user provides image but no code format preference:

default to HTML + CSS (single file, self-contained, lowest friction)
note: "output as HTML; reply with /react if you prefer a React component"

if image is ambiguous (colors hard to read, low contrast, poor quality):

state assumption explicitly: "image appears to use a dark theme with blue accents; confirm?"
offer to adjust colors after generation
do not guess or invent

if image contains copyrighted branding, logos, or proprietary UI:

do not reproduce trademarked elements
note: "I've replaced the branded logo with a generic placeholder icon"
suggest user provide their own brand assets

if user requests backend code, API integration, or state management:

decline as out of scope: "this skill generates static layout code only. for API calls or business logic, you'll need additional setup"
offer to generate React structure ready for props or context injection

if image is a mobile screenshot and user doesn't specify responsive intent:

assume mobile-first but include CSS for larger viewports
use meta viewport tag and responsive classes (Tailwind breakpoints or CSS media queries)
note: "code is optimized for mobile; scale up for desktop via media queries"

if placeholder.com or image CDN is unreachable or blocked:

use inline SVG placeholder shapes instead
note: "images use SVG placeholders; replace src with real image URLs"
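
an inline SVG placeholder can be produced without any network access. the sketch below builds a gray rectangle with a size label and returns it as a data URI suitable for an `img` `src` attribute; `svg_placeholder` is a hypothetical helper, and the fill colors are arbitrary neutral grays, not values from the skill.

```python
from html import escape
from urllib.parse import quote


def svg_placeholder(width: int, height: int, label: str = "") -> str:
    """Return a data URI for a gray box labeled with its size (or a custom label)."""
    text = escape(label or f"{width}x{height}", quote=True)
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}" role="img" aria-label="{text}">'
        f'<rect width="100%" height="100%" fill="#e5e7eb"/>'
        f'<text x="50%" y="50%" fill="#6b7280" font-family="sans-serif" '
        f'text-anchor="middle" dominant-baseline="middle">{text}</text>'
        f"</svg>"
    )
    # Percent-encode so the markup is safe inside an HTML attribute.
    return "data:image/svg+xml;charset=utf-8," + quote(svg)
```

the returned string can be dropped straight into generated markup, for example as the `src` of an `img` that still carries its own alt text.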

if user requests pixel-perfect fidelity:

clarify: "this is a ~80-90% visual approximation. for exact pixel matching, you'll need design tool source files or manual CSS refinement"

if image contains complex interactions (animations, modals, drag-drop):

output only static HTML/CSS layout
note which interactions are visual-only vs. require JavaScript
offer follow-up: "want to add interactivity with JavaScript?"

output contract

format:

HTML mode: valid HTML5 file (single .html file or code block), self-contained CSS in