Automate macOS GUI via screen capture and OCR text reading, mouse and keyboard control, window management, and app launching using cliclick, screencapture, t...
# macOS GUI Automation Skill ## Capabilities - **Screen Reading**: Capture screenshots and OCR text - **Mouse Control**: Click, double-click, right-click, move, drag - **Keyboard Input**: Type text, press keys, shortcuts - **Window Management**: List windows, focus, resize, close - **App Control**: Launch, quit, bring to front ## Tools Available ### cliclick (Mouse/Keyboard) ```bash # Click at coordinates cliclick c:x,y # Double click cliclick dc:x,y # Right click cliclick rc:x,y # Move mouse cliclick m:x,y # Drag from to cliclick dr:x1,y1:x2,y2 # Type text cliclick t:"text" # Press key (Enter, Tab, etc.) cliclick kp:enter ``` ### screencapture + tesseract (Screen Reading) ```bash # Capture region to file screencapture -R x,y,w,h /tmp/screen.png # Capture full screen screencapture /tmp/screen.png # OCR from image tesseract /tmp/screen.png stdout # OCR with Chinese support tesseract /tmp/screen.png stdout -l chi_sim+eng ``` ### osascript (AppleScript - Window/App Control) ```bash # List all windows osascript -e 'tell application "System Events" to get name of every process' # Get window position/size osascript -e 'tell application "Finder" to get bounds of window of front window' # Click menu item osascript -e 'tell application "System Events" to click menu item "Save" of menu "File" of process "TextEdit"' ``` ## Usage Patterns ### Read Screen Text ```bash # 1. Capture screen screencapture -R 100,100,800,600 /tmp/region.png # 2. OCR tesseract /tmp/region.png stdout ``` ### Click Button at Position ```bash cliclick c:500,300 ``` ### Type in Field ```bash # Click field first cliclick c:400,200 # Then type cliclick t:"hello world" cliclick kp:enter ``` ### Find and Click (OCR + Click) ```bash # 1. Capture and OCR screencapture /tmp/screen.png text=$(tesseract /tmp/screen.png stdout) # 2. Parse coordinates from OCR result or use image recognition # 3. Click cliclick c:x,y ``` ## Limitations - Coordinates are absolute (screen resolution dependent) - No built-in image recognition (need to add OpenCV/sikuli for that) - OCR accuracy depends on screen DPI and font - Some apps may not be scriptable via AppleScript ## Security Notes - Requires Accessibility permissions in System Settings - Run `tccutil reset Accessibility` if permissions issues - Some apps (browsers, secure apps) may block automation
don't have the plugin yet? install it then click "run inline in claude" again.