Feed-forward 3D foundation model for streaming scene reconstruction using Geometric Context Transformer
LingBot-Map 3D Reconstruction Skill

Skill by ara.so, from the Daily 2026 Skills collection.

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from streaming image or video data using a Geometric Context Transformer. It achieves ~20 FPS at 518×378 resolution over sequences exceeding 10,000 frames via paged KV cache attention.

What It Does

- Streaming 3D reconstruction from image sequences or video
- Feed-forward inference (no iterative optimization needed)
- Outputs: point clouds with per-point confidence, camera poses, depth maps
- Key features: anchor context, pose-reference window, trajectory memory for drift correction

Installation

    # 1. Create environment
    conda create -n lingbot-map python=3.10 -y
    conda activate lingbot-map