🚀 LocateAnything

Locate any object in images or videos with natural language.
Upload an image/video on the left, choose a task type, enter what you want to find, then click Run Inference. Results with bounding boxes will appear on the right.

Quick Start: ① Select Image or Video → ② Pick a Task Type (Detection / Grounding / OCR / GUI / Pointing) → ③ Type your Categories (comma-separated) → ④ Click 🧠 Run Inference
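Step ③ takes a comma-separated list of categories. The Python sketch below shows one reasonable way to turn that text into a list of names. It is not the app's actual parser: the function name `parse_categories` is hypothetical, and dropping empty entries and case-insensitive duplicates is an assumption.

```python
def parse_categories(text: str) -> list[str]:
    """Hypothetical helper: split a comma-separated category string into clean names.

    Assumption (not from the app): empty entries are dropped and duplicates are
    removed case-insensitively, keeping the first spelling seen.
    """
    seen: set[str] = set()
    result: list[str] = []
    for part in text.split(","):
        name = part.strip()  # trim surrounding whitespace
        if name and name.lower() not in seen:
            seen.add(name.lower())
            result.append(name)
    return result


print(parse_categories(" person, red car,, Person , dog "))
# ['person', 'red car', 'dog']
```

Multi-word entries such as "red car" stay intact because only commas separate categories.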

βš™οΈ Settings

1. Input Media Type

Select whether to process a single image or a video clip.

2. Task Type

Detection: find all instances of the listed categories | Grounding: locate the object matching a text description | OCR: extract text | GUI: locate a UI element | Pointing: point to the target

4. Inference Mode

fast: MTP (multi-token prediction) parallel decoding | slow: standard autoregressive (AR) decoding | hybrid: switches automatically to balance quality and speed
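The fast/slow trade-off mostly comes down to how many tokens each decoder forward pass emits. AR decoding emits one token per pass, while MTP decoding can emit several. The toy calculation below illustrates this. It is not LocateAnything's implementation. The figures are assumptions: 4 tokens per step for MTP and 4 coordinate tokens per box. The sketch also ignores any draft tokens an MTP decoder might reject, and it does not model the hybrid mode's switching rule, which is not described here.

```python
import math


def forward_passes(num_tokens: int, tokens_per_step: int) -> int:
    """Number of decoder forward passes needed to emit num_tokens tokens
    when every pass yields tokens_per_step tokens (no rejected tokens)."""
    if num_tokens < 0 or tokens_per_step < 1:
        raise ValueError("num_tokens must be >= 0 and tokens_per_step >= 1")
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical output: 4 boxes x 4 coordinate tokens each = 16 tokens.
n = 16
print(forward_passes(n, 1))  # slow (AR, 1 token per pass): 16
print(forward_passes(n, 4))  # fast (MTP, assumed 4 tokens per pass): 4
```

In practice the speed-up depends on how many predicted tokens are kept, so these numbers are an upper bound for the assumed step size.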

📥 Input Media

📤 Output Result

📝 Raw Input Prompt

🔍 Decoding Visualization


πŸ–ΌοΈ Examples

Click any example below to auto-fill the settings and input image.