Computer AI Agent
Computer AI Agent Demo Video
Watch the AI agent in action as it autonomously navigates PowerPoint, adds comments to slides, and completes complex tasks using computer vision and automation.
Interactive Trajectory Analysis
Current Task:
Post "put circle shape" comment on slide 3 of the presentation
Step 1: Navigate to Slide 3
PowerPoint Presentation Interface
Step 1 / 6
In Progress
Plan Model
Navigate to Slide 3
The agent identifies the slide navigation panel and locates slide 3 in the presentation.
Using computer vision, it detects the thumbnail and prepares to click on it.
Actor Model
Action Type: Click
17% Complete
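The plan/actor split shown in this step can be pictured as one small record per trajectory step. The sketch below is illustrative only: the class names `PlanStep` and `Action`, their fields, and the `progress_percent` helper are assumptions made for this example, not the project's actual code. It also shows how a 17% progress readout follows from being on step 1 of 6.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """Low-level action chosen by the actor model (hypothetical schema)."""
    action_type: str   # e.g. "click"
    target: str        # description of the UI element to act on


@dataclass
class PlanStep:
    """One trajectory step: the planner's goal plus the actor's action."""
    index: int         # 1-based position in the trajectory
    goal: str          # high-level instruction from the plan model
    action: Action
    status: str = "in_progress"


def progress_percent(step_index: int, total_steps: int) -> int:
    """Share of the trajectory reached, rounded to a whole percent."""
    return round(100 * step_index / total_steps)


step = PlanStep(
    index=1,
    goal="Navigate to Slide 3",
    action=Action(action_type="click", target="slide 3 thumbnail"),
)
print(step.goal, "->", step.action.action_type)
print(progress_percent(step.index, total_steps=6), "% complete")
```

Keeping the goal and the action in separate fields mirrors the separation between the plan model, which decides what to do, and the actor model, which decides how to do it on screen.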
AI Agent Performance
We compare our agent, built on InternVL2.5-2B, to other open-source 2B-parameter multimodal models, all evaluated on the same 70-task internal benchmark. Restricting the comparison to models with a similar parameter count means that differences reflect architecture and training strategy rather than raw scale.
Our Approach
- Action Model (2B): Optimized for action decision-making and UI element recognition using a novel token design
- Planner Model (2B): LoRA-tuned on top of the Action Model for enhanced planning capabilities
- Domain-Specific Training: Tailored specifically for PowerPoint-related workflows
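For readers unfamiliar with LoRA, the idea behind tuning the Planner Model on top of the Action Model is that the base weights stay frozen while a small low-rank update is learned alongside them. The following is a minimal sketch of the generic LoRA formulation, written in plain Python rather than PyTorch to stay dependency-free. The matrix sizes, rank `r`, and scale `alpha` are placeholder assumptions; this page does not state which layers are adapted or with what settings.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x).

    W is the frozen base weight; A (r x in) and B (out x r) are the
    small trainable matrices that make up the low-rank update.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Toy values chosen for illustration only.
W = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]        # frozen base weight, 2 x 3
A = [[0.5, -0.5, 1.0]]        # trainable, rank r = 1
x = [1.0, 2.0, 3.0]

# With B initialised to zero, the adapted layer matches the base layer.
B_zero = [[0.0], [0.0]]
print(lora_forward(W, A, B_zero, x, alpha=2.0, r=1))

# After training, B is non-zero and shifts the output.
B_trained = [[1.0], [-1.0]]
adapted = lora_forward(W, A, B_trained, x, alpha=2.0, r=1)
print(adapted)
```

Because only `A` and `B` are trained, the two models can share one set of base weights and differ only in the small adapter.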
Performance Results
Our Approach: 24.3%
ShowUI: 4.3%
5.7x higher than ShowUI (2B)
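The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below assumes each of the 70 benchmark tasks has a single pass/fail outcome, which this page does not state explicitly, and looks for task counts whose percentage matches the reported value to one decimal place. The helper name `consistent_counts` is made up for this example.

```python
TOTAL_TASKS = 70  # benchmark size stated on this page


def consistent_counts(reported_pct, total=TOTAL_TASKS):
    """Task counts whose success rate rounds to the reported percentage."""
    return [k for k in range(total + 1)
            if abs(100 * k / total - reported_pct) < 0.05]


ours_pct, showui_pct = 24.3, 4.3
ratio = ours_pct / showui_pct

print(consistent_counts(ours_pct))    # counts compatible with 24.3%
print(consistent_counts(showui_pct))  # counts compatible with 4.3%
print(round(ratio, 1))                # relative improvement
```

Under that pass/fail assumption, the percentages correspond to 17 and 3 completed tasks out of 70, and the ratio of the two rates rounds to 5.7.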
Technology Stack
- InternVL2.5-2B: Vision-language model for UI understanding
- PyTorch: Deep learning framework
- VectorDB: Semantic memory and retrieval
- Graph-Based Planner: Structured reasoning for agent planning
- LoRA: Lightweight fine-tuning for language models
- FastAPI: Backend connection
- PyQt: GUI framework for Python applications
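To make the "semantic memory and retrieval" role of the vector store concrete, here is a minimal in-memory sketch of nearest-neighbour lookup by cosine similarity. It is not the project's implementation: the `VectorMemory` class is hypothetical, the three-dimensional embeddings are hand-written toy values standing in for real model embeddings, and this page does not name the vector database actually used.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorMemory:
    """Tiny stand-in for a vector database (hypothetical)."""

    def __init__(self):
        self._items = []  # list of (embedding, text) pairs

    def add(self, embedding, text):
        self._items.append((embedding, text))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query embedding."""
        ranked = sorted(self._items,
                        key=lambda item: cosine(query, item[0]),
                        reverse=True)
        return [text for _, text in ranked[:k]]


memory = VectorMemory()
memory.add([1.0, 0.0, 0.0], "Slide thumbnails are in the left panel")
memory.add([0.0, 1.0, 0.0], "Comments are added from the Review tab")
memory.add([0.0, 0.0, 1.0], "Shapes are under the Insert tab")

# A query embedding that lies closest to the "comments" memory.
print(memory.search([0.1, 0.9, 0.2], k=1))
```

A production vector store would replace the linear scan with an approximate nearest-neighbour index, but the retrieval contract (embed the query, return the closest stored items) is the same.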