Computer AI Agent

Computer AI Agent Demo Video

Watch the AI agent in action as it autonomously navigates PowerPoint, adds comments to slides, and completes complex tasks using computer vision and automation.

Interactive Trajectory Analysis

Task: Post a "put circle shape" comment on slide 3 of the presentation.

Step 1 of 6: Navigate to Slide 3
  • Plan Model: The agent identifies the slide navigation panel and locates slide 3 in the presentation. Using computer vision, it detects the slide's thumbnail and prepares to click on it.
  • Actor Model: Clicks the slide 3 thumbnail (action type: Click).
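Each step in the walkthrough is split between a Plan Model, which chooses the next sub-goal, and an Actor Model, which turns that sub-goal into a concrete UI action. The sketch below shows that loop under stated assumptions: the `Trajectory`, `Step`, and `Action` schema is hypothetical, and `plan_next_goal` and `act` are hard-coded stand-ins for the two models, not their real interfaces. Progress is counted as completed steps out of the six planned ones.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """A low-level UI action emitted by the actor (hypothetical schema)."""
    kind: str    # e.g. "click" or "type"
    target: str  # description of the UI element to act on


@dataclass
class Step:
    """One planner sub-goal paired with the actor's chosen action."""
    goal: str
    action: Action


@dataclass
class Trajectory:
    task: str
    total_steps: int
    steps: list[Step] = field(default_factory=list)

    def record(self, step: Step) -> None:
        self.steps.append(step)

    def progress(self) -> float:
        """Fraction of planned steps completed so far."""
        return len(self.steps) / self.total_steps


def plan_next_goal(task: str, history: list[Step]) -> str:
    # Stand-in for the Plan Model: a real system would condition on the task,
    # the current screenshot, and the history. Hard-coded for illustration.
    return "Navigate to Slide 3"


def act(goal: str) -> Action:
    # Stand-in for the Actor Model: maps a sub-goal to a concrete UI action.
    return Action(kind="click", target="slide 3 thumbnail")


traj = Trajectory(task='Post a "put circle shape" comment on slide 3', total_steps=6)
goal = plan_next_goal(traj.task, traj.steps)
traj.record(Step(goal, act(goal)))
print(f"{traj.progress():.0%}")  # prints 17%
```

Keeping the planner's sub-goal and the actor's action together in each recorded step makes the trajectory easy to replay and inspect step by step.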

AI Agent Performance

We compare our agent, built on InternVL2.5-2B, with other open-source 2B-parameter multimodal models, all evaluated on the same 70-task internal benchmark. Restricting the comparison to models of similar parameter count makes it a fair assessment of architecture and training strategy rather than raw scale.

Our Approach

  • Action Model (2B): Optimized for action decision-making and UI element recognition using a novel token design
  • Planner Model (2B): LoRA-tuned on top of the Action Model for enhanced planning capabilities
  • Domain-Specific Training: Tailored specifically for PowerPoint-related workflows
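The Planner Model is described as LoRA-tuned on top of the Action Model. One reading, assumed here, is that both roles share the same frozen base weights W and the planner adds a low-rank update, giving W + (alpha / r) * B @ A. The sketch shows that mechanism on a single linear layer in plain Python for readability; `LoRALinear` and its `enabled` switch are hypothetical names, and a real system would implement this in PyTorch rather than with nested lists.

```python
import random


def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight              # frozen base weight, d_out x d_in
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapter is a no-op at init.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.enabled = True               # hypothetical switch: True = planner role

    def __call__(self, x: list[float]) -> list[float]:
        base = matvec(self.weight, x)
        if not self.enabled:
            return base                   # identical to the base layer
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


W = [[1.0, 0.0], [0.0, 2.0]]              # toy 2x2 base weight
layer = LoRALinear(W, rank=1, alpha=2.0)
x = [3.0, 4.0]
print(layer(x))                           # [3.0, 8.0]: B is zero, so no change yet

# Pretend training has filled in the adapter (values chosen by hand).
layer.A = [[1.0, 0.0]]
layer.B = [[0.5], [0.0]]
print(layer(x))                           # [6.0, 8.0]: adapter active
layer.enabled = False
print(layer(x))                           # [3.0, 8.0]: base weights only
```

Because B starts at zero, the adapter initially leaves the base layer untouched, and switching it off recovers the original output exactly; under this reading, that is what lets one set of base weights serve both the action and planning roles.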

Performance Results

Our Approach: 24.3%
ShowUI (2B): 4.3%
Our approach scores roughly 5.7 times ShowUI's score on the same 70-task benchmark.
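The page does not name the metric behind these percentages. If they are task success rates over the 70 benchmark tasks (an assumption), they can be checked against whole task counts, and the headline ratio follows directly from the two scores:

```python
NUM_TASKS = 70  # size of the internal benchmark, from the text above

# Reported scores, read here (assumption) as task success rates in percent.
ours_pct, showui_pct = 24.3, 4.3


def consistent_counts(rate_pct: float, n: int = NUM_TASKS) -> list[int]:
    """Whole task counts k whose rate 100 * k / n rounds to rate_pct at one decimal."""
    return [k for k in range(n + 1) if round(100 * k / n, 1) == rate_pct]


ratio = ours_pct / showui_pct
print(f"ratio of reported scores: {ratio:.2f}")              # 5.65
print(consistent_counts(ours_pct), consistent_counts(showui_pct))  # [17] [3]
print(f"ratio of task counts: {17 / 3:.2f}")                 # 5.67
```

Under that assumption the scores correspond to 17 and 3 solved tasks, and both 24.3 / 4.3 ≈ 5.65 and 17 / 3 ≈ 5.67 round to the reported 5.7x.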

Technology Stack

  • InternVL2.5-2B: Vision-language model for UI understanding
  • PyTorch: Deep learning framework
  • VectorDB: Semantic memory and retrieval
  • Graph-Based Planner: Structured reasoning for agent planning
  • LoRA: Lightweight fine-tuning for language models
  • FastAPI: Backend API
  • PyQt: GUI framework for Python applications
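The stack lists a vector database for semantic memory and retrieval without naming the product or the embedding model. A minimal sketch of the underlying idea follows, assuming memory entries are short descriptions of past UI steps stored with embedding vectors and recalled by cosine similarity. `VectorMemory` is a hypothetical in-process stand-in, and the 3-d vectors are hand-written toys rather than real embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class VectorMemory:
    """Toy store of (embedding, payload) pairs searched by cosine similarity."""

    def __init__(self) -> None:
        self._items: list[tuple[list[float], str]] = []

    def add(self, embedding: list[float], payload: str) -> None:
        self._items.append((embedding, payload))

    def search(self, query: list[float], k: int = 1) -> list[str]:
        """Return the k payloads whose embeddings are most similar to the query."""
        ranked = sorted(self._items, key=lambda item: cosine(query, item[0]), reverse=True)
        return [payload for _, payload in ranked[:k]]


memory = VectorMemory()
# Hand-written 3-d "embeddings" standing in for a real embedding model.
memory.add([1.0, 0.0, 0.0], "open slide thumbnail panel")
memory.add([0.0, 1.0, 0.0], "insert a new comment")
memory.add([0.0, 0.0, 1.0], "change slide layout")

print(memory.search([0.1, 0.9, 0.0], k=1))  # ['insert a new comment']
```

A production vector store replaces the linear scan with an approximate nearest-neighbor index, but it answers the same kind of query: which stored items lie closest to an embedded query.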