The mobile-use SDK follows a layered architecture designed to provide both simplicity for common use cases and flexibility for advanced scenarios.
The core concepts apply to both the Platform and Local approaches. The Platform simplifies configuration by managing profiles and tasks centrally, while Local development gives you full control over all components.

Architecture Diagram

Key Components

Component Overview

Agent Layer

The Agent class is the primary entry point that coordinates:
  • Device connections (Android/iOS)
  • Server lifecycle management
  • Task creation and execution
  • Resource cleanup
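
The coordination role listed above can be pictured as a context manager that acquires resources in order and releases them in reverse. This is a minimal sketch, not the SDK's actual API: `SketchAgent`, `_connect_device`, `_start_servers`, and `run_task` are hypothetical names used only for illustration.

```python
from contextlib import ExitStack


class SketchAgent:
    """Hypothetical stand-in for an agent that owns device and server resources."""

    def __init__(self, platform: str) -> None:
        if platform not in ("android", "ios"):
            raise ValueError(f"unsupported platform: {platform}")
        self.platform = platform
        self.log: list[str] = []   # records lifecycle events for inspection
        self._stack = ExitStack()  # unwinds cleanup callbacks in reverse order

    def __enter__(self) -> "SketchAgent":
        self._connect_device()
        self._start_servers()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Registered cleanups run even if a task raised an exception.
        self._stack.close()

    def _connect_device(self) -> None:
        self.log.append(f"connect:{self.platform}")
        self._stack.callback(self.log.append, "disconnect")

    def _start_servers(self) -> None:
        self.log.append("servers:start")
        self._stack.callback(self.log.append, "servers:stop")

    def run_task(self, goal: str) -> str:
        self.log.append(f"task:{goal}")
        return f"done: {goal}"


with SketchAgent("android") as agent:
    result = agent.run_task("open settings")
```

Tying cleanup to the `with` block means the servers are stopped and the device is released even when a task fails partway through.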

Task Layer

Tasks represent automation workflows defined by:
  • Natural language goals - What you want to accomplish
  • Structured output - Type-safe results using Pydantic
  • Tracing - Recording execution for debugging
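
To make "structured output" concrete, here is a minimal sketch that uses Pydantic's `BaseModel` to validate a result. The `WeatherReport` model, the goal string, and the raw dictionary are hypothetical; how the SDK attaches such a model to a task is not shown here.

```python
from pydantic import BaseModel, ValidationError


class WeatherReport(BaseModel):
    """Hypothetical result schema for a task."""

    city: str
    temperature_c: float


# Natural language goal (illustrative only).
goal = "Open the weather app and read today's temperature"

# Assumed shape of the data an agent might return for this goal.
raw_result = {"city": "Paris", "temperature_c": "21.5"}

# Validation coerces the numeric string into a float.
report = WeatherReport(**raw_result)

# Malformed output is rejected instead of silently passing through.
try:
    WeatherReport(city="Paris", temperature_c="warm")
    rejected = False
except ValidationError:
    rejected = True
```

Downstream code can then rely on `report.temperature_c` being a float rather than re-parsing free-form text.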

LangGraph Integration

The SDK leverages LangGraph for:
  • Agent reasoning - Transparent decision-making process
  • Step-by-step execution - Breaking complex tasks into manageable steps
  • Dynamic adaptation - Responding to what’s on screen

Device Interaction

Two key components handle device control: the Hardware Bridge and the Screen API.
The Hardware Bridge performs physical actions on the device:
  • Tap, swipe, scroll gestures
  • App launching and navigation
  • Key press events
  • Text input
The Screen API captures device state:
  • Screenshots for visual analysis
  • UI hierarchy data
  • Element accessibility information
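
The split between acting on the device (Hardware Bridge) and observing it (Screen API) can be sketched as two small interfaces. This is an illustrative sketch only: the method names `tap`, `input_text`, and `ui_hierarchy`, and the `FakeDevice` class, are hypothetical and are not taken from the SDK.

```python
from dataclasses import dataclass, field
from typing import Protocol


class HardwareBridge(Protocol):
    """Hypothetical interface for performing actions."""

    def tap(self, x: int, y: int) -> None: ...
    def input_text(self, text: str) -> None: ...


class ScreenAPI(Protocol):
    """Hypothetical interface for reading device state."""

    def ui_hierarchy(self) -> list[dict]: ...


@dataclass
class FakeDevice:
    """In-memory device that satisfies both protocols, for illustration."""

    focused_text: str = ""
    actions: list[str] = field(default_factory=list)

    def tap(self, x: int, y: int) -> None:
        self.actions.append(f"tap({x},{y})")

    def input_text(self, text: str) -> None:
        self.focused_text += text
        self.actions.append(f"type({text})")

    def ui_hierarchy(self) -> list[dict]:
        return [{"class": "EditText", "text": self.focused_text}]


def fill_field(bridge: HardwareBridge, screen: ScreenAPI, text: str) -> bool:
    """Act through the bridge, then verify the result through the screen reader."""
    bridge.tap(100, 200)
    bridge.input_text(text)
    return any(node["text"] == text for node in screen.ui_hierarchy())


device = FakeDevice()
ok = fill_field(device, device, "hello")
```

Keeping the two roles behind separate interfaces lets the same logic run against a fake device in tests and a real one in production.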

Execution Flow

1. Initialize - Agent connects to device and starts required servers
2. Plan - LLM analyzes the goal and creates a plan
3. Observe - Screen API captures current UI state
4. Decide - LLM determines next action based on screen
5. Act - Hardware Bridge executes the action
6. Repeat - Loop through steps 3-5 until goal is achieved
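
The observe, decide, act cycle in steps 3-5 can be sketched as a plain loop. This is a sketch under stated assumptions, not the SDK's implementation: `run_loop`, the callable signatures, and the `max_steps` cap are hypothetical, the planning step is omitted, and a counter stands in for the device.

```python
from typing import Callable, Optional

Screen = dict  # hypothetical: a snapshot of UI state
Action = str   # hypothetical: a description of one device action


def run_loop(
    goal: str,
    observe: Callable[[], Screen],
    decide: Callable[[str, Screen], Optional[Action]],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> int:
    """Repeat observe -> decide -> act until decide returns None (goal reached).

    Returns the number of actions executed; raises if max_steps is exhausted.
    """
    for step in range(max_steps):
        screen = observe()             # step 3: capture current state
        action = decide(goal, screen)  # step 4: choose the next action
        if action is None:
            return step
        act(action)                    # step 5: execute it
    raise RuntimeError("goal not reached within max_steps")


# Toy stand-ins: the "device" is a counter and the goal is to reach 3.
state = {"count": 0}


def observe() -> Screen:
    return dict(state)


def decide(goal: str, screen: Screen) -> Optional[Action]:
    return None if screen["count"] >= 3 else "increment"


def act(action: Action) -> None:
    state["count"] += 1


steps = run_loop("count to 3", observe, decide, act)
```

The step cap is an addition for the sketch: it keeps the loop from running forever when the decision function never reports success.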
