The mobile-use SDK follows a layered architecture designed to provide both simplicity for common use cases and flexibility for advanced scenarios.
The core concepts apply to both Platform and Local approaches. The platform simplifies configuration by managing profiles and tasks centrally, while local development gives you full control over all components.

Architecture Diagram

Key Components

Component Overview

Agent Layer

The Agent class is the primary entry point that coordinates:
  • Device connections (Android/iOS)
  • Server lifecycle management
  • Task creation and execution
  • Resource cleanup
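
A minimal usage sketch follows. The class name Agent comes from the description above, but the import path, constructor argument, and method names (init, run_task, clean) are assumptions about the API rather than the SDK's documented interface.

```python
# Hypothetical import path; the real package layout may differ.
from mobile_use import Agent

# The Agent owns the device connection, the lifecycle of the servers it needs,
# and the cleanup of both, so typical usage brackets the work in init/clean calls.
agent = Agent(platform="android")  # platform selection argument is illustrative

try:
    agent.init()  # connect to the device and start the required servers
    result = agent.run_task(goal="Open the clock app and start a 5 minute timer")
    print(result)
finally:
    agent.clean()  # release the device connection and stop the servers
```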

Task Layer

Tasks represent automation workflows defined by:
  • Natural language goals - What you want to accomplish
  • Structured output - Type-safe results using Pydantic
  • Tracing - Recording execution for debugging
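
To make the structured-output idea concrete, the sketch below pairs a natural language goal with a Pydantic model describing the result shape. The Pydantic part is standard; the create_task call, its output_format parameter, and the run method are hypothetical names that reuse the agent object from the sketch above.

```python
from pydantic import BaseModel


class WifiStatus(BaseModel):
    enabled: bool
    network_name: str | None = None


# Hypothetical task-creation call: the goal is plain natural language, and the
# Pydantic model tells the agent what shape of result to return and validate.
task = agent.create_task(
    goal="Open Settings and report whether Wi-Fi is on and which network is connected",
    output_format=WifiStatus,
)
result: WifiStatus = task.run()  # hypothetical execution method
```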

LangGraph Integration

The SDK leverages LangGraph for:
  • Agent reasoning - Transparent decision-making process
  • Step-by-step execution - Breaking complex tasks into manageable steps
  • Dynamic adaptation - Responding to what’s on screen
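
The SDK's actual graph is not reproduced on this page, but the sketch below shows how an observe/decide/act loop of this kind can be expressed with LangGraph's StateGraph: each reasoning step is a node, and a conditional edge loops back until the goal is reached. The node bodies are stubs; in the SDK they would call the LLM and the Device Controller.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict):
    goal: str
    screen: str        # latest observation (e.g. serialized UI hierarchy)
    next_action: str   # action chosen by the LLM
    done: bool


# Stub node functions, for illustration only.
def plan(state: AgentState) -> dict:
    return {"done": False}

def observe(state: AgentState) -> dict:
    return {"screen": "<ui hierarchy snapshot>"}

def decide(state: AgentState) -> dict:
    return {"next_action": "tap(...)", "done": True}  # pretend the goal is reached

def act(state: AgentState) -> dict:
    return {}


graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("observe", observe)
graph.add_node("decide", decide)
graph.add_node("act", act)

graph.add_edge(START, "plan")
graph.add_edge("plan", "observe")
graph.add_edge("observe", "decide")
graph.add_edge("decide", "act")
# Loop back to observe until the goal is achieved.
graph.add_conditional_edges("act", lambda s: END if s["done"] else "observe")

app = graph.compile()
app.invoke({"goal": "Open the clock app", "screen": "", "next_action": "", "done": False})
```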

Device Interaction

Two key components handle device control.
The Device Controller performs physical actions on the device using native platform tools:
  • Android: Uses ADB (Android Debug Bridge) with UIAutomator2 for reliable UI automation
  • iOS: Uses IDB (iOS Development Bridge) for simulator and device control
Capabilities include:
  • Tap, swipe, scroll gestures
  • App launching and navigation
  • Key press events
  • Text input
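
As an illustration of what those native tools look like underneath, the sketch below shells out to adb and idb directly. This is not the SDK's implementation; it only shows the kind of low-level primitives these bridges expose, assuming adb and idb are on PATH, a single connected Android device, and a booted iOS simulator.

```python
import subprocess


def adb(*args: str) -> None:
    """Run a raw adb command (assumes a single connected Android device)."""
    subprocess.run(["adb", *args], check=True)


def idb(*args: str) -> None:
    """Run a raw idb command (assumes a booted iOS simulator)."""
    subprocess.run(["idb", *args], check=True)


# Android examples via ADB's `input` tool
adb("shell", "input", "tap", "540", "1200")                         # tap at (540, 1200)
adb("shell", "input", "swipe", "540", "1600", "540", "400", "300")  # swipe up over 300 ms
adb("shell", "input", "text", "hello")                              # type text (no spaces here)
adb("shell", "input", "keyevent", "KEYCODE_HOME")                   # press the Home key
adb("shell", "monkey", "-p", "com.android.settings", "1")           # launch an app by package

# iOS simulator examples via idb
idb("ui", "tap", "200", "400")
idb("ui", "text", "hello")
idb("launch", "com.apple.Preferences")
```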

Execution Flow

  1. Initialize - The Agent connects to the device and starts the required servers.
  2. Plan - The LLM analyzes the goal and creates a plan.
  3. Observe - The Device Controller captures the current UI state (screenshots and UI hierarchy).
  4. Decide - The LLM determines the next action based on the current screen.
  5. Act - The Device Controller executes the action.
  6. Repeat - Steps 3-5 loop until the goal is achieved.
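
Written out as code, the loop looks roughly like the sketch below. Every helper name here (plan, observe, decide, act, is_done, result) is hypothetical; in the SDK this loop is driven by the LangGraph agent rather than hand-written.

```python
def run(agent, goal: str, max_steps: int = 25):
    agent.init()                                   # 1. Initialize
    plan = agent.plan(goal)                        # 2. Plan

    for _ in range(max_steps):                     # bounded to avoid endless loops
        screen = agent.observe()                   # 3. Observe: screenshot + UI hierarchy
        action = agent.decide(goal, plan, screen)  # 4. Decide: LLM picks the next action
        if action.is_done:                         # goal achieved, stop looping
            break
        agent.act(action)                          # 5. Act, then 6. Repeat from step 3

    agent.clean()
    return agent.result()
```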

Next Steps