The core concepts apply to both Platform and Local approaches. The platform simplifies configuration by managing profiles and tasks centrally, while local development gives you full control over all components.
Architecture Diagram
Key Components
Agent
The central orchestrator for mobile automation
Tasks
Goal-based automation workflows with structured output
Profiles
Customize agent behavior and LLM configuration
Builders
Fluent APIs for configuring agents and tasks
Component Overview
Agent Layer
The Agent class is the primary entry point. It coordinates:
- Device connections (Android/iOS)
- Server lifecycle management
- Task creation and execution
- Resource cleanup
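The four responsibilities above can be sketched as a session that always releases its resources. This is a minimal illustration, not the SDK's API: `SketchAgent`, `agent_session`, and every method name here are hypothetical stand-ins chosen for the example.

```python
from contextlib import contextmanager


class SketchAgent:
    """Hypothetical stand-in for the SDK's Agent; not the real API."""

    def __init__(self, platform: str):
        self.platform = platform
        self.events: list[str] = []  # records lifecycle calls in order

    def connect(self) -> None:
        self.events.append(f"connect:{self.platform}")

    def start_servers(self) -> None:
        self.events.append("servers:start")

    def run_task(self, goal: str) -> str:
        self.events.append(f"task:{goal}")
        return f"done: {goal}"

    def cleanup(self) -> None:
        self.events.append("cleanup")


@contextmanager
def agent_session(platform: str):
    """Connect, start servers, and guarantee cleanup even if a task fails."""
    agent = SketchAgent(platform)
    agent.connect()
    agent.start_servers()
    try:
        yield agent
    finally:
        agent.cleanup()


with agent_session("android") as agent:
    result = agent.run_task("open settings")
```

Wrapping the lifecycle in a context manager is one way to make sure cleanup runs when a task raises; the real Agent class may expose this differently.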
Task Layer
Tasks represent automation workflows defined by:
- Natural language goals - What you want to accomplish
- Structured output - Type-safe results using Pydantic
- Tracing - Recording execution for debugging
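To illustrate what "type-safe results using Pydantic" means in practice, here is a small sketch. The `WeatherResult` model and the `raw_output` dictionary are invented for the example; how the SDK attaches an output model to a task is not shown here and is not assumed.

```python
from pydantic import BaseModel, ValidationError


class WeatherResult(BaseModel):
    """Hypothetical output schema for a goal such as 'check the weather'."""

    city: str
    temperature_c: float


# Pretend this dictionary is what the agent extracted from the screen.
raw_output = {"city": "Paris", "temperature_c": "21.5"}

# Pydantic validates the fields and converts the numeric string to a float.
result = WeatherResult(**raw_output)

# Output that does not match the schema is rejected instead of passed along.
try:
    WeatherResult(city="Paris", temperature_c="warm")
    validation_failed = False
except ValidationError:
    validation_failed = True
```

Declaring the schema up front means downstream code can rely on `result.temperature_c` being a float rather than parsing free-form text.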
LangGraph Integration
The SDK leverages LangGraph for:
- Agent reasoning - Transparent decision-making process
- Step-by-step execution - Breaking complex tasks into manageable steps
- Dynamic adaptation - Responding to what's on screen
Device Interaction
Two key components handle device control:
Hardware Bridge (Maestro)
Performs physical actions on the device:
- Tap, swipe, scroll gestures
- App launching and navigation
- Key press events
- Text input
Screen API
Captures device state:
- Screenshots for visual analysis
- UI hierarchy data
- Element accessibility information
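The split between the two components can be pictured as two narrow interfaces: one that changes the device and one that only reads it. The sketch below is an assumption-laden illustration; `HardwareBridge`, `ScreenAPI`, `ScreenState`, `FakeDevice`, and their methods are hypothetical names, not the SDK's or Maestro's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ScreenState:
    """Hypothetical snapshot of what the Screen API returns."""

    screenshot_png: bytes
    ui_hierarchy: dict


class HardwareBridge(Protocol):
    """Write side: performs physical actions."""

    def tap(self, x: int, y: int) -> None: ...
    def input_text(self, text: str) -> None: ...


class ScreenAPI(Protocol):
    """Read side: captures device state."""

    def capture(self) -> ScreenState: ...


@dataclass
class FakeDevice:
    """In-memory device that plays both roles, for demonstration only."""

    typed: str = ""
    taps: list = field(default_factory=list)

    def tap(self, x: int, y: int) -> None:
        self.taps.append((x, y))

    def input_text(self, text: str) -> None:
        self.typed += text

    def capture(self) -> ScreenState:
        hierarchy = {"text_field": self.typed, "tap_count": len(self.taps)}
        return ScreenState(screenshot_png=b"", ui_hierarchy=hierarchy)


def type_into_field(bridge: HardwareBridge, screen: ScreenAPI,
                    x: int, y: int, text: str) -> ScreenState:
    """Act through the bridge, then observe the result through the screen API."""
    bridge.tap(x, y)
    bridge.input_text(text)
    return screen.capture()


device = FakeDevice()
state = type_into_field(device, device, 10, 20, "hello")
```

Keeping actions and observations behind separate interfaces lets either side be replaced with a fake like this one in tests.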
Execution Flow
1. Initialize: Agent connects to device and starts required servers
2. Plan: LLM analyzes the goal and creates a plan
3. Observe: Screen API captures current UI state
4. Decide: LLM determines next action based on screen
5. Act: Hardware Bridge executes the action
6. Repeat: Loop through steps 3-5 until goal is achieved
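The observe, decide, act cycle (steps 3 to 6) can be sketched as a plain loop. This is a toy illustration under stated assumptions: `run_goal`, its callbacks, and the `max_steps` safety limit are hypothetical, and the decision function is a simple rule standing in for the LLM.

```python
from typing import Callable, Optional


def run_goal(observe: Callable[[], dict],
             decide: Callable[[dict], Optional[str]],
             act: Callable[[str], None],
             max_steps: int = 10) -> int:
    """Run the observe/decide/act loop; return how many actions were executed."""
    for step in range(max_steps):
        screen = observe()        # step 3: capture current state
        action = decide(screen)   # step 4: choose the next action
        if action is None:        # no action needed means the goal is reached
            return step
        act(action)               # step 5: execute the action
    raise RuntimeError("goal not reached within max_steps")


# Toy device: the goal is to bring a counter up to 3.
state = {"count": 0}


def observe() -> dict:
    return dict(state)


def decide(screen: dict) -> Optional[str]:
    return None if screen["count"] >= 3 else "increment"


def act(action: str) -> None:
    state["count"] += 1


steps_taken = run_goal(observe, decide, act)
```

The step limit is an added safeguard for the sketch so that an unreachable goal fails loudly instead of looping forever.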