This guide covers local development where you configure LLMs via config files and have full control over the execution environment.
Want a faster setup? Check out the Platform Quickstart: no LLM config files needed, and observability is built in!
Make sure you’ve completed the Installation steps before proceeding.

Configure LLM Settings

1. Create LLM Config File

Create an llm-config.override.jsonc file to configure your LLM models. This file overrides the default configuration.
llm-config.override.jsonc
// Your custom LLM configuration
{
  "planner": {
    "provider": "openai",
    "model": "gpt-5-nano"
  },
  "orchestrator": {
    "provider": "openai",
    "model": "gpt-5-nano"
  },
  "cortex": {
    "provider": "openai",
    "model": "gpt-5",
    "fallback": {
      "provider": "openai",
      "model": "gpt-5"
    }
  },
  "executor": {
    "provider": "openai",
    "model": "gpt-5-nano"
  },
  "utils": {
    "hopper": {
      // Needs at least a 256k context window
      "provider": "openai",
      "model": "gpt-5-nano"
    },
    "outputter": {
      "provider": "openai",
      "model": "gpt-5-nano"
    }
  }
}
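Note that JSONC (JSON with comments) is not valid JSON, so Python's standard json module cannot parse it directly. The SDK handles this internally, but if you want to inspect or validate the file from your own scripts, a minimal comment-stripping loader is enough. This is a sketch, not part of the SDK, and it only handles whole-line `//` comments (not `//` inside string values):

```python
import json
import re

def load_jsonc(path: str) -> dict:
    """Load a JSONC file by removing // line comments before parsing."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Naive strip: drops lines that are entirely a // comment
    stripped = re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)
    return json.loads(stripped)
```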

2. Configure Environment Variables

Create a .env file in your project root with the necessary API keys:
.env
# LLM API Keys (only include the ones you need)
OPENAI_API_KEY=your_key_here
XAI_API_KEY=your_key_here
OPEN_ROUTER_API_KEY=your_key_here
GOOGLE_API_KEY=your_key_here

# Optional: For local LLMs or custom OpenAI-compatible endpoints
# OPENAI_BASE_URL=http://localhost:1234/v1
Never commit your .env file to version control. Add it to your .gitignore.
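The SDK reads these keys from the process environment. If you run scripts directly, a loader such as python-dotenv can populate the environment from your .env file; conceptually it does something like this stdlib-only sketch (which ignores quoting and `export` syntax):

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Populate os.environ from KEY=VALUE lines, skipping comments and blanks.

    Variables already set in the environment take precedence over file values.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```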

Creating Your First Automation

Let’s write a simple script that opens a calculator app and performs a basic calculation.
For more examples, check out the mobile-use SDK examples directory on GitHub.
calculator_demo.py
import asyncio
from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import AgentProfile
from minitap.mobile_use.sdk.builders import Builders

async def main():
    # Create an agent profile
    default_profile = AgentProfile(
        name="default", 
        from_file="llm-config.override.jsonc"
    )
    
    # Configure the agent
    agent_config = Builders.AgentConfig.with_default_profile(default_profile).build()
    agent = Agent(config=agent_config)
    
    try:
        # Initialize the agent (connect to the first available device)
        agent.init()
        
        # Define a simple task goal
        result = await agent.run_task(
            goal="Open the calculator app, calculate 123 * 456, and tell me the result",
            name="calculator_demo"
        )
        
        # Print the result
        print(f"Result: {result}")
        
    except Exception as e:
        print(f"Error: {e}")
    finally:
        # Always clean up when finished
        agent.clean()

if __name__ == "__main__":
    asyncio.run(main())

Run the script

python calculator_demo.py
1. Initialize the Agent: the agent connects to your device and starts the required servers.
2. Execute the Task: the agent interprets your goal, navigates the UI, and performs the calculation.
3. Clean Up: resources are properly released.

Getting Structured Output

The mobile-use SDK can return structured data using Pydantic models:
structured_output.py
import asyncio
from pydantic import BaseModel, Field
from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import AgentProfile
from minitap.mobile_use.sdk.builders import Builders

# Define a model for structured output
class CalculationResult(BaseModel):
    expression: str = Field(..., description="The mathematical expression calculated")
    result: float = Field(..., description="The result of the calculation")
    app_used: str = Field(..., description="The name of the calculator app used")

async def main():
    # Create an agent
    default_profile = AgentProfile(
        name="default", 
        from_file="llm-config.override.jsonc"
    )
    agent_config = Builders.AgentConfig.with_default_profile(default_profile).build()
    agent = Agent(config=agent_config)
    
    try:
        agent.init()
        
        # Request structured output using Pydantic model
        result = await agent.run_task(
            goal="Open the calculator app, calculate 123 * 456, and tell me the result",
            output=CalculationResult,
            name="structured_calculator"
        )
        
        if result:
            print(f"Expression: {result.expression}")
            print(f"Result: {result.result}")
            print(f"App used: {result.app_used}")
        
    finally:
        agent.clean()

if __name__ == "__main__":
    asyncio.run(main())
Using Pydantic models ensures type-safe, validated output from your automation tasks.
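The type safety comes from Pydantic's validation: if the agent's output doesn't match the schema, construction fails loudly instead of silently passing bad data through. A quick standalone illustration (no device or SDK needed, just pydantic):

```python
from pydantic import BaseModel, Field, ValidationError

class CalculationResult(BaseModel):
    expression: str = Field(..., description="The mathematical expression calculated")
    result: float = Field(..., description="The result of the calculation")
    app_used: str = Field(..., description="The name of the calculator app used")

# Valid data parses, and compatible types are coerced (int -> float)
ok = CalculationResult(expression="123 * 456", result=56088, app_used="Calculator")

# Incomplete data raises ValidationError instead of producing a partial object
try:
    CalculationResult(expression="123 * 456")
except ValidationError:
    print("rejected: missing result and app_used")
```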

Understanding the Code

Agent Profile

default_profile = AgentProfile(
    name="default", 
    from_file="llm-config.override.jsonc"
)
The AgentProfile defines which LLM models power different components of the agent.

Agent Configuration

agent_config = Builders.AgentConfig.with_default_profile(default_profile).build()
The Builders.AgentConfig provides a fluent API to configure your agent.

Running Tasks

result = await agent.run_task(
    goal="Your instruction here",
    output=YourPydanticModel,  # Optional
    name="task_name"  # Optional
)
Tasks are executed asynchronously and can return structured output.
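Because run_task is a coroutine, standard asyncio tooling composes with it. For example, you can bound a task's runtime with asyncio.wait_for. The sketch below uses a stand-in coroutine so it runs without a device; in real code you would substitute agent.run_task(...):

```python
import asyncio

async def fake_run_task(goal: str) -> str:
    """Stand-in for agent.run_task(...); replace with the real call."""
    await asyncio.sleep(0.01)
    return f"done: {goal}"

async def main():
    try:
        # Cancel the task if it exceeds the timeout
        return await asyncio.wait_for(fake_run_task("demo goal"), timeout=5.0)
    except asyncio.TimeoutError:
        return None

result = asyncio.run(main())
print(result)
```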

Comparing Local vs Platform

✅ When to Use Local

  • Full control over LLM providers
  • Custom infrastructure requirements
  • Offline or air-gapped environments
  • Development and testing

🚀 When to Use Platform

  • Centralized configuration and management
  • Built-in cost monitoring and observability
  • Update tasks without code changes

Next Steps
