πŸš€ Recommended approach for most users - Get started faster with centralized configuration and built-in observability.
The Platform approach uses platform.minitap.ai to manage your automation tasks and LLM configurations in the cloud.

Why Use the Platform?

⚑ Faster Setup

No LLM config files needed - Start in minutes, not hours

πŸ“Š Real-Time Monitoring

Track costs, execution time, and agent reasoning live

πŸ”„ Dynamic Updates

Update task prompts and LLM models without code changes

πŸ‘₯ Team Collaboration

Centralized tasks and profiles for your organization

πŸ€– All OpenRouter Models

Access to all models available on OpenRouter - no individual API key management needed
Compare with Local Development if you need full control over LLM configuration or offline capability.

Prerequisites & Installation

First time? Complete the common Installation steps (SDK installation, device setup, etc.) before continuing with platform-specific configuration below.

Configure Platform Credentials

Create a .env file in your project root with your Minitap Platform credentials:
.env
# Minitap Platform API Key (get this from platform.minitap.ai)
MINITAP_API_KEY=your_api_key_here

# Minitap Platform Base URL (optional - this is the default)
MINITAP_BASE_URL=https://platform.minitap.ai/api/v1
Never commit your .env file to version control. Add it to your .gitignore.
No LLM config file needed! Unlike local development, the platform manages all LLM configurations centrally.
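
The SDK reads these credentials from the process environment. If your entry point does not load .env automatically, a minimal sketch using python-dotenv (an assumption about your setup; any mechanism that exports MINITAP_API_KEY works):
from dotenv import load_dotenv  # pip install python-dotenv

# Copies MINITAP_API_KEY and MINITAP_BASE_URL from .env into the environment
load_dotenv()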

Quick Start

1. Sign Up

Go to platform.minitap.ai and create an account
2. Create API Key

Navigate to API Keys β†’ Create API Key β†’ Copy your key and add it to your .env file as shown above
3. Create a Task

Go to Tasks β†’ Create Task
  • Name: check-notifications
  • Agent Prompt: Open the notifications panel and list all notifications
  • Click Create
4. Run from SDK

Create your first automation script:
import asyncio

from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import PlatformTaskRequest

async def main():
    # The Agent reads MINITAP_API_KEY (and MINITAP_BASE_URL) from the environment
    agent = Agent()
    agent.init()

    # Run the "check-notifications" task you created on the platform
    result = await agent.run_task(
        request=PlatformTaskRequest(task="check-notifications")
    )

    print(f"Result: {result}")

    # Release the device session and other resources
    agent.clean()

asyncio.run(main())
5. View Results

Go to Task Runs to see execution details, agent thoughts, and costs

What’s Next?

Create Custom Tasks

Define more complex automation workflows with structured outputs

Optimize LLM Models

Create custom LLM profiles for cost vs. performance tradeoffs

View Observability

Explore agent thoughts, execution timeline, and cost breakdown

Collaborate with Team

Centralized tasks and profiles for your organization

Learn More

Task Configuration Options

When creating tasks on the platform, you have several configuration options.

Basic Fields:
  • Task Name: Unique identifier used in your SDK code
  • Description: Helps team members understand the task purpose
  • Agent Prompt: Detailed instructions for the agent (use the β€œGenerate” button for AI assistance)
  • Output Description: Optional - describe the expected JSON structure for structured outputs (see the example after this list)
Settings:
  • Enable Tracing: Shows full LLM prompts/responses on platform (disable for privacy)
  • Max Steps: Limit execution steps to prevent runaway costs (default: 400)
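
As a concrete illustration, an Output Description for the Quick Start's check-notifications task might read as follows (free-form text; the exact wording is an assumption, not a required format):

Return a JSON object with two fields:
- total (integer): total number of notifications
- unread (integer): number of unread notifications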

LLM Profiles (Optional)

By default, tasks use a Minitap-managed profile optimized for mobile-use. Create custom profiles for:
  • Cost optimization (use faster/cheaper models)
  • Performance optimization (use more powerful models)
  • Different task types (simple vs. complex)
(Screenshot: LLM Profile configuration)
All models use the minitap provider with format: provider/model-name (e.g., openai/gpt-5, google/gemini-2.5-pro)
The platform supports all models available on OpenRouter, giving you access to the latest models from OpenAI, Anthropic, Google, Meta, and more - without managing individual API keys.
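Once a custom profile exists on the platform, you select it per request via the profile parameter. A minimal sketch, assuming an initialized Agent as in the Quick Start (the profile name fast-and-cheap is hypothetical):
# "fast-and-cheap" is a hypothetical profile name created on the platform
result = await agent.run_task(
    request=PlatformTaskRequest(
        task="check-notifications",
        profile="fast-and-cheap",
    )
)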
Agent Components

The mobile-use agent uses a multi-agent architecture where different LLMs handle specific tasks:

Cortex

Role: The β€œeyes” and decision-maker of the system. Analyzes screenshots, understands UI elements, and decides what action to take next.
Requirements: Must support vision/image inputs.
Recommendation: Use the best vision model available:
  • google/gemini-2.5-pro - Excellent vision + reasoning
  • openai/gpt-5 - Strong vision capabilities
  • anthropic/claude-3.5-sonnet - Good vision understanding
Also configure a fallback model for reliability in case the primary model fails.
Impact: πŸ”΄ Critical - a poor Cortex model leads to task failures
Planner

Role: Decomposes high-level goals into executable subgoals. Runs once at the start and potentially again during replanning.
Requirements: Strong reasoning and planning capabilities.
Recommendation:
  • meta-llama/llama-4-scout - Fast and capable
  • openai/gpt-5-nano - Quick planning
  • anthropic/claude-3-haiku - Cost-effective
Impact: 🟑 Medium - Affects execution strategy
Orchestrator

Role: Coordinates the execution flow, decides when to use the Hopper vs. the Cortex, and manages state transitions.
Requirements: Fast, good at decision-making.
Recommendation: Fast models:
  • openai/gpt-oss-120b - Efficient coordination
  • openai/gpt-5-nano - Quick decisions
Impact: 🟑 Medium - Affects execution efficiency
Executor

Role: Translates high-level decisions into specific device actions (tap, swipe, type).
Requirements: Instruction-following, fast response.
Recommendation:
  • meta-llama/llama-3.3-70b-instruct - Excellent instruction following
  • openai/gpt-5-nano - Fast execution
Impact: 🟒 Low - Straightforward task
Hopper

Role: Digs through large batches of data (historical context, screen data) to extract the information most relevant to reaching the goal.
Requirements: Large context window (256k+ tokens recommended) to handle extensive data batches.
Recommendation:
  • openai/gpt-4.1 - 256k context
  • google/gemini-2.0-flash - Large context
Impact: 🟑 Medium - Improves information extraction from large datasets
Outputter

Role: Extracts structured output from task results according to the output description.
Requirements: JSON formatting, structured output capability.
Recommendation:
  • openai/gpt-5-nano - Good at JSON
  • anthropic/claude-3-haiku - Structured outputs
Impact: 🟒 Low - Only used when output_description is specified

Structured Output Example

For type-safe results, use Pydantic models:
import asyncio

from pydantic import BaseModel, Field

from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import PlatformTaskRequest

class NotificationSummary(BaseModel):
    total: int = Field(..., description="Total notifications")
    unread: int = Field(..., description="Unread count")

async def main():
    agent = Agent()
    agent.init()

    # Parameterizing PlatformTaskRequest with the model yields a typed result
    result = await agent.run_task(
        request=PlatformTaskRequest[NotificationSummary](
            task="check-notifications",
            profile="default",
        )
    )

    # result is typed as NotificationSummary | None
    if result:
        print(f"Total: {result.total}, Unread: {result.unread}")

    agent.clean()

asyncio.run(main())

Viewing Task Runs

Visit Task Runs to see execution details:
(Screenshot: Task runs list view)
Click any run to view:
  • Execution status and duration
  • Agent thoughts and reasoning
  • Subgoal progression
  • Cost breakdown
(Screenshot: Task run details view)
What Gets Tracked:
Status transitions throughout execution:
  • pending: Task created, waiting to start
  • running: Task actively executing
  • completed: Task finished successfully with output
  • failed: Task encountered an error
  • cancelled: Task was manually cancelled
The planner agent creates high-level subgoals. Each subgoal is tracked:
  • Name/description
  • State: pending β†’ started β†’ completed/failed
  • Start and end timestamps
  • Plan updates on replanning
Reasoning from each agent component:
  • Planner: Goal decomposition and planning
  • Cortex: Visual understanding and decision making
  • Orchestrator: Execution coordination
  • Executor: Action translation and execution
  • Hopper: Data extraction from large batches
  • Outputter: Structured output extraction
Each thought includes timestamp and agent identifier.
Detailed LLM API call metrics (when tracing enabled):
  • Model used
  • Token counts (input/output)
  • Cost in dollars
  • Latency
  • Request/response content

PlatformTaskRequest Reference

Parameters:
  • task (required): Task name from platform
  • profile (optional): LLM profile name (defaults to Minitap-managed profile)
  • api_key (optional): Overrides MINITAP_API_KEY environment variable
  • record_trace (optional): Save local trace files
  • trace_path (optional): Local directory for traces
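
Putting these together, a call that selects a profile and records a local trace might look like this (a sketch; the trace_path value and the assumption that record_trace is a boolean come from the parameter descriptions above, not from verified SDK signatures):
result = await agent.run_task(
    request=PlatformTaskRequest(
        task="check-notifications",
        profile="default",        # optional LLM profile name
        record_trace=True,        # assumed boolean: save local trace files
        trace_path="./traces",    # assumed path string for trace output
    )
)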

Platform vs Local Comparison

Use the Local approach if you need:
  • Full control over LLM provider selection and API endpoints
  • Custom infrastructure or air-gapped environments
  • Offline capability without internet dependency
  • Development and testing with local model configurations
