πŸš€ Recommended approach for most users - Get started faster with centralized configuration and built-in observability.
The Platform approach uses platform.minitap.ai to manage your automation tasks and LLM configurations in the cloud.

Why Use the Platform?

⚑ Faster Setup

No LLM config files needed - Start in minutes, not hours

πŸ“Š Real-Time Monitoring

Track costs, execution time, and agent reasoning live

πŸ”„ Dynamic Updates

Update task prompts and LLM models without code changes

πŸ‘₯ Team Collaboration

Centralized tasks and profiles for your organization

πŸ€– All OpenRouter Models

Access to all models available on OpenRouter - no individual API key management needed
Compare with Local Development if you need full control over LLM configuration or offline capability.

Prerequisites & Installation

First time? Complete the common Installation steps (SDK installation, device setup, etc.) before continuing with platform-specific configuration below.

Configure Platform Credentials

Create a .env file in your project root with your Minitap Platform credentials:
.env
# Minitap Platform API Key (get this from platform.minitap.ai)
MINITAP_API_KEY=your_api_key_here

# Minitap Platform Base URL (optional - this is the default)
MINITAP_BASE_URL=https://platform.minitap.ai/api/v1
Never commit your .env file to version control. Add it to your .gitignore.
No LLM config file needed! Unlike local development, the platform manages all LLM configurations centrally.
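
The SDK reads these credentials from the process environment. If your entry point does not load .env automatically, a minimal sketch using python-dotenv (an assumption about your setup; any mechanism that exports MINITAP_API_KEY works):
from dotenv import load_dotenv  # pip install python-dotenv

# Copies MINITAP_API_KEY and MINITAP_BASE_URL from .env into the environment
load_dotenv()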

Quick Start

1. Sign Up

Go to platform.minitap.ai and create an account
2. Create API Key

Navigate to API Keys β†’ Create API Key β†’ Copy your key and add it to your .env file as shown above
3. Create a Task

Go to Tasks β†’ Create Task
  • Name: check-notifications
  • Agent Prompt: Open the notifications panel and list all notifications
  • Click Create
4. Run from SDK

Create your first automation script:
import asyncio

from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import PlatformTaskRequest

async def main():
    # The Agent reads MINITAP_API_KEY (and MINITAP_BASE_URL) from the environment
    agent = Agent()
    agent.init()

    # Run the "check-notifications" task you created on the platform
    result = await agent.run_task(
        request=PlatformTaskRequest(task="check-notifications")
    )

    print(f"Result: {result}")

    # Release the device session and other resources
    agent.clean()

asyncio.run(main())
5. View Results

Go to Task Runs to see execution details, agent thoughts, and costs

What’s Next?

Create Custom Tasks

Define more complex automation workflows with structured outputs

Optimize LLM Models

Create custom LLM profiles for cost vs. performance tradeoffs

View Observability

Explore agent thoughts, execution timeline, and cost breakdown

Collaborate with Team

Centralized tasks and profiles for your organization

Learn More

Task Configuration Options

When creating tasks on the platform, you have several configuration options.

Basic Fields:
  • Task Name: Unique identifier used in your SDK code
  • Description: Helps team members understand the task purpose
  • Agent Prompt: Detailed instructions for the agent (use the β€œGenerate” button for AI assistance)
  • Output Description: Optional - describe the expected JSON structure for structured outputs (see the example after this list)
Settings:
  • Enable Tracing: Shows full LLM prompts/responses on platform (disable for privacy)
  • Max Steps: Limit execution steps to prevent runaway costs (default: 400)
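
As a concrete illustration, an Output Description for the Quick Start's check-notifications task might read as follows (free-form text; the exact wording is an assumption, not a required format):

Return a JSON object with two fields:
- total (integer): total number of notifications
- unread (integer): number of unread notifications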

LLM Profiles (Optional)

By default, tasks use a Minitap-managed profile optimized for mobile-use. Create custom profiles for:
  • Cost optimization (use faster/cheaper models)
  • Performance optimization (use more powerful models)
  • Different task types (simple vs. complex)
(Screenshot: LLM Profile configuration)
All models use the minitap provider with format: provider/model-name (e.g., openai/gpt-5, google/gemini-2.5-pro)
The platform supports all models available on OpenRouter, giving you access to the latest models from OpenAI, Anthropic, Google, Meta, and more - without managing individual API keys.
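Once a custom profile exists on the platform, you select it per request via the profile parameter. A minimal sketch, assuming an initialized Agent as in the Quick Start (the profile name fast-and-cheap is hypothetical):
# "fast-and-cheap" is a hypothetical profile name created on the platform
result = await agent.run_task(
    request=PlatformTaskRequest(
        task="check-notifications",
        profile="fast-and-cheap",
    )
)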
Agent Components

The mobile-use agent uses a multi-agent architecture where different LLMs handle specific tasks:

Cortex

Role: The β€œeyes” and decision-maker of the system. Analyzes screenshots, understands UI elements, and decides what action to take next.
Requirements: Must support vision/image inputs.
Recommendation: Use the best vision model available:
  • google/gemini-2.5-pro - Excellent vision + reasoning
  • openai/gpt-5 - Strong vision capabilities
  • anthropic/claude-3.5-sonnet - Good vision understanding
Also configure a fallback model for reliability in case the primary model fails.
Impact: πŸ”΄ Critical - a poor Cortex model leads to task failures
Planner

Role: Decomposes high-level goals into executable subgoals. Runs once at the start and potentially again during replanning.
Requirements: Strong reasoning and planning capabilities.
Recommendation:
  • meta-llama/llama-4-scout - Fast and capable
  • openai/gpt-5-nano - Quick planning
  • anthropic/claude-3-haiku - Cost-effective
Impact: 🟑 Medium - Affects execution strategy
Orchestrator

Role: Coordinates the execution flow, decides when to use the Hopper vs. the Cortex, and manages state transitions.
Requirements: Fast, good at decision-making.
Recommendation: Fast models:
  • openai/gpt-oss-120b - Efficient coordination
  • openai/gpt-5-nano - Quick decisions
Impact: 🟑 Medium - Affects execution efficiency
Executor

Role: Translates high-level decisions into specific device actions (tap, swipe, type).
Requirements: Instruction-following, fast response.
Recommendation:
  • meta-llama/llama-3.3-70b-instruct - Excellent instruction following
  • openai/gpt-5-nano - Fast execution
Impact: 🟒 Low - Straightforward task
Hopper

Role: Digs through large batches of data (historical context, screen data) to extract the information most relevant to reaching the goal.
Requirements: Large context window (256k+ tokens recommended) to handle extensive data batches.
Recommendation:
  • openai/gpt-4.1 - 256k context
  • google/gemini-2.0-flash - Large context
Impact: 🟑 Medium - Improves information extraction from large datasets
Outputter

Role: Extracts structured output from task results according to the output description.
Requirements: JSON formatting, structured output capability.
Recommendation:
  • openai/gpt-5-nano - Good at JSON
  • anthropic/claude-3-haiku - Structured outputs
Impact: 🟒 Low - Only used when output_description is specified

Structured Output Example

For type-safe results, use Pydantic models:
import asyncio

from pydantic import BaseModel, Field

from minitap.mobile_use.sdk import Agent
from minitap.mobile_use.sdk.types import PlatformTaskRequest

class NotificationSummary(BaseModel):
    total: int = Field(..., description="Total notifications")
    unread: int = Field(..., description="Unread count")

async def main():
    agent = Agent()
    agent.init()

    # Parameterizing PlatformTaskRequest with the model yields a typed result
    result = await agent.run_task(
        request=PlatformTaskRequest[NotificationSummary](
            task="check-notifications",
            profile="default",
        )
    )

    # result is typed as NotificationSummary | None
    if result:
        print(f"Total: {result.total}, Unread: {result.unread}")

    agent.clean()

asyncio.run(main())

Viewing Task Runs

Visit Task Runs to see execution details:
(Screenshot: Task runs list view)
Click any run to view:
  • Execution status and duration
  • Agent thoughts and reasoning
  • Subgoal progression
  • Cost breakdown
(Screenshot: Task run details view)
What Gets Tracked:
Status transitions throughout execution:
  • pending: Task created, waiting to start
  • running: Task actively executing
  • completed: Task finished successfully with output
  • failed: Task encountered an error
  • cancelled: Task was manually cancelled
The planner agent creates high-level subgoals. Each subgoal is tracked:
  • Name/description
  • State: pending β†’ started β†’ completed/failed
  • Start and end timestamps
  • Plan updates on replanning
Reasoning from each agent component:
  • Planner: Goal decomposition and planning
  • Cortex: Visual understanding and decision making
  • Orchestrator: Execution coordination
  • Executor: Action translation and execution
  • Hopper: Data extraction from large batches
  • Outputter: Structured output extraction
Each thought includes timestamp and agent identifier.
Detailed LLM API call metrics (when tracing enabled):
  • Model used
  • Token counts (input/output)
  • Cost in dollars
  • Latency
  • Request/response content

PlatformTaskRequest Reference

Parameters:
  • task (required): Task name from platform
  • profile (optional): LLM profile name (defaults to Minitap-managed profile)
  • api_key (optional): Overrides MINITAP_API_KEY environment variable
  • record_trace (optional): Save local trace files
  • trace_path (optional): Local directory for traces
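
Putting these together, a call that selects a profile and records a local trace might look like this (a sketch; the trace_path value and the assumption that record_trace is a boolean come from the parameter descriptions above, not from verified SDK signatures):
result = await agent.run_task(
    request=PlatformTaskRequest(
        task="check-notifications",
        profile="default",        # optional LLM profile name
        record_trace=True,        # assumed boolean: save local trace files
        trace_path="./traces",    # assumed path string for trace output
    )
)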

Platform vs Local Comparison

Use the Local approach if you need:
  • Full control over LLM provider selection and API endpoints
  • Custom infrastructure or air-gapped environments
  • Offline capability without internet dependency
  • Development and testing with local model configurations
