What This Example Does
Prerequisites
Install ffmpeg
- macOS: brew install ffmpeg
- Linux: apt install ffmpeg or dnf install ffmpeg
- Windows: download from ffmpeg.org
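Before you run the example, you can confirm that ffmpeg is on your PATH. The sketch below uses only the standard library. FFmpegMissing is a hypothetical stand-in for mobile-use's FFmpegNotInstalledError, and the message mirrors the documented error text. It is not the library's own check.

```python
import shutil
import subprocess


class FFmpegMissing(RuntimeError):
    """Hypothetical stand-in for mobile-use's FFmpegNotInstalledError."""


def ensure_ffmpeg() -> str:
    """Return the first line of `ffmpeg -version`, or raise if ffmpeg is unavailable."""
    path = shutil.which("ffmpeg")  # None when ffmpeg is not on PATH
    if path is None:
        raise FFmpegMissing(
            "ffmpeg is required for video recording but is not installed"
        )
    # -version prints build info and exits with status 0 on a working install
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()[0]
```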
Supported Video Analyzer Models
The video_analyzer utility requires a video-capable Gemini model:
| Model | Provider | Notes |
|---|---|---|
| gemini-3-flash-preview | Google | Recommended: fast and capable |
| gemini-3-pro-preview | Google | Higher quality, slower |
| gemini-2.5-flash | Google | Good balance |
| gemini-2.5-pro | Google | Premium quality |
| gemini-2.0-flash | Google | Fast, reliable |
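If you build your configuration in code, a small guard can catch a model name that is not in the table above before any recording starts. This is a hypothetical helper, not part of mobile-use. The model IDs are copied from the table, and the optional second entry anticipates the fallback recommended under Best Practices.

```python
from typing import List, Optional

# Model IDs from the table above; values are the table's notes.
SUPPORTED_VIDEO_MODELS = {
    "gemini-3-flash-preview": "Recommended: fast and capable",
    "gemini-3-pro-preview": "Higher quality, slower",
    "gemini-2.5-flash": "Good balance",
    "gemini-2.5-pro": "Premium quality",
    "gemini-2.0-flash": "Fast, reliable",
}


def check_video_models(primary: str, fallback: Optional[str] = None) -> List[str]:
    """Return the models to try, in order, after checking each against the table."""
    chosen = [primary] if fallback is None else [primary, fallback]
    unknown = [m for m in chosen if m not in SUPPORTED_VIDEO_MODELS]
    if unknown:
        raise ValueError(f"not a video-capable model from the table: {unknown}")
    return chosen
```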
Video Recording Tools
When video recording is enabled, the agent has access to two tools.
start_video_recording
Starts a background screen recording on the mobile device.
- Recording continues until stop_video_recording is called
- No duration limit: recording runs as long as needed
- Audio: not captured (video only)
On Android, the native screenrecord command has a 3-minute limit, but mobile-use handles this automatically by recording in segments and concatenating them, so you don't need to work around the limit yourself.
stop_video_recording
Stops the current screen recording and analyzes the video content. Its prompt parameter specifies what to extract from the video, for example:
- "Describe what actions are shown on screen"
- "What happens after each 10 seconds of the video?"
- "List all UI elements and buttons that appear"
Complete Code
video_transcription_example.py
Code Breakdown
1. Configure video_analyzer in LLMConfigUtils
The key configuration step is adding video_analyzer to your LLMConfigUtils:
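To show the relationship between the optional field and the video tools, here is a minimal sketch using hypothetical dataclasses. ModelRef, UtilsSketch, and require_video_analyzer are illustrative names, not the real mobile-use classes. Only the field name video_analyzer and the error text come from this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRef:
    """Hypothetical provider/model pair (not mobile-use's own LLM class)."""
    provider: str
    model: str


@dataclass
class UtilsSketch:
    """Hypothetical stand-in for LLMConfigUtils: video_analyzer defaults to None."""
    video_analyzer: Optional[ModelRef] = None


def require_video_analyzer(utils: UtilsSketch) -> ModelRef:
    """Mirror the documented check made when video recording tools are enabled."""
    if utils.video_analyzer is None:
        raise ValueError(
            "with_video_recording_tools() requires 'video_analyzer' in utils"
        )
    return utils.video_analyzer
```

Profiles that never record video can leave the field unset. The check only fires once the video tools are switched on.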
The video_analyzer is optional in LLMConfigUtils; it is only required when you use the video recording tools.
2. Enable Video Recording Tools
Enable the video tools with the builder method with_video_recording_tools().
3. Use Recording in Task Goals
The agent can now use the recording tools when your natural-language task goals ask for a recording.
Configuration File Approach
You can also configure video_analyzer in a JSONC config file:
llm-config.video.jsonc
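JSONC is JSON with comments, so Python's json module cannot read it directly. The sketch below strips // and /* */ comments outside string literals before parsing. The config shape in SAMPLE ("utils", "provider", "model") is a hypothetical example, not the documented schema of llm-config.video.jsonc. Trailing commas, which some JSONC files allow, are not handled.

```python
import json


def strip_jsonc_comments(text: str) -> str:
    """Remove // and /* */ comments that are outside JSON string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:  # keep escaped characters verbatim
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):  # line comment: skip to the newline
            end = text.find("\n", i)
            i = n if end == -1 else end
        elif text.startswith("/*", i):  # block comment: skip past the closing */
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical config shape, for illustration only.
SAMPLE = """
{
  // Check your own llm-config file for the real keys.
  "utils": {
    "video_analyzer": {
      "provider": "google",
      "model": "gemini-3-flash-preview" /* recommended */
    }
  }
}
"""

config = json.loads(strip_jsonc_comments(SAMPLE))
print(config["utils"]["video_analyzer"]["model"])
```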
How It Works
On Android, the native screenrecord command has a 3-minute hard limit. To work around it, mobile-use:
- Automatically restarts recording before each 3-minute segment ends
- Saves each segment locally
- Concatenates all segments using ffmpeg when you stop recording
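The segment-and-concatenate approach described above can be sketched in two parts: split a long recording into windows no longer than the 3-minute limit, then join the saved files with ffmpeg's concat demuxer. This is a minimal illustration assuming stream copy (-c copy) is acceptable. margin_s is a hypothetical safety gap, and the command is not necessarily the one mobile-use runs internally.

```python
from pathlib import Path
from typing import List, Tuple

ANDROID_LIMIT_S = 180  # screenrecord's 3-minute hard limit


def plan_segments(total_s: float, limit_s: float = ANDROID_LIMIT_S,
                  margin_s: float = 0.0) -> List[Tuple[float, float]]:
    """Split a recording of total_s seconds into (start, end) windows.

    margin_s (hypothetical) restarts recording slightly before the limit.
    """
    step = limit_s - margin_s
    if step <= 0:
        raise ValueError("margin_s must be smaller than limit_s")
    segments, start = [], 0.0
    while start < total_s:
        end = min(start + step, total_s)
        segments.append((start, end))
        start = end
    return segments


def concat_list(paths: List[Path]) -> str:
    """Build an ffmpeg concat-demuxer list file: one "file '...'" line per segment."""
    lines = []
    for p in paths:
        escaped = str(p).replace("'", "'\\''")  # escape single quotes for the list syntax
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: Path, output: Path) -> List[str]:
    """ffmpeg arguments that join the listed segments without re-encoding."""
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(output)]
```

For a 400-second recording, plan_segments(400) yields three windows: two full 180-second segments and a final 40-second one. If the concat step fails, the troubleshooting section notes that only the last segment is kept.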
Use Cases
- Video Analysis
- UI Workflow Capture
- App Testing
Custom Analysis Prompts
The stop_video_recording tool accepts a prompt parameter for custom analysis.
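Because the agent follows natural-language goals, one way to request a custom analysis is to name the prompt in the goal itself. The helper below is hypothetical: it only builds a goal string, and the wording is an assumption rather than a required format. The default prompt is one of the examples listed for stop_video_recording.

```python
from typing import Optional

# One of the example prompts listed for stop_video_recording.
DEFAULT_PROMPT = "Describe what actions are shown on screen"


def recording_goal(steps: str, analysis_prompt: Optional[str] = None) -> str:
    """Wrap a task in start/stop recording instructions (hypothetical helper)."""
    prompt = analysis_prompt or DEFAULT_PROMPT
    return (
        "Start a video recording. "
        f"{steps.strip().rstrip('.')}. "
        f'Then stop the recording and analyze it with the prompt: "{prompt}"'
    )


goal = recording_goal(
    "Open Settings and scroll to About phone",
    "List all UI elements and buttons that appear",
)
```

Pass the resulting string as the task goal. A specific prompt like the one above tends to produce more focused output than a generic description request.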
Troubleshooting
FFmpegNotInstalledError
Error: ffmpeg is required for video recording but is not installed
Solution: Install ffmpeg with your system's package manager:
- macOS: brew install ffmpeg
- Linux: apt install ffmpeg or dnf install ffmpeg
- Windows: download from ffmpeg.org
video_analyzer not configured
Error: with_video_recording_tools() requires 'video_analyzer' in utils
Solution: Add video_analyzer to your profile's LLMConfigUtils.
Segment concatenation warnings
Warning: Concatenation failed, using last segment only
Cause: ffmpeg failed to merge the video segments (Android only).
Solution: Make sure ffmpeg is installed and working. The recording still succeeds, but it may contain only the last 3-minute segment.
Video analysis fails
Error: Recording stopped but analysis failed
Possible causes:
- Video file too large (over 14 MB after compression)
- Gemini API rate limits
- Invalid video format
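To rule out the first cause, you can compare a file's size with the 14 MB figure above and estimate how long a recording can run at a given average bitrate. This is a rough sketch. The limit applies after compression, and treating "14 MB" as 14 × 1024 × 1024 bytes is an assumption.

```python
import os

# ">14MB after compression" from this page; MiB interpretation is an assumption.
MAX_ANALYSIS_BYTES = 14 * 1024 * 1024


def fits_for_analysis(path: str, limit: int = MAX_ANALYSIS_BYTES) -> bool:
    """True if the file on disk is at or under the size limit."""
    return os.path.getsize(path) <= limit


def max_duration_s(bitrate_kbps: float, limit: int = MAX_ANALYSIS_BYTES) -> float:
    """Rough longest duration that stays under the limit at a given average bitrate."""
    bytes_per_s = bitrate_kbps * 1000 / 8  # kilobits per second -> bytes per second
    return limit / bytes_per_s
```

At an average of 1 Mbps, max_duration_s(1000) is about 117 seconds, which matches the advice below to keep recordings under 2 minutes.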
Best Practices
Keep Recordings Short
Shorter recordings (under 2 minutes) process faster and more reliably.
Use Specific Prompts
Tell the agent exactly what to extract: for example, "list all buttons shown" rather than "describe the workflow".
Configure Fallbacks
Always set a fallback model for video_analyzer in case of API issues.
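The fallback idea amounts to trying each model in order and returning the first successful result. In the sketch below, the analyze callable is a hypothetical placeholder for whatever performs the Gemini request. This is not how mobile-use wires fallbacks internally.

```python
from typing import Callable, List, Sequence


def analyze_with_fallback(video_path: str, prompt: str,
                          models: Sequence[str],
                          analyze: Callable[[str, str, str], str]) -> str:
    """Try each model in order; return the first successful analysis."""
    errors: List[str] = []
    for model in models:
        try:
            return analyze(model, video_path, prompt)
        except Exception as exc:  # e.g. rate limits or transient API errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all video analyzer models failed: " + "; ".join(errors))
```

Collecting every model's error message makes the final failure easier to diagnose than re-raising only the last one.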
Test ffmpeg First
Verify that ffmpeg works before running the example:
```shell
ffmpeg -version
```
