A Python-based mobile automation agent that uses cloud vision-language model (VLM) APIs (Kimi, GPT-4V, Claude, etc.) to understand and interact with Android devices through visual analysis and ADB commands.
No GPU required! This fork replaces the original local Qwen3-VL model with API-based vision models.
- ☁️ Cloud Vision Models: Use Kimi K2.5, GPT-4V, Claude 3.5 Sonnet, or other VLM APIs
- 🤖 ADB Integration: Controls Android devices via ADB commands
- 📝 Natural language tasks: Describe what you want in plain English
- 🌐 Web UI: Built-in Gradio interface for easy control
- 📱 Real-time feedback: Live screenshots and execution logs
- 🔌 Multi-Provider Support: Kimi Code, OpenRouter, Moonshot, OpenAI, and more
- Python 3.10+
- Android device with USB debugging & Developer Mode enabled
- ADB (Android Debug Bridge) installed
- API key from supported providers (Kimi Code, OpenAI, OpenRouter, etc.)
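The first and third requirements above can be checked from a script before going further. The following is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper name, and it only confirms the Python version and that an `adb` executable is on PATH (it does not check the device or the API key).

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # taken from the requirements list


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    `version_info` and `which` are parameters only so the check can be
    exercised without changing the real interpreter or PATH.
    """
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    if which("adb") is None:
        problems.append("adb was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"[missing] {problem}")
```

Running the file prints one line per missing prerequisite and nothing when both checks pass.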
Windows:
# Download from https://developer.android.com/studio/releases/platform-tools
# Add to PATH
Linux/Ubuntu:
sudo apt update
sudo apt install adb
macOS:
brew install android-platform-tools
git clone https://github.com/yourusername/PhoneDriver-API.git
cd PhoneDriver-API
# Create virtual environment
python -m venv venv
# Windows
venv\Scripts\activate
# Linux/macOS
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Copy the example config and edit:
cp .env.example .env
Edit .env with your preferred provider:
Option A: Kimi Code (Recommended for China)
PROVIDER=kimi_code
KIMI_CODE_API_KEY=sk-kimi-xxxxx
Option B: OpenRouter (Supports multiple models)
PROVIDER=openrouter
OPENROUTER_API_KEY=sk-or-v1-xxxxx
MODEL=moonshotai/kimi-k2.5
Option C: OpenAI
PROVIDER=openai
OPENAI_API_KEY=sk-xxxxx
MODEL=gpt-4o
Option D: Moonshot AI
PROVIDER=moonshot
MOONSHOT_API_KEY=sk-xxxxx
MODEL=kimi-k2.5
Enable USB debugging on your Android device:
- Settings → About Phone → Tap "Build Number" 7 times
- Settings → Developer Options → Enable "USB Debugging"
- Connect via USB and allow debugging
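Once the debugging prompt has been accepted, the connection state can also be read from a script. The sketch below is an illustration rather than the project's own `utils/adb.py`: `parse_adb_devices` and `list_devices` are hypothetical names, and the parser assumes the usual `adb devices` layout of a header line followed by one `serial<TAB>state` line per device.

```python
import subprocess


def parse_adb_devices(output):
    """Map each serial in `adb devices` output to its reported state."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines, the header, and daemon start-up notices ("* ...").
        if not line or line.startswith("List of devices") or line.startswith("*"):
            continue
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices


def list_devices():
    """Run `adb devices` and return the parsed serial -> state mapping."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)


if __name__ == "__main__":
    for serial, state in list_devices().items():
        print(f"{serial}: {state}")
```

A state of `device` means the phone is ready. `unauthorized` usually means the debugging prompt on the phone has not been accepted yet.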
Verify connection:
adb devices
Command Line:
python phone_agent.py "Open Settings"
Web UI:
python ui.py
# Navigate to http://localhost:7860
PhoneDriver-API/
├── phone_agent.py # Main CLI agent
├── ui.py # Gradio web interface
├── config.json # Device configuration
├── .env # API keys (create from .env.example)
├── requirements.txt # Python dependencies
├── README.md # This file
├── LICENSE # MIT License
├── providers/ # API provider implementations
│ ├── __init__.py
│ ├── base.py # Base provider interface
│ ├── kimi_code.py # Kimi Code API
│ ├── openrouter.py # OpenRouter API
│ ├── openai_provider.py # OpenAI API
│ └── moonshot.py # Moonshot AI API
└── utils/ # Utility functions
    ├── __init__.py
    ├── adb.py # ADB wrapper
    └── screenshot.py # Screenshot capture
The agent auto-detects your device resolution. To verify:
adb shell wm size
| Provider | Model | Vision | Notes |
|---|---|---|---|
| Kimi Code | kimi-for-coding, kimi-k2.5 | ✅ | Best for coding tasks |
| OpenRouter | moonshotai/kimi-k2.5, anthropic/claude-3.5-sonnet, etc. | ✅ | Multiple models |
| OpenAI | gpt-4o, gpt-4o-mini | ✅ | Reliable, higher cost |
| Moonshot | kimi-k2.5, kimi-vl | ✅ | Official Moonshot API |
| Variable | Description | Required |
|---|---|---|
| PROVIDER | API provider (kimi_code, openrouter, openai, moonshot) | Yes |
| KIMI_CODE_API_KEY | Kimi Code API key | If using Kimi Code |
| OPENROUTER_API_KEY | OpenRouter API key | If using OpenRouter |
| OPENAI_API_KEY | OpenAI API key | If using OpenAI |
| MOONSHOT_API_KEY | Moonshot API key | If using Moonshot |
| MODEL | Model name (provider-specific) | Optional |
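The rules in this table can be expressed as a small validation step. This is a sketch under stated assumptions, not the project's actual loader: `resolve_provider` and `KEY_VARS` are hypothetical names, and the variables are assumed to be in the process environment already (how the project reads `.env` is not shown here).

```python
import os

# Provider name -> API key variable, copied from the table.
KEY_VARS = {
    "kimi_code": "KIMI_CODE_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "openai": "OPENAI_API_KEY",
    "moonshot": "MOONSHOT_API_KEY",
}


def resolve_provider(env=os.environ):
    """Return (provider, api_key, model) or raise ValueError on a bad setup."""
    provider = env.get("PROVIDER", "").strip()
    if provider not in KEY_VARS:
        raise ValueError(
            f"PROVIDER must be one of {sorted(KEY_VARS)}, got {provider!r}"
        )
    key_var = KEY_VARS[provider]
    api_key = env.get(key_var, "").strip()
    if not api_key:
        raise ValueError(f"{key_var} must be set when PROVIDER={provider}")
    # MODEL is optional; None leaves the choice to the provider's default.
    return provider, api_key, env.get("MODEL") or None


if __name__ == "__main__":
    try:
        provider, _, model = resolve_provider()
        print(f"provider={provider} model={model}")
    except ValueError as exc:
        print(f"configuration error: {exc}")
```

Failing early with the name of the missing variable catches the common case where `PROVIDER` points at one service but only another service's key is set.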
# Open an app
python phone_agent.py "Open Chrome"
# Perform a search
python phone_agent.py "Search for weather in New York"
# Change settings
python phone_agent.py "Open Settings and enable WiFi"
# Take a photo
python phone_agent.py "Open camera and take a photo"
from phone_agent import PhoneAgent
config = {
"provider": "kimi_code",
"api_key": "your-api-key",
"screen_width": 1080,
"screen_height": 2340
}
agent = PhoneAgent(config)
result = agent.execute_task("Open Settings")
print(result)
# Restart ADB server
adb kill-server
adb start-server
adb devices
Auto-detect the resolution in the Settings tab of the web UI, or verify it manually:
adb shell wm size
- Verify your API key is valid
- Check if you have sufficient quota/credits
- Ensure PROVIDER matches the API key type
If you see UnicodeEncodeError in the logs on Windows, set the PowerShell console output encoding to UTF-8 before running the agent:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
python phone_agent.py "your task"
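The same error can also be addressed from the Python side. The sketch below assumes you are free to add a few lines at the top of an entry script; `force_utf8` is a hypothetical helper, not something the project ships. It relies on the standard `reconfigure()` method that text streams such as `sys.stdout` provide since Python 3.7.

```python
import sys


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); return it."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is not None:
        # errors="replace" avoids a crash if a character still cannot be shown.
        reconfigure(encoding="utf-8", errors="replace")
    return stream


if __name__ == "__main__":
    force_utf8(sys.stdout)
    force_utf8(sys.stderr)
    print("✅ UTF-8 output enabled")
```

Streams without `reconfigure()` are returned unchanged, so the call is safe when output is redirected to a custom object.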
- Yesssssbabe - Creator & Maintainer (@Yesssssbabe)
Have questions or suggestions? Feel free to reach out!
- WeChat: Scan the QR code below (add note: phonedriverapi)
- GitHub Issues: Create an issue
Note: Please add phonedriverapi when sending a friend request.
- @Yesssssbabe - Creator & Maintainer of PhoneDriver-API
- @OminousIndustries - Original PhoneDriver author
- Kimi by Moonshot AI
- OpenRouter for unified API access
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Support for more providers (Anthropic, Google Gemini, etc.)
- Batch task processing
- Task recording and replay
- iOS support (via WebDriverAgent)
- Multi-device coordination
⭐ Star this repo if you find it useful!
