🏠 Project Page | 🛠️ Hijacking Tool | 📃 Paper
This repository contains code & data for the paper: "Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties".
Please clone the project first.
$ git clone https://github.com/Zsbyqx20/AgentHazard.git
It is recommended to use uv for setting up the dependencies of this project.
$ uv sync --no-dev
It is also okay to use your preferred dependency manager through requirements.txt.
$ pip install -r requirements.txt
You can access our dynamic & static dataset through this link, or via the commands below:
$ wget "https://cloud.tsinghua.edu.cn/f/392e8053a8a5494a8afb/?dl=1" -O data.tar.gz
$ mv data.tar.gz /path/to/the/repository
$ tar xzf data.tar.gz
After decompressing the file, please ensure the folder structure looks like this:
├── data
│   ├── dynamic
│   └── static
└── src
    └── agenthazard
        └── ...
We use Jinja templates to manage and generate configurations for the hijacking tool. Please download the latest release of the tool and install it on your AVD.
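For example, assuming the downloaded release is an APK named agenthazard.apk (the actual file name may differ), it can be installed onto a running AVD via adb:
$ adb install -r agenthazard.apk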
To convert the Jinja files into JSON files, just run:
# if you are using uv
$ ah generate
# if you are using other dependency manager
$ python -m agenthazard.cli generate
After that, please clone our customized Android World repository, and put the generated JSON files into the config folder. Note that the config folder here refers to Android World instead of the current repository.
$ git clone https://github.com/Zsbyqx20/android_world.git
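For instance, assuming the two repositories are cloned side by side (adjust the source path to wherever the generate command writes its output):
$ cp path/to/generated/*.json ../android_world/config/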
You need to follow the instructions of Android World to set up the environment and AVDs. Here is an example.
python run.py \
--suite_family=android_world \
--agent_name=t3a_gpt4 \
--perform_emulator_setup \
--tasks=ContactsAddContact,ClockStopWatchRunning \
--attack_config config/xxxxx.json \
--break_on_misleading_actions
# --attack_config specifies the path of the hijacking configuration.
# If --break_on_misleading_actions is set, the program quits when a misleading action is detected.
Note: to keep consistency among all runs, we recommend saving a checkpoint right after you finish configuring a fresh AVD (see the snapshot example below). Refer to the eval folder under the Android World project for more details on how to run evaluations automatically. We provide robust mechanisms, including automatically reloading state when an error occurs and restarting AVDs if needed.
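For instance, a snapshot of a freshly configured AVD can be saved and restored through the emulator console (the snapshot name clean_setup is just an example; availability depends on your emulator version):
# save the current state of a freshly configured AVD
$ adb emu avd snapshot save clean_setup
# restore it later before a new run
$ adb emu avd snapshot load clean_setup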
The static part of our dataset is under the data/static/ folder. Each scenario
contains 4 files, namely screenshot.jpg, filtered_elements.json,
metadata.json, and original_vh.json. The detailed logic can be found at
src/agenthazard/dataset.py.
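As an illustration, here is a minimal sketch of inspecting one scenario with the standard library (the scenario folder name is hypothetical; the authoritative parsing logic is in src/agenthazard/dataset.py):
import json
from pathlib import Path

scenario = Path("data/static/example_scenario")  # hypothetical scenario folder

# each scenario ships a screenshot plus three JSON files
metadata = json.loads((scenario / "metadata.json").read_text())
elements = json.loads((scenario / "filtered_elements.json").read_text())
view_hierarchy = json.loads((scenario / "original_vh.json").read_text())

print(metadata)       # scenario-level annotations
print(len(elements))  # number of filtered UI elements, assuming a JSON list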
For evaluation, you are required to set your API key in the .env file. You may refer to the .env.local file as an example, and to src/agenthazard/api/client for more available clients, or customize your own LLM client.
OPENAI_API_KEY=sk-xxxx
OPENAI_BASE_URL=https://xxxx.com
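For reference, here is a minimal sketch of how such variables are typically consumed, assuming the python-dotenv package is available (the project's actual client logic lives under src/agenthazard/api/client):
import os
from dotenv import load_dotenv

load_dotenv()  # reads key-value pairs from the .env file in the working directory
api_key = os.environ["OPENAI_API_KEY"]
base_url = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")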
We provide convenient evaluation entry points through the eval command:
# if you are using uv
$ ah eval --help
# if you are using other dependency manager
$ python -m agenthazard.cli eval --help
For each agent modality, we implement M3A, T3A, and UGround (to be released) for the evaluation logic. You can choose among different agents, different LLMs, and different misleading actions as settings.
Please check our paper for more details.
