Existing dataset repurposing #132

@Locke0

To generate a training data mixture that targets the failure configurations we are seeing on OSWorld eval tasks (https://coda.io/d/_dzujAXxemDw/Perturbation-Scenario-Generation_suJUs-NX#_luobZbUO), we need to convert existing browser datasets to the OS environment.

Specifically, we need to convert each ground-truth action from the browser viewport coordinate frame to the OS (full-screen) coordinate frame, accounting for the size and position of the browser window when it is not full screen.
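A minimal sketch of that coordinate conversion, as a starting point for discussion. The `BrowserWindow` class and its fields (`chrome_left`, `chrome_top`, `scale`) are hypothetical names, not part of OSWorld or Mind2Web. It assumes the window position and the browser chrome (title bar, tabs, toolbar) are known in screen pixels and that viewport coordinates are in CSS pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrowserWindow:
    """Hypothetical description of a browser window, in OS screen pixels."""
    x: float                   # left edge of the window on screen
    y: float                   # top edge of the window on screen
    chrome_left: float = 0.0   # border width to the left of the viewport
    chrome_top: float = 0.0    # title bar + tabs + toolbar height above the viewport
    scale: float = 1.0         # assumed device pixel ratio (CSS px -> screen px)


def viewport_to_screen(vx: float, vy: float, win: BrowserWindow) -> tuple[float, float]:
    """Map a viewport-relative point to OS screen coordinates."""
    sx = win.x + win.chrome_left + vx * win.scale
    sy = win.y + win.chrome_top + vy * win.scale
    return sx, sy


# Window placed at (200, 100) with 85 px of chrome above the page content.
win = BrowserWindow(x=200, y=100, chrome_top=85)
print(viewport_to_screen(320, 240, win))  # (520.0, 425.0)
```

How `chrome_top` and `scale` would actually be measured in the OSWorld VM is still an open question here.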

Here are the key questions to answer:

  • Considering the tasks here, what is the logic for converting them, or for creating new trajectories by repurposing them (e.g., combining two Mind2Web trajectories into one by running two browser instances and performing the tasks alternately or sequentially)?

  • To convert the ground-truth action coordinates, can we get the location of the ground-truth action target element using Playwright and the Mind2Web dataset's 'pos_element' field? Can we get the locations and sizes of browser windows from a11y, given the OSWorld setup with setup.py and python.py? If not, are there other ways?

  • Which Mind2Web tasks can we use?
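For the second question, one possible way to get a viewport-relative target point is sketched below. It assumes the positive element record carries an `attributes` JSON string with a `bounding_box_rect` entry formatted as `"x,y,width,height"` in page coordinates; that layout is an assumption to verify against the actual dataset. `element_center` and the scroll parameters are hypothetical names.

```python
import json


def element_center(candidate: dict, scroll_x: float = 0.0, scroll_y: float = 0.0) -> tuple[float, float]:
    """Return the viewport-relative center of a candidate element.

    Assumes candidate["attributes"] is a JSON string whose
    "bounding_box_rect" is "x,y,width,height" in page coordinates.
    """
    attrs = json.loads(candidate["attributes"])
    x, y, w, h = (float(v) for v in attrs["bounding_box_rect"].split(","))
    # Subtract the scroll offset to move from page to viewport coordinates.
    return x + w / 2 - scroll_x, y + h / 2 - scroll_y


# Illustrative record, not taken from the dataset.
candidate = {"attributes": json.dumps({"bounding_box_rect": "100,400,80,20"})}
print(element_center(candidate, scroll_y=300))  # (140.0, 110.0)
```

The resulting point still needs to be shifted by the browser window's position and chrome size to land in the OS coordinate frame.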

Deliverables:

  • Design the conversion or repurposing logic.
  • Design the Mind2Web trajectory selection strategy.
  • Review the logic and strategy with @Locke0 early next week.
  • Implement them.

Metadata

Labels

development (Code implementation, engineering)
