1 change: 1 addition & 0 deletions humanpages/.gitignore
@@ -0,0 +1 @@
__pycache__/
21 changes: 21 additions & 0 deletions humanpages/LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 TinyFish, Inc.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
65 changes: 65 additions & 0 deletions humanpages/Makefile
@@ -0,0 +1,65 @@
.PHONY: all format lint test tests integration_tests docker_tests help extended_tests

# Default target executed when no arguments are given to make.
all: help

# Define a variable for the test file path.
TEST_FILE ?= tests/unit_tests/
integration_test integration_tests: TEST_FILE = tests/integration_tests/


# unit tests are run with the --disable-socket flag to prevent network calls
test tests:
	poetry run pytest --disable-socket --allow-unix-socket $(TEST_FILE)

test_watch:
	poetry run ptw --now . -- --snapshot-update -vv $(TEST_FILE)

# integration tests are run without the --disable-socket flag to allow network calls
integration_test integration_tests:
	poetry run pytest $(TEST_FILE)

######################
# LINTING AND FORMATTING
######################

# Define a variable for Python and notebook files.
PYTHON_FILES=.
MYPY_CACHE=.mypy_cache
lint format: PYTHON_FILES=.
BASE_REF ?= main
lint_diff format_diff: PYTHON_FILES=$(shell git diff --relative --name-only --diff-filter=d $(BASE_REF)...HEAD | grep -E '\.py$$|\.ipynb$$')
lint_package: PYTHON_FILES=agentql_humanpages
lint_tests: PYTHON_FILES=tests
lint_tests: MYPY_CACHE=.mypy_cache_test

lint lint_diff lint_package lint_tests:
	[ "$(PYTHON_FILES)" = "" ] || poetry run ruff check $(PYTHON_FILES)
	[ "$(PYTHON_FILES)" = "" ] || poetry run ruff format $(PYTHON_FILES) --diff
	[ "$(PYTHON_FILES)" = "" ] || { mkdir -p $(MYPY_CACHE) && poetry run mypy $(PYTHON_FILES) --cache-dir $(MYPY_CACHE); }

format format_diff:
	[ "$(PYTHON_FILES)" = "" ] || poetry run ruff format $(PYTHON_FILES)
	[ "$(PYTHON_FILES)" = "" ] || poetry run ruff check --select I --fix $(PYTHON_FILES)

spell_check:
	poetry run codespell --toml pyproject.toml

spell_fix:
	poetry run codespell --toml pyproject.toml -w

check_imports: $(shell find agentql_humanpages -name '*.py')
	poetry run python ./scripts/check_imports.py $^

######################
# HELP
######################

help:
	@echo '----'
	@echo 'check_imports - check imports'
	@echo 'format - run code formatters'
	@echo 'lint - run linters'
	@echo 'test - run unit tests'
	@echo 'tests - run unit tests'
	@echo 'test TEST_FILE=<test_file> - run all tests in file'
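The `lint_diff` and `format_diff` targets restrict linting to the `.py` and `.ipynb` files changed relative to `BASE_REF` (default `main`), and `--diff-filter=d` drops deleted files. A minimal sketch of the same selection in Python, for use outside `make`; `filter_python_files` and `changed_python_files` are hypothetical helpers, and the git call assumes a checkout where `BASE_REF` is available.

```python
import re
import subprocess
from typing import List

# Same pattern as the Makefile's grep -E '\.py$$|\.ipynb$$' ($$ is make's escape for $).
PY_OR_NOTEBOOK = re.compile(r"\.py$|\.ipynb$")


def filter_python_files(paths: List[str]) -> List[str]:
    """Keep only Python sources and notebooks (hypothetical helper)."""
    return [p for p in paths if PY_OR_NOTEBOOK.search(p)]


def changed_python_files(base_ref: str = "main") -> List[str]:
    """List .py/.ipynb files changed since base_ref, excluding deleted files."""
    out = subprocess.run(
        ["git", "diff", "--relative", "--name-only", "--diff-filter=d", f"{base_ref}...HEAD"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return filter_python_files(out.splitlines())


print(filter_python_files(["agentql_humanpages/agent.py", "README.md", "demo.ipynb"]))
```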
117 changes: 117 additions & 0 deletions humanpages/README.md
@@ -0,0 +1,117 @@
# agentql-humanpages

An integration package connecting [AgentQL](https://www.agentql.com/) and [Human Pages](https://humanpages.ai) for human-in-the-loop web data extraction.

When AgentQL's automated extraction fails -- due to anti-bot protections, CAPTCHAs, empty results, or any other blocker -- the task is automatically delegated to a human worker via the Human Pages platform.
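The fallback behaviour can be sketched as a plain control-flow pattern. This is a minimal illustration, not the package's implementation: `run_agentql` and `delegate_to_human` are hypothetical callables standing in for the AgentQL call and the Human Pages job creation. Treating an exception or an empty result as a failure is an assumption based on the blockers listed above.

```python
from typing import Any, Callable, Dict


def extract_with_fallback(
    run_agentql: Callable[[], Any],
    delegate_to_human: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Try automated extraction first; hand the task to a human on failure.

    Both callables are hypothetical stand-ins, not agentql_humanpages APIs.
    """
    try:
        data = run_agentql()
    except Exception:
        # Anti-bot blocks, CAPTCHAs, timeouts, etc. are assumed to surface as exceptions.
        data = None
    if data:
        # Non-empty result: no human needed.
        return {"source": "agentql", "data": data}
    # Empty result or error: create a human job instead.
    job = delegate_to_human()
    return {"source": "humanpages", **job}


def blocked() -> Any:
    raise RuntimeError("CAPTCHA detected")


result = extract_with_fallback(
    run_agentql=blocked,
    delegate_to_human=lambda: {"job_id": "job-123", "status": "PENDING"},
)
print(result["source"])  # humanpages
```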

## Installation

```bash
pip install -U agentql-humanpages
```

You need to configure both API keys:

- `AGENTQL_API_KEY` -- get one from the [AgentQL Dev Portal](https://dev.agentql.com)
- `HUMANPAGES_API_KEY` -- get one from [Human Pages](https://humanpages.ai)
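Each key can be passed as an argument or read from its environment variable. One plausible resolution rule is sketched below. `resolve_api_key` is a hypothetical helper, and giving an explicit argument precedence over the environment is an assumption, not documented behaviour.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str], env_var: str) -> str:
    """Return the explicit key if given, else the named environment variable.

    Hypothetical helper; the precedence order is an assumption.
    """
    key = explicit or os.environ.get(env_var)
    if not key:
        raise ValueError(f"Missing API key: pass it explicitly or set {env_var}")
    return key


os.environ["HUMANPAGES_API_KEY"] = "hp-from-env"
print(resolve_api_key(None, "HUMANPAGES_API_KEY"))  # hp-from-env
print(resolve_api_key("hp-explicit", "HUMANPAGES_API_KEY"))  # hp-explicit
```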

## Quick Start

```python
from agentql_humanpages import HumanFallbackAgent

agent = HumanFallbackAgent(
    agentql_api_key="your-agentql-key",
    humanpages_api_key="your-humanpages-key",
)

result = agent.extract(
    url="https://example.com/products",
    query="{ products[] { name price } }",
)

if result["source"] == "agentql":
    print("Extracted via AgentQL:", result["data"])
else:
    print("Extracted via human:", result["messages"])
```

## HumanFallbackAgent

The main entry point. Attempts AgentQL extraction first, then falls back to Human Pages.

```python
agent = HumanFallbackAgent(
    agentql_api_key="...",  # or set AGENTQL_API_KEY env var
    humanpages_api_key="...",  # or set HUMANPAGES_API_KEY env var
    price_usdc=5.0,  # default price for human jobs
    deadline_hours=24,  # default deadline for human jobs
)
```
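`price_usdc` and `deadline_hours` set defaults for every human job, and `extract()` accepts the same two parameters as per-call overrides. The resolution rule can be sketched as follows, assuming `None` means "not supplied". `JobDefaults` and `effective_terms` are hypothetical names; the package's own defaults are exported as `DEFAULT_PRICE_USDC` and `DEFAULT_DEADLINE_HOURS`, whose values are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class JobDefaults:
    """Hypothetical container mirroring the constructor's job defaults."""

    price_usdc: float
    deadline_hours: int


def effective_terms(
    defaults: JobDefaults,
    price_usdc: Optional[float] = None,
    deadline_hours: Optional[int] = None,
) -> Tuple[float, int]:
    """Per-call values win; None falls back to the agent-level default."""
    price = defaults.price_usdc if price_usdc is None else price_usdc
    deadline = defaults.deadline_hours if deadline_hours is None else deadline_hours
    return price, deadline


defaults = JobDefaults(price_usdc=5.0, deadline_hours=24)
print(effective_terms(defaults))  # (5.0, 24)
print(effective_terms(defaults, price_usdc=10.0, deadline_hours=12))  # (10.0, 12)
```

Checking `is None` rather than truthiness keeps an explicit `0` from being silently replaced by the default.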

### extract()

```python
result = agent.extract(
    url="https://example.com",
    query="{ products[] { name price } }",  # AgentQL query
    # or, instead of query:
    # prompt="Get all product names and prices",  # natural-language prompt
    fallback_description="Custom instructions for the human worker",
    price_usdc=10.0,  # override default price
    deadline_hours=12,  # override default deadline
)
```

Returns a dict with:
- `source`: `"agentql"` or `"humanpages"`
- `data`: extracted data (when source is agentql)
- `job_id`, `status`, `messages`: job details (when source is humanpages)
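The two result shapes can be written down as a `TypedDict`, which lets a type checker such as mypy (already run by `make lint`) catch misspelled keys. This is a sketch inferred from the keys listed above. The value types (for example, `messages` as a list of dicts) are assumptions, and `ExtractResult` and `describe` are hypothetical names.

```python
from typing import Any, Dict, List, TypedDict


class ExtractResult(TypedDict, total=False):
    """Hypothetical typing of the dict returned by extract()."""

    source: str  # "agentql" or "humanpages"
    data: Any  # present when source == "agentql"
    job_id: str  # this and the keys below: present when source == "humanpages"
    status: str
    messages: List[Dict[str, Any]]


def describe(result: ExtractResult) -> str:
    """Summarize a result without assuming which branch produced it."""
    if result.get("source") == "agentql":
        return f"automated: {result.get('data')}"
    return f"human job {result.get('job_id')} ({result.get('status')})"


print(describe({"source": "agentql", "data": {"products": []}}))
print(describe({"source": "humanpages", "job_id": "job-1", "status": "OPEN", "messages": []}))
```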

### aextract()

Async version of `extract()` with the same interface.
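Because `aextract()` is a coroutine, several extractions can run concurrently with `asyncio.gather`. The sketch below uses a stand-in `FakeAgent` class (hypothetical, so the snippet runs without API keys or network access). With the real package you would pass a `HumanFallbackAgent` instead, relying on `aextract()` sharing `extract()`'s interface.

```python
import asyncio
from typing import Any, Dict, List


class FakeAgent:
    """Hypothetical stand-in for HumanFallbackAgent; mimics only the aextract() call."""

    async def aextract(self, url: str, query: str) -> Dict[str, Any]:
        await asyncio.sleep(0.01)  # simulate network latency
        return {"source": "agentql", "data": {"url": url}}


async def extract_many(agent: Any, urls: List[str], query: str) -> List[Dict[str, Any]]:
    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(agent.aextract(url=u, query=query) for u in urls))


urls = ["https://example.com/a", "https://example.com/b"]
results = asyncio.run(extract_many(FakeAgent(), urls, "{ products[] { name price } }"))
print([r["data"]["url"] for r in results])
```

Passing `return_exceptions=True` to `gather()` would collect errors as results instead of raising on the first one.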

## HumanPagesClient

Lower-level client for the Human Pages REST API:

```python
from agentql_humanpages import HumanPagesClient

client = HumanPagesClient(api_key="your-key")

# Search for available humans
humans = client.search_humans(skill="web task", available=True)

# Create a job
job = client.create_job(
    human_id=humans[0]["id"],
    title="Extract product data",
    description="Visit example.com and extract all product names and prices.",
    price_usdc=5.0,
    deadline_hours=24,
)

# Check job status
status = client.get_job_status(job["id"])

# Get messages
messages = client.get_job_messages(job["id"])
```

All methods have async counterparts (`asearch_humans`, `acreate_job`, `aget_job_status`, `aget_job_messages`).
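A human job finishes asynchronously, so callers typically poll `get_job_status()` until the job reaches a final state. The package exports `DEFAULT_POLL_INTERVAL_SECONDS` and `DEFAULT_TIMEOUT_SECONDS`, but their values and the set of terminal status strings are not shown here. The helper `wait_for_job`, the `"status"` key, and the status names below are therefore assumptions. The status fetcher is passed in as a callable so the helper can wrap `client.get_job_status` or a test stub.

```python
import time
from typing import Any, Callable, Dict, Iterable


def wait_for_job(
    get_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    # Hypothetical terminal states; check the Human Pages API for the real ones.
    terminal: Iterable[str] = ("COMPLETED", "CANCELLED", "REJECTED"),
    poll_interval: float = 5.0,  # illustrative, not the package default
    timeout: float = 300.0,  # illustrative, not the package default
) -> Dict[str, Any]:
    """Poll get_status(job_id) until its "status" is terminal or time runs out."""
    done = set(terminal)
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status.get("status") in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status.get('status')!r} after {timeout}s")
        time.sleep(poll_interval)


# Stub that reports two in-progress polls, then completion.
_states = iter(["PENDING", "IN_PROGRESS", "COMPLETED"])
final = wait_for_job(lambda _id: {"status": next(_states)}, "job-1", poll_interval=0.01)
print(final["status"])  # COMPLETED
```

With the real client this would be called as `wait_for_job(client.get_job_status, job["id"])`, assuming the status response is a dict with a `"status"` key.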

## Run Tests

Unit tests (no network calls):

```bash
make test
```

Integration tests (requires valid API keys):

```bash
make integration_tests
```
28 changes: 28 additions & 0 deletions humanpages/agentql_humanpages/__init__.py
@@ -0,0 +1,28 @@
from importlib import metadata

from agentql_humanpages.agent import HumanFallbackAgent
from agentql_humanpages.client import HumanPagesClient
from agentql_humanpages.const import (
    DEFAULT_DEADLINE_HOURS,
    DEFAULT_POLL_INTERVAL_SECONDS,
    DEFAULT_PRICE_USDC,
    DEFAULT_TIMEOUT_SECONDS,
    HUMANPAGES_BASE_URL,
)

try:
    __version__ = metadata.version(__package__)
except metadata.PackageNotFoundError:
    # Case where package metadata is not available.
    __version__ = "0.1.0"

__all__ = [
    "DEFAULT_DEADLINE_HOURS",
    "DEFAULT_POLL_INTERVAL_SECONDS",
    "DEFAULT_PRICE_USDC",
    "DEFAULT_TIMEOUT_SECONDS",
    "HUMANPAGES_BASE_URL",
    "HumanFallbackAgent",
    "HumanPagesClient",
    "__version__",
]