Merged
2 changes: 2 additions & 0 deletions .github/workflows/deploy.yaml
@@ -4,9 +4,11 @@ on:
  push:
    branches:
      - main
      - staging
  pull_request:
    branches:
      - main
      - staging

jobs:
  build-and-deploy:
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.env
*.ipynb
166 changes: 91 additions & 75 deletions README.md
@@ -1,13 +1,11 @@
# 🚀 Serverless Browser Agents with Playwright + Lambda + Browserbase
## 🚀 Serverless Browser Agents with Playwright + Lambda + Browserbase
*Spin up headless browsers on AWS in under a minute—no layers, no EC2, no pain.*

[![Build](https://github.com/derekmeegan/browserbase-lambda-playwright/actions/workflows/deploy.yaml/badge.svg)](../../actions/workflows/deploy.yaml)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

> **Star ⭐ this repo if it saves you hours, and hit _Fork_ to make it yours in seconds.**

---

## ⚡ TL;DR Quick-Start

### Option A: Local Deployment
@@ -17,106 +15,129 @@
git clone https://github.com/your-username/browserbase-lambda-playwright.git
cd browserbase-lambda-playwright

# 2. Export AWS & Browserbase secrets
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export BROWSERBASE_API_KEY=...
export BROWSERBASE_PROJECT_ID=...

# 3. Create AWS Secrets Manager entries
aws secretsmanager create-secret \
--name BrowserbaseLambda/BrowserbaseApiKey \
--secret-string '{"BROWSERBASE_API_KEY":"'"$BROWSERBASE_API_KEY"'"}'

aws secretsmanager create-secret \
--name BrowserbaseLambda/BrowserbaseProjectId \
--secret-string '{"BROWSERBASE_PROJECT_ID":"'"$BROWSERBASE_PROJECT_ID"'"}'

# 4. Deploy (creates the Lambda + IAM + Secrets wiring)
# 2. Deploy infrastructure
env | grep AWS || { export AWS_ACCESS_KEY_ID=...; export AWS_SECRET_ACCESS_KEY=...; }
cd infra && pip install -r requirements.txt && cdk deploy --all --require-approval never

# 3. Fetch API details from CloudFormation outputs
#    (each command prints an export line; run the printed lines in your shell)
echo "export API_ENDPOINT_URL=$(aws cloudformation describe-stacks \
--stack-name BrowserbaseLambdaStack \
--query 'Stacks[0].Outputs[?OutputKey==`ApiEndpointUrl`].OutputValue' \
--output text)"

echo "export API_KEY=$(aws apigateway get-api-key \
--api-key $(aws cloudformation describe-stacks --stack-name BrowserbaseLambdaStack \
--query 'Stacks[0].Outputs[?OutputKey==`ApiKeyId`].OutputValue' --output text) \
--include-value \
--query 'value' \
--output text)"

# 4. Install example dependencies and run quick start
pip install -r examples/requirements.txt
python examples/quick_start.py
```

### Option B: GitHub Actions Deployment

```bash
# 1. Create your own repository
# Either fork this repository on GitHub or create a new one and push this code
# 1. Fork or push this repo to your GitHub account
# 2. Add repository secrets under Settings → Secrets & variables → Actions:
# - AWS_ACCESS_KEY
# - AWS_SECRET_ACCESS_KEY
# 3. Create Browserbase secrets in AWS Secrets Manager (see infra/stack.py env names)
# 4. Push to main → GitHub Actions triggers CDK deploy
```

# 2. Add GitHub repository secrets
# Go to your repository → Settings → Secrets and variables → Actions → New repository secret
# Add these secrets:
# - AWS_ACCESS_KEY: Your AWS Access Key ID
# - AWS_SECRET_ACCESS_KEY: Your AWS Secret Access Key
---

# 3. Create AWS Secrets Manager entries (same as Option A step 3)
You now have a Lambda that opens a Browserbase session and runs Playwright code from **`lambdas/scraper/scraper.py`**.
Invoke it with:

# 4. Push to main branch to trigger deployment
git push origin main
```bash
curl -X POST "$API_ENDPOINT_URL" \
-H "Content-Type: application/json" \
-H "x-api-key: $API_KEY" \
-d '{"url":"https://news.ycombinator.com/"}' \
-v

# …then poll status:
curl -H "x-api-key: $API_KEY" "$API_ENDPOINT_URL/<jobId>"
```

You now have a Lambda that opens a Browserbase session and runs Playwright code from **`src/scraper.py`**.
Invoke it with:
**OR**

```bash
aws lambda invoke \
--function-name <deployed-lambda-name> \
--payload '{"url":"https://news.ycombinator.com/"}' \
response.json && cat response.json | jq
```
pip install -r examples/requirements.txt
python examples/quick_start.py
```

---
## 🔄 Serverless Async Architecture

1. **POST /scrape** returns **202 Accepted** immediately.
2. Job metadata is stored in DynamoDB (`JobStatusTable`) with status updates (PENDING → RUNNING → SUCCESS/FAILED).
3. **GET /scrape/{jobId}** polls DynamoDB for the latest job result.
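
The three steps above can be sketched as a small client. This is a minimal illustration using only the standard library, not the repository's `examples/quick_start.py`. It assumes the POST response carries a `jobId` field and that the GET response carries a `status` field with the states listed above; both field names, and the helper names `api_request`, `wait_for_job`, and `main`, are assumptions made for the example.

```python
import json
import os
import time
import urllib.request

# Terminal states taken from the status flow described above.
TERMINAL_STATES = {"SUCCESS", "FAILED"}


def api_request(method, url, api_key, body=None):
    """Send a JSON request with the x-api-key header and return the parsed JSON body."""
    data = json.dumps(body).encode() if body is not None else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode())


def wait_for_job(fetch_status, job_id, interval=2.0, max_attempts=30, sleep=time.sleep):
    """Call fetch_status(job_id) until the job reaches a terminal state.

    fetch_status is any callable returning a dict with a "status" key
    (assumed response shape). Raises TimeoutError after max_attempts polls.
    """
    for _ in range(max_attempts):
        job = fetch_status(job_id)
        if job.get("status") in TERMINAL_STATES:
            return job
        sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")


def main():
    """Submit one scrape job and print its final record."""
    endpoint = os.environ["API_ENDPOINT_URL"]
    api_key = os.environ["API_KEY"]
    submitted = api_request(
        "POST", endpoint, api_key, {"url": "https://news.ycombinator.com/"}
    )
    job_id = submitted["jobId"]  # assumed field name in the 202 response
    result = wait_for_job(
        lambda jid: api_request("GET", f"{endpoint}/{jid}", api_key), job_id
    )
    print(json.dumps(result, indent=2))
```

Calling `main()` with `API_ENDPOINT_URL` and `API_KEY` exported runs the full round trip. Passing the status lookup in as a callable keeps the polling loop testable without network access.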

## 🚀 Why use this template?

* **Zero binary juggling** – Playwright lives in the Docker image; heavy Chrome lives on Browserbase.
* **Cold-start ≈ 2 s** – no browser download, just connect-over-CDP.
* **Zero binary juggling** – Playwright lives in the Lambda image; Chrome runs remotely on Browserbase.
* **Cold-start ≈ 2 s** – no browser download, just connect-over-CDP.
* **Pay-per-run** – pure Lambda pricing; scale by upgrading Browserbase, not infra.
* **Built-in CI/CD** – GitHub Actions deploys on every push to `main`.

---
* **Async, serverless** – fire-and-forget POST, durable job tracking via DynamoDB.
* **Built-in CI/CD** – GitHub Actions deploys on every push to `main`/`staging`.
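
To make the connect-over-CDP point concrete, here is a hedged sketch of attaching Playwright to a browser that is already running remotely. It is not the code in `lambdas/scraper/scraper.py`. The `cdp_url` argument stands for the WebSocket endpoint of a Browserbase session, and how that URL is obtained is deliberately left out. The function names `first_or_create` and `fetch_title` are hypothetical.

```python
def first_or_create(items, factory):
    """Return the first existing item, or build a new one with factory()."""
    return items[0] if items else factory()


def fetch_title(cdp_url, page_url):
    """Attach to a remote Chromium over CDP, open page_url, and return its title.

    cdp_url is assumed to be a WebSocket endpoint for an existing remote
    browser session; no browser binary is launched locally.
    """
    # Imported lazily so this module still loads where Playwright is absent.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.connect_over_cdp(cdp_url)
        try:
            # A remote session may already expose a context and a page; reuse them.
            context = first_or_create(browser.contexts, browser.new_context)
            page = first_or_create(context.pages, context.new_page)
            page.goto(page_url)
            return page.title()
        finally:
            browser.close()
```

Because the only local work is opening a WebSocket, the Lambda image needs the Playwright Python package but no bundled browser, which is where the short cold start comes from.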

## 🏗️ High-Level Architecture

```
```text
┌────────────┐ CDP (WebSocket) ┌────────────┐
│ AWS Lambda │ ────────────────▶ │ Browserbase│
└────────────┘ └────────────┘
│ Logs
AWS CloudWatch
```

---
Amazon DynamoDB (JobStatusTable)
```

## 📦 Project Layout

```
.
├── .github/workflows/deploy.yaml # CI/CD pipeline
├── infra/ # CDK IaC
│ ├── app.py
│ ├── stack.py
├── .github/workflows/deploy.yaml
├── examples/
│ ├── quick_start.py
│ └── requirements.txt
└── src/
├── Dockerfile # Lambda image
├── scraper.py # Playwright logic
└── requirements.txt
├── infra/
│ ├── app.py
│ ├── cdk.json
│ ├── requirements.txt
│ └── stack.py
├── lambdas/
│ ├── getter/
│ │ ├── Dockerfile
│ │ ├── getter.py
│ │ └── requirements.txt
│ └── scraper/
│ ├── Dockerfile
│ ├── scraper.py
│ └── requirements.txt
├── .gitignore
├── README.md
└── LICENSE
```

---

<details>
<summary>🔍 Full Setup & Prerequisites</summary>

### Requirements

| Tool | Version |
| --- | --- |
| AWS CLI | any 2.x |
| Docker | ≥ 20.10 |
| Node & npm | any LTS (for CDK) |
| Python | 3.12+ |
| Browserbase account | free tier works |
| Tool | Version |
| ------------------------- | ------------ |
| AWS CLI | any 2.x |
| Docker | ≥ 20.10 |
| Node & npm | any LTS |
| Python | 3.12+ |
| Browserbase account | free tier OK |

### 1. Install the AWS CLI

@@ -145,32 +166,27 @@ aws secretsmanager create-secret \
--secret-string '{"BROWSERBASE_PROJECT_ID":"'"$BROWSERBASE_PROJECT_ID"'"}'
```

### 4. Local Playwright install (optional for dev)
### 4. (Optional) Local Playwright install

```bash
pip install playwright && python -m playwright install
```

</details>

---

## ❓ FAQ

| Question | Answer |
| --- | --- |
| **Does this work on Browserbase free tier?** | Yes—1 concurrent session and rate-limited creation. |
| **Cold-starts?** | Typical < 2000 ms; browser runs remotely. |
| **How do I add extra Python libs?** | Add them to `src/requirements.txt`, rebuild, push—GitHub Actions redeploys. |

---
| Question | Answer |
| ------------------------------------------------- | ------------------------------------------------------------------------------------ |
| **Browserbase free tier?** | Yes—1 concurrent session; creation rate‑limited. |
| **Cold‑starts?** | Typical < 2 s (CDP connect, no browser download). |
| **Add extra Python libs?**                        | Add to `lambdas/getter/requirements.txt` or `lambdas/scraper/requirements.txt`, rebuild images, push → redeploy. |
| **API returns 202 Accepted—how to track status?** | Poll `GET /scrape/{jobId}` to read status/results from DynamoDB. |

## 🤝 Contributing

Pull requests are welcome! Please open an issue first if you plan a large change.

---

## 📄 License

This project is licensed under the MIT License – see the [LICENSE](LICENSE) file for details.