add connect url from api #568

Closed
wants to merge 42 commits into from
42 commits
9339b4c
draft: add connect url from api
kamath Mar 10, 2025
c32179a
upgrade version
kamath Mar 14, 2025
3bb302e
reset example
kamath Mar 14, 2025
c9fcbba
space
kamath Mar 14, 2025
81ad3dd
connect url
kamath Mar 14, 2025
5dbfd02
Merge branch 'main' into anirudh/add-connecturl-from-api
kamath Mar 14, 2025
6883a42
prettier
miguelg719 Mar 14, 2025
add2f4a
Delete examples/accessibility_test.ts
miguelg719 Mar 14, 2025
bdd808d
add eval for resume session
kamath Mar 15, 2025
dc2501d
package json
kamath Mar 15, 2025
cbe1a22
rm console log
kamath Mar 15, 2025
fe8aea4
get rid of domDebug (#577)
seanmcguire12 Mar 14, 2025
6f3f82b
regression evals (#583)
seanmcguire12 Mar 15, 2025
fb5ff74
telemetry + debug logging (#569)
seanmcguire12 Mar 15, 2025
90283e8
export agent types (#587)
sameelarif Mar 16, 2025
b57eb03
update api payload (#585)
sameelarif Mar 16, 2025
4f56a10
added new openai cu model (#590)
miguelg719 Mar 16, 2025
34b4457
temporary changeset (#591)
miguelg719 Mar 16, 2025
6e799f9
remove healthcheck ping (#584)
miguelg719 Mar 17, 2025
8b695d8
support api usage for extract with no args (#582)
seanmcguire12 Mar 17, 2025
6444573
Override screenshot logic (#589)
miguelg719 Mar 17, 2025
a7a949e
add agent API support (#588)
sameelarif Mar 17, 2025
2bbb111
operator handler (#586)
sameelarif Mar 18, 2025
312af22
fixed anthropic open operator (#598)
miguelg719 Mar 19, 2025
ae23d1e
pass `observeHandler` into `actHandler` (#594)
seanmcguire12 Mar 20, 2025
f77ab47
Refactor: rm unused fn (#603)
seanmcguire12 Mar 20, 2025
8f679ef
add `history` primitive (#600)
sameelarif Mar 20, 2025
735586a
delete old CU models - won't be supported (#604)
miguelg719 Mar 21, 2025
fea1255
send `browserbaseSessionID` to creation request
sameelarif Mar 21, 2025
93c0268
changeset
sameelarif Mar 21, 2025
270fab8
Revert "send `browserbaseSessionID` to creation request"
sameelarif Mar 21, 2025
d823e5b
Revert "changeset"
sameelarif Mar 21, 2025
494019a
custom error classes (#601)
seanmcguire12 Mar 22, 2025
ca408c7
support browserbasesessionid for resuming a session on api (#605)
sameelarif Mar 22, 2025
60f5244
draft: add connect url from api
kamath Mar 10, 2025
be7ff7b
reset example
kamath Mar 14, 2025
8ed2ce1
prettier
miguelg719 Mar 14, 2025
d65d612
add eval for resume session
kamath Mar 15, 2025
674d793
package json
kamath Mar 15, 2025
3e7e72b
fix connect url e2e test
kamath Mar 26, 2025
9aa8248
test.only
kamath Mar 26, 2025
c148894
uncomment
kamath Mar 26, 2025
5 changes: 5 additions & 0 deletions .changeset/cool-lemons-report.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Pass `observeHandler` into `actHandler`.
5 changes: 5 additions & 0 deletions .changeset/curly-rules-build.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": minor
---

Added support for offloading agent tasks to the API.
5 changes: 5 additions & 0 deletions .changeset/empty-spoons-float.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": minor
---

Added a `stagehand.history` array that records every `act`, `extract`, `observe`, and `goto` call. Because the history is stored at the `StagehandPage` level, it also captures these methods when an agent calls them indirectly.
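The reason a page-level log also catches agent calls is that every caller, whether a user script or an agent, goes through the same page object. A minimal sketch of that idea follows; `HistoryEntry`, `RecordingPage`, and `runToyAgent` are hypothetical names and do not describe Stagehand's actual types:

```typescript
// Hypothetical entry shape; Stagehand's real history type may differ.
type HistoryMethod = "act" | "extract" | "observe" | "goto";

interface HistoryEntry {
  method: HistoryMethod;
  parameters: unknown;
  timestamp: number;
}

// Hypothetical page wrapper: each public method appends to one shared log,
// so direct calls and agent-driven calls are recorded identically.
class RecordingPage {
  private readonly entries: HistoryEntry[] = [];

  get history(): readonly HistoryEntry[] {
    // Return a copy so callers cannot mutate the internal log.
    return [...this.entries];
  }

  private record(method: HistoryMethod, parameters: unknown): void {
    this.entries.push({ method, parameters, timestamp: Date.now() });
  }

  // A real wrapper would also perform each operation; here they only record.
  async goto(url: string): Promise<void> {
    this.record("goto", { url });
  }

  async act(action: string): Promise<void> {
    this.record("act", { action });
  }

  async extract(instruction?: string): Promise<void> {
    this.record("extract", { instruction });
  }

  async observe(instruction?: string): Promise<void> {
    this.record("observe", { instruction });
  }
}

// A toy "agent" that receives the page but never touches the log itself.
async function runToyAgent(page: RecordingPage): Promise<void> {
  await page.goto("https://example.com");
  await page.observe("find the search box");
  await page.act("type 'stagehand' into the search box");
}
```

After `await runToyAgent(page)`, `page.history` holds `goto`, `observe`, and `act` entries even though the calling code never invoked those methods directly.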
5 changes: 5 additions & 0 deletions .changeset/fifty-crabs-arrive.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

You can now read `stagehand.metrics` to get token usage metrics. You can also set `logInferenceToFile` in the Stagehand config to log the entire call/response history between Stagehand and the LLM.
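As a rough illustration of what per-method token accounting involves, here is a sketch that sums prompt and completion tokens for each method and overall. `TokenUsage`, `UsageTracker`, and their fields are hypothetical and do not describe the shape of `stagehand.metrics`:

```typescript
// Hypothetical usage record; LLM clients typically report similar counters.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
}

const METHODS = ["act", "extract", "observe"] as const;
type Method = (typeof METHODS)[number];

class UsageTracker {
  private readonly totals: Record<Method, TokenUsage> = {
    act: { promptTokens: 0, completionTokens: 0 },
    extract: { promptTokens: 0, completionTokens: 0 },
    observe: { promptTokens: 0, completionTokens: 0 },
  };

  // Add the usage reported by one LLM call to the running total for `method`.
  add(method: Method, usage: TokenUsage): void {
    const current = this.totals[method];
    current.promptTokens += usage.promptTokens;
    current.completionTokens += usage.completionTokens;
  }

  // Totals for one method; a copy, so callers cannot mutate the running totals.
  forMethod(method: Method): TokenUsage {
    return { ...this.totals[method] };
  }

  // Grand total across all methods.
  overall(): TokenUsage {
    let promptTokens = 0;
    let completionTokens = 0;
    for (const method of METHODS) {
      promptTokens += this.totals[method].promptTokens;
      completionTokens += this.totals[method].completionTokens;
    }
    return { promptTokens, completionTokens };
  }
}
```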
5 changes: 5 additions & 0 deletions .changeset/four-hoops-mix.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": minor
---

Added custom error classes.
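A common TypeScript pattern for custom error classes is a shared base class plus subclasses that set `name` and carry structured context, so callers can branch with `instanceof` instead of parsing messages. The class names below are hypothetical, not the classes this PR adds, and the sketch assumes an ES2015+ compile target, where subclassing `Error` keeps `instanceof` working:

```typescript
// Hypothetical base class shared by all of a library's own errors.
class ExampleLibError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ExampleLibError";
  }
}

// Hypothetical subclass carrying the provider that lacked a key.
class MissingApiKeyError extends ExampleLibError {
  readonly provider: string;
  constructor(provider: string) {
    super(`No API key configured for provider "${provider}"`);
    this.name = "MissingApiKeyError";
    this.provider = provider;
  }
}

// Hypothetical subclass carrying the rejected model name.
class UnsupportedModelError extends ExampleLibError {
  readonly modelName: string;
  constructor(modelName: string) {
    super(`Model "${modelName}" is not supported`);
    this.name = "UnsupportedModelError";
    this.modelName = modelName;
  }
}

// Callers branch on the class rather than on message text.
function classifyError(err: unknown): string {
  if (err instanceof MissingApiKeyError) return `missing key for ${err.provider}`;
  if (err instanceof ExampleLibError) return `library error: ${err.name}`;
  return "unknown error";
}
```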
5 changes: 5 additions & 0 deletions .changeset/free-pots-move.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Added CDP support for screenshots. For more on the benefits, see: https://docs.browserbase.com/features/screenshots#why-use-cdp-for-screenshots%3F
5 changes: 5 additions & 0 deletions .changeset/full-trams-learn.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Removed an unnecessary healthcheck ping from the SDK.
5 changes: 5 additions & 0 deletions .changeset/gold-hounds-stand.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Upgrade to Browserbase (BB) SDK 2.4.0 to get the `connectUrl` of an existing session.
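The changeset implies that, from SDK 2.4.0, retrieving an existing session yields its `connectUrl`, which is what reattaching to a session needs. The sketch below is written against a minimal hypothetical `SessionsClient` interface rather than the real SDK types, and it assumes a `"RUNNING"` status value for live sessions:

```typescript
// Minimal hypothetical view of a sessions client; the real SDK's types are richer.
interface SessionInfo {
  id: string;
  status: string;
  connectUrl?: string;
}

interface SessionsClient {
  retrieve(id: string): Promise<SessionInfo>;
}

// Look up an existing session and return the URL used to connect to it over CDP.
async function resolveConnectUrl(sessions: SessionsClient, sessionId: string): Promise<string> {
  const session = await sessions.retrieve(sessionId);
  // Assumption: only a running session can be reattached to.
  if (session.status !== "RUNNING") {
    throw new Error(`Session ${sessionId} is ${session.status}, not RUNNING`);
  }
  if (!session.connectUrl) {
    throw new Error(`Session ${sessionId} did not return a connectUrl`);
  }
  return session.connectUrl;
}
```

The returned URL could then be passed to Playwright's `chromium.connectOverCDP(url)` to drive the resumed browser.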
5 changes: 5 additions & 0 deletions .changeset/petite-donuts-lead.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Support calling `extract` with no arguments when using the API.
5 changes: 5 additions & 0 deletions .changeset/puny-garlics-join.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Fixed the open operator handler to work with Anthropic.
5 changes: 5 additions & 0 deletions .changeset/rare-tires-turn.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Added support for resuming a Stagehand session created on the API.
5 changes: 5 additions & 0 deletions .changeset/shiny-windows-attack.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Removed `debugDom`.
5 changes: 5 additions & 0 deletions .changeset/six-lies-lie.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Removed the unused `handlePossiblePageNavigation` function.
5 changes: 5 additions & 0 deletions .changeset/wise-worlds-pull.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": major
---

temporary placeholder
5 changes: 5 additions & 0 deletions .changeset/young-dots-fry.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": minor
---

Added native Stagehand agentic loop functionality. This allows you to build agentic workflows with a single prompt without using a computer-use model. To try it out, create a `stagehand.agent` without passing in a provider.
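To make the idea of an agentic loop that needs no computer-use model more concrete, here is a generic sketch: a planner proposes one step at a time from the instruction and prior outcomes, and the loop executes steps until the planner reports completion or a step limit is reached. `Step`, `Planner`, `Executor`, and `runAgent` are hypothetical names; this is not Stagehand's implementation:

```typescript
// One planned step: an action to perform, or a signal that the task is done.
type Step =
  | { kind: "act"; action: string }
  | { kind: "done"; summary: string };

// Hypothetical planner, e.g. an LLM prompted with the instruction and history.
type Planner = (instruction: string, history: readonly string[]) => Promise<Step>;

// Hypothetical executor that performs an action and reports what happened.
type Executor = (action: string) => Promise<string>;

interface AgentResult {
  completed: boolean;
  summary: string;
  actions: string[];
}

async function runAgent(
  instruction: string,
  plan: Planner,
  execute: Executor,
  maxSteps = 10,
): Promise<AgentResult> {
  const history: string[] = [];
  const actions: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await plan(instruction, history);
    if (step.kind === "done") {
      return { completed: true, summary: step.summary, actions };
    }
    const outcome = await execute(step.action);
    actions.push(step.action);
    // Feed the outcome back so the planner can choose the next step.
    history.push(`${step.action} -> ${outcome}`);
  }
  // Stop instead of looping forever if the planner never finishes.
  return { completed: false, summary: `Stopped after ${maxSteps} steps`, actions };
}
```

The step limit is the main design choice here: a planner that never returns `done` still terminates.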
144 changes: 142 additions & 2 deletions .github/workflows/ci.yml
@@ -20,6 +20,7 @@ jobs:
   determine-evals:
     runs-on: ubuntu-latest
     outputs:
+      run-combination: ${{ steps.check-labels.outputs.run-combination }}
       run-extract: ${{ steps.check-labels.outputs.run-extract }}
       run-act: ${{ steps.check-labels.outputs.run-act }}
       run-observe: ${{ steps.check-labels.outputs.run-observe }}
@@ -31,6 +32,7 @@ jobs:
           # Default to running all tests on main branch
           if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
             echo "Running all tests for main branch"
+            echo "run-combination=true" >> $GITHUB_OUTPUT
             echo "run-extract=true" >> $GITHUB_OUTPUT
             echo "run-act=true" >> $GITHUB_OUTPUT
             echo "run-observe=true" >> $GITHUB_OUTPUT
@@ -40,6 +42,7 @@ jobs:
           fi

           # Check for specific labels
+          echo "run-combination=${{ contains(github.event.pull_request.labels.*.name, 'combination') }}" >> $GITHUB_OUTPUT
           echo "run-extract=${{ contains(github.event.pull_request.labels.*.name, 'extract') }}" >> $GITHUB_OUTPUT
           echo "run-act=${{ contains(github.event.pull_request.labels.*.name, 'act') }}" >> $GITHUB_OUTPUT
           echo "run-observe=${{ contains(github.event.pull_request.labels.*.name, 'observe') }}" >> $GITHUB_OUTPUT
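The `determine-evals` logic in these hunks reduces to: on `main`, enable every eval category; otherwise enable a category only when the pull request carries a label with the same name. A TypeScript sketch of that decision, for illustration only (the workflow does this in shell; `selectEvals` is a hypothetical name and the category list covers only those visible in these hunks):

```typescript
// Eval categories visible in these hunks; the real workflow may define more.
const CATEGORIES = ["combination", "extract", "act", "observe"] as const;
type Category = (typeof CATEGORIES)[number];

// Mirrors the shell logic: everything runs on main, otherwise only labeled categories.
function selectEvals(ref: string, prLabels: readonly string[]): Record<Category, boolean> {
  const onMain = ref === "refs/heads/main";
  const result = {} as Record<Category, boolean>;
  for (const category of CATEGORIES) {
    result[category] = onMain || prLabels.indexOf(category) !== -1;
  }
  return result;
}
```

For example, a PR labeled only `combination` enables the combination evals and nothing else.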
@@ -147,7 +150,7 @@ jobs:
         run: npm run e2e:local

   run-e2e-bb-tests:
-    needs: [run-e2e-tests]
+    needs: [run-lint, run-build]
     runs-on: ubuntu-latest
     timeout-minutes: 50
     if: >
@@ -183,8 +186,129 @@ jobs:
       - name: Run E2E Tests (browserbase)
         run: npm run e2e:bb

+  run-regression-evals-dom-extract:
+    needs:
+      [run-e2e-bb-tests, run-e2e-tests, run-e2e-local-tests, determine-evals]
+    runs-on: ubuntu-latest
+    timeout-minutes: 7
+    outputs:
+      regression_dom_score: ${{ steps.set-dom-score.outputs.regression_dom_score }}
+    env:
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+      BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
+      BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }}
+      BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }}
+      HEADLESS: true
+      EVAL_ENV: browserbase
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+
+      - name: Install dependencies
+        run: |
+          rm -rf node_modules
+          rm -f package-lock.json
+          npm install
+
+      - name: Build Stagehand
+        run: npm run build
+
+      - name: Install Playwright browsers
+        run: npm exec playwright install --with-deps
+
+      - name: Run Regression Evals (domExtract)
+        run: npm run evals category regression_dom_extract trials=2 concurrency=12 env=BROWSERBASE -- --extract-method=domExtract
+
+      - name: Save Regression domExtract Results
+        run: mv eval-summary.json eval-summary-regression-dom.json
+
+      - name: Log Regression (domExtract) Evals Performance
+        id: set-dom-score
+        run: |
+          experimentNameRegressionDom=$(jq -r '.experimentName' eval-summary-regression-dom.json)
+          regression_dom_score=$(jq '.categories.regression_dom_extract' eval-summary-regression-dom.json)
+          echo "regression_dom_extract category score: ${regression_dom_score}%"
+          echo "View regression_dom_extract results: https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentNameRegressionDom}"
+          echo "regression_dom_score=$regression_dom_score" >> "$GITHUB_OUTPUT"
+
+  run-regression-evals-text-extract:
+    needs:
+      [run-e2e-bb-tests, run-e2e-tests, run-e2e-local-tests, determine-evals]
+    runs-on: ubuntu-latest
+    timeout-minutes: 7
+    outputs:
+      regression_text_score: ${{ steps.set-text-score.outputs.regression_text_score }}
+    env:
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
+      BRAINTRUST_API_KEY: ${{ secrets.BRAINTRUST_API_KEY }}
+      BROWSERBASE_API_KEY: ${{ secrets.BROWSERBASE_API_KEY }}
+      BROWSERBASE_PROJECT_ID: ${{ secrets.BROWSERBASE_PROJECT_ID }}
+      HEADLESS: true
+      EVAL_ENV: browserbase
+    steps:
+      - name: Check out repository code
+        uses: actions/checkout@v4
+
+      - name: Set up Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+
+      - name: Install dependencies
+        run: |
+          rm -rf node_modules
+          rm -f package-lock.json
+          npm install
+
+      - name: Build Stagehand
+        run: npm run build
+
+      - name: Install Playwright browsers
+        run: npm exec playwright install --with-deps
+
+      - name: Run Regression Evals (textExtract)
+        run: npm run evals category regression_text_extract trials=2 concurrency=12 env=BROWSERBASE -- --extract-method=textExtract
+
+      - name: Save Regression textExtract Results
+        run: mv eval-summary.json eval-summary-regression-text.json
+
+      - name: Log Regression (textExtract) Evals Performance
+        id: set-text-score
+        run: |
+          experimentNameRegressionText=$(jq -r '.experimentName' eval-summary-regression-text.json)
+          regression_text_score=$(jq '.categories.regression_text_extract' eval-summary-regression-text.json)
+          echo "regression_text_extract category score: ${regression_text_score}%"
+          echo "View regression_text_extract results: https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentNameRegressionText}"
+          echo "regression_text_score=$regression_text_score" >> "$GITHUB_OUTPUT"
+
+  check-regression-evals-score:
+    needs: [run-regression-evals-text-extract, run-regression-evals-dom-extract]
+    runs-on: ubuntu-latest
+    timeout-minutes: 5
+    steps:
+      - name: Compare Overall Regression Evals Score
+        run: |
+          regression_dom_score="${{ needs.run-regression-evals-dom-extract.outputs.regression_dom_score }}"
+          regression_text_score="${{ needs.run-regression-evals-text-extract.outputs.regression_text_score }}"
+
+          overall_score=$(echo "(${regression_dom_score} + ${regression_text_score}) / 2" | bc -l)
+          echo "Overall regression score: ${overall_score}%"
+
+          # Fail if overall score is below 90%
+          if (( $(echo "${overall_score} < 90" | bc -l) )); then
+            echo "Overall regression score is below 90%. Failing CI."
+            exit 1
+          fi
+
   run-combination-evals:
-    needs: [run-e2e-bb-tests, run-e2e-tests, determine-evals]
+    needs: [check-regression-evals-score, determine-evals]
     runs-on: ubuntu-latest
     timeout-minutes: 40
     env:
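The `check-regression-evals-score` job in this hunk averages the `domExtract` and `textExtract` category scores with `bc` and fails CI when the average is below 90%. The same arithmetic as a small TypeScript function, for illustration (`checkRegressionScore` is a hypothetical name; treating a missing score as an error is a choice made in this sketch, not something the workflow does explicitly):

```typescript
// Average two regression scores (percentages) and compare against the CI threshold.
function checkRegressionScore(
  domScore: number,
  textScore: number,
  threshold = 90,
): { overall: number; pass: boolean } {
  // Treat a missing or malformed score (NaN/Infinity) as an error rather than a pass.
  if (!isFinite(domScore) || !isFinite(textScore)) {
    throw new Error("Missing regression score");
  }
  const overall = (domScore + textScore) / 2;
  // The workflow fails when overall < threshold, so exactly 90 passes.
  return { overall, pass: overall >= threshold };
}
```

For example, scores of 95 and 84 average to 89.5 and fail, while 92 and 88 average to exactly 90 and pass.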
@@ -199,27 +323,43 @@ jobs:
       - name: Check out repository code
         uses: actions/checkout@v4

+      - name: Check for 'combination' label
+        id: label-check
+        run: |
+          if [ "${{ needs.determine-evals.outputs.run-combination }}" != "true" ]; then
+            echo "has_label=false" >> $GITHUB_OUTPUT
+            echo "No label for COMBINATION. Exiting with success."
+          else
+            echo "has_label=true" >> $GITHUB_OUTPUT
+          fi
+
       - name: Set up Node.js
+        if: needs.determine-evals.outputs.run-combination == 'true'
         uses: actions/setup-node@v4
         with:
           node-version: "20"

       - name: Install dependencies
+        if: needs.determine-evals.outputs.run-combination == 'true'
         run: |
           rm -rf node_modules
           rm -f package-lock.json
           npm install

       - name: Build Stagehand
+        if: needs.determine-evals.outputs.run-combination == 'true'
         run: npm run build

       - name: Install Playwright browsers
+        if: needs.determine-evals.outputs.run-combination == 'true'
         run: npm exec playwright install --with-deps

       - name: Run Combination Evals
+        if: needs.determine-evals.outputs.run-combination == 'true'
         run: npm run evals category combination

       - name: Log Combination Evals Performance
+        if: needs.determine-evals.outputs.run-combination == 'true'
         run: |
           experimentName=$(jq -r '.experimentName' eval-summary.json)
           echo "View results at https://www.braintrust.dev/app/Browserbase/p/stagehand/experiments/${experimentName}"
1 change: 1 addition & 0 deletions .gitignore
@@ -18,3 +18,4 @@ tmp/
 eval-summary.json
 pnpm-lock.yaml
 evals/deterministic/tests/BrowserContext/tmp-test.har
+examples/example.ts
13 changes: 13 additions & 0 deletions evals/deterministic/bb.stagehand.config.ts
@@ -0,0 +1,13 @@
import { default as DefaultStagehandConfig } from "@/stagehand.config";
import type { ConstructorParams } from "@/dist";
import dotenv from "dotenv";
dotenv.config({ path: "../../.env" });

const StagehandConfig: ConstructorParams = {
  ...DefaultStagehandConfig,
  env: "BROWSERBASE" /* Environment to run Stagehand in */,
  browserbaseSessionCreateParams: {
    projectId: process.env.BROWSERBASE_PROJECT_ID,
  },
};
export default StagehandConfig;
@@ -6,11 +6,6 @@ dotenv.config({ path: "../../.env" });
 const StagehandConfig: ConstructorParams = {
   ...DefaultStagehandConfig,
   env: "LOCAL" /* Environment to run Stagehand in */,
-  verbose: 1 /* Logging verbosity level (0=quiet, 1=normal, 2=verbose) */,
-  headless: true /* Run browser in headless mode */,
-  browserbaseSessionCreateParams: {
-    projectId: process.env.BROWSERBASE_PROJECT_ID,
-  },
   enableCaching: false /* Enable caching functionality */,
 };
 export default StagehandConfig;
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";

 test.describe("StagehandContext - addInitScript", () => {
   test("should inject a script on the context before pages load", async () => {
2 changes: 1 addition & 1 deletion evals/deterministic/tests/BrowserContext/cookies.test.ts
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";

 test.describe("StagehandContext - Cookies", () => {
   let stagehand: Stagehand;
2 changes: 1 addition & 1 deletion evals/deterministic/tests/BrowserContext/multiPage.test.ts
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";
 import { Page } from "@/dist";

 import http from "http";
2 changes: 1 addition & 1 deletion evals/deterministic/tests/BrowserContext/page.test.ts
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";

 import http from "http";
 import express from "express";
2 changes: 1 addition & 1 deletion evals/deterministic/tests/BrowserContext/routing.test.ts
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";

 import http from "http";
 import express from "express";
2 changes: 1 addition & 1 deletion evals/deterministic/tests/Errors/apiKeyError.test.ts
@@ -1,6 +1,6 @@
 import { test, expect } from "@playwright/test";
 import { Stagehand } from "@/dist";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/e2e.stagehand.config";
 import { z } from "zod";

 test.describe("API key/LLMClient error", () => {
3 changes: 2 additions & 1 deletion evals/deterministic/tests/browserbase/contexts.test.ts
@@ -1,6 +1,6 @@
 import Browserbase from "@browserbasehq/sdk";
 import { expect, test } from "@playwright/test";
-import StagehandConfig from "@/evals/deterministic/stagehand.config";
+import StagehandConfig from "@/evals/deterministic/bb.stagehand.config";
 import { Stagehand } from "@/dist";

 // Configuration
@@ -76,6 +76,7 @@ test.describe("Contexts", () => {
     // We will be adding cookies to the context in this session, so we need mark persist=true
     stagehand = new Stagehand({
       ...StagehandConfig,
+      env: "BROWSERBASE",
       browserbaseSessionCreateParams: {
         projectId: BROWSERBASE_PROJECT_ID,
         browserSettings: {