OpenClaw skill
browse
An OpenClaw skill named "browse" that enables agents to interact with web pages. It supports capabilities such as navigating to URLs, extracting text, taking screenshots, filling forms, clicking elements, and scrolling. Usage involves the `browse_page` tool with required `url` parameter and optional `instructions`.
Security notice: review the SKILL.md file and repository content first before using any third-party skill.
Files
Review the files below to add this skill to your agents.
SKILL.md content
---
name: browser
description: Automate web browser interactions using natural language via CLI commands. Use when the user asks to browse websites, navigate web pages, extract data from websites, take screenshots, fill forms, click buttons, or interact with web applications. Supports remote Browserbase sessions with automatic CAPTCHA solving, anti-bot stealth mode, and residential proxies — ideal for scraping protected websites, bypassing bot detection, and interacting with JavaScript-heavy pages.
compatibility: "Requires the browse CLI (`npm install -g @browserbasehq/browse-cli`). Optional: set BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID for remote Browserbase sessions; falls back to local Chrome otherwise."
license: MIT
allowed-tools: Bash
metadata:
openclaw:
requires:
bins:
- browse
install:
- kind: node
package: "@browserbasehq/browse-cli"
bins: [browse]
homepage: https://github.com/browserbase/skills
---
# Browser Automation
Automate browser interactions using the browse CLI with Claude.
## Setup check
Before running any browser commands, verify the CLI is available:
```bash
which browse || npm install -g @browserbasehq/browse-cli
```
## Environment Selection (Local vs Remote)
The CLI automatically selects between local and remote browser environments based on available configuration:
### Local mode (default)
- Uses local Chrome — no API keys needed
- Best for: development, simple pages, trusted sites with no bot protection
### Remote mode (Browserbase)
- Activated when `BROWSERBASE_API_KEY` and `BROWSERBASE_PROJECT_ID` are set
- Provides: anti-bot stealth, automatic CAPTCHA solving, residential proxies, session persistence
- **Use remote mode when:** the target site has bot detection, CAPTCHAs, IP rate limiting, Cloudflare protection, or requires geo-specific access
- Get credentials at https://browserbase.com/settings
### When to choose which
- **Simple browsing** (docs, wikis, public APIs): local mode is fine
- **Protected sites** (login walls, CAPTCHAs, anti-scraping): use remote mode
- **If local mode fails** with bot detection or access denied: switch to remote mode
## Commands
All commands work identically in both modes. The daemon auto-starts on first command.
### Navigation
```bash
browse open <url> # Go to URL (aliases: goto)
browse reload # Reload current page
browse back # Go back in history
browse forward # Go forward in history
```
### Page state (prefer snapshot over screenshot)
```bash
browse snapshot # Get accessibility tree with element refs (fast, structured)
browse screenshot [path] # Take visual screenshot (slow, uses vision tokens)
browse get url # Get current URL
browse get title # Get page title
browse get text <selector> # Get text content (use "body" for all text)
browse get html <selector> # Get HTML content of element
browse get value <selector> # Get form field value
```
Use `browse snapshot` as your default for understanding page state — it returns the accessibility tree with element refs you can use to interact. Only use `browse screenshot` when you need visual context (layout, images, debugging).
### Interaction
```bash
browse click <ref> # Click element by ref from snapshot (e.g., @0-5)
browse type <text> # Type text into focused element
browse fill <selector> <value> # Fill input and press Enter
browse select <selector> <values...> # Select dropdown option(s)
browse press <key> # Press key (Enter, Tab, Escape, Cmd+A, etc.)
browse drag <fromX> <fromY> <toX> <toY> # Drag from one point to another
browse scroll <x> <y> <deltaX> <deltaY> # Scroll at coordinates
browse highlight <selector> # Highlight element on page
browse is visible <selector> # Check if element is visible
browse is checked <selector> # Check if element is checked
browse wait <type> [arg] # Wait for: load, selector, timeout
```
### Session management
```bash
browse stop # Stop the browser daemon
browse status # Check daemon status (includes env)
browse env # Show current environment (local or remote)
browse env local # Switch to local Chrome
browse env remote # Switch to Browserbase (requires API keys)
browse pages # List all open tabs
browse tab_switch <index> # Switch to tab by index
browse tab_close [index] # Close tab
```
### Typical workflow
1. `browse open <url>` — navigate to the page
2. `browse snapshot` — read the accessibility tree to understand page structure and get element refs
3. `browse click <ref>` / `browse type <text>` / `browse fill <selector> <value>` — interact using refs from snapshot
4. `browse snapshot` — confirm the action worked
5. Repeat 3-4 as needed
6. `browse stop` — close the browser when done
## Quick Example
```bash
browse open https://example.com
browse snapshot # see page structure + element refs
browse click @0-5 # click element with ref 0-5
browse get title
browse stop
```
## Mode Comparison
| Feature | Local | Browserbase |
|---------|-------|-------------|
| Speed | Faster | Slightly slower |
| Setup | Chrome required | API key required |
| Stealth mode | No | Yes (custom Chromium, anti-bot fingerprinting) |
| CAPTCHA solving | No | Yes (automatic reCAPTCHA/hCaptcha) |
| Residential proxies | No | Yes (201 countries, geo-targeting) |
| Session persistence | No | Yes (cookies/auth persist across sessions) |
| Best for | Development/simple pages | Protected sites, bot detection, production scraping |
## Best Practices
1. **Always `browse open` first** before interacting
2. **Use `browse snapshot`** to check page state — it's fast and gives you element refs
3. **Only screenshot when visual context is needed** (layout checks, images, debugging)
4. **Use refs from snapshot** to click/interact — e.g., `browse click @0-5`
5. **`browse stop`** when done to clean up the browser session
## Troubleshooting
- **"No active page"**: Run `browse stop`, then check `browse status`. If it still says running, kill the zombie daemon with `pkill -f "browse.*daemon"`, then retry `browse open`
- **Chrome not found**: Install Chrome or use `browse env remote`
- **Action fails**: Run `browse snapshot` to see available elements and their refs
- **Browserbase fails**: Verify API key and project ID are set
## Switching to Remote Mode
Switch to remote when you detect: CAPTCHAs (reCAPTCHA, hCaptcha, Turnstile), bot detection pages ("Checking your browser..."), HTTP 403/429, empty pages on sites that should have content, or the user asks for it.
Don't switch for simple sites (docs, wikis, public APIs, localhost).
```bash
browse env remote # switch to Browserbase
browse env local # switch back to local Chrome
```
The switch is sticky until you run `browse stop` or switch again.
For detailed examples, see [EXAMPLES.md](EXAMPLES.md).
For API reference, see [REFERENCE.md](REFERENCE.md).
How this skill works
- The skill defines metadata including name 'Browse', namespace 'pkiv', description, inputs (url: string required, action: string optional), outputs (content: string, screenshot: string, title: string), and configuration (headless: boolean default true)
- It launches a headless Chromium browser using Playwright
- Creates a new browser page
- Navigates the page to the input URL
- Waits for the load state to be 'networkidle'
- Extracts page content as HTML or text based on action input
- Captures full-page screenshot as base64 if requested
- Retrieves the page title
- Closes the browser page and context
- Returns extracted content, screenshot, and title as outputs
When to use it
- When needing to launch a browser instance to visit and interact with web pages
- When instructed to navigate to a specific URL and extract information using natural language directives
- When required to perform actions like scrolling, clicking elements, or filling forms on a webpage
- When capturing screenshots or page content for further analysis in a task
Example use cases
- Extract main heading and first paragraph from a webpage: Navigate to https://example.com and use instructions 'Extract the main heading and first paragraph.' to retrieve specified content.
FAQs
More similar skills to explore
- achurch
An OpenClaw skill for church administration that handles member management, event scheduling, sermon retrieval, and donation processing. It provides tools to list members, add new members, schedule events, fetch sermons, and record donations.
- agent-config
An OpenClaw skill that enables agents to manage their configuration by loading from files, environment variables, or remote sources. It supports retrieving, setting, and validating configuration values. The skill allows for hot-reloading of configurations.
- agent-council
An OpenClaw skill named agent-council that enables the primary agent to summon a council of specialized sub-agents for deliberating on tasks. The council members discuss the query from unique perspectives, propose solutions, and vote to select the best response. The skill outputs the winning proposal with supporting rationale from the council.
- agent-identity-kit
An OpenClaw skill that equips agents with tools to craft, manage, and evolve digital identities, including generating personas, bios, avatars, and communication styles. It supports creating detailed agent personas with name, background, goals, personality traits; crafting bios for specific platforms; designing avatars; tuning voice and style; and adapting identities to new contexts.
- agenticflow-skill
An OpenClaw skill that provides tools for interacting with Agentic Flow. The tools enable agents to create agentic flows with defined tasks, execute existing flows, and retrieve flow status and outputs.
- agentlens
AgentLens is an OpenClaw skill that enables agents to inspect the internal cognition and actions of other agents. It provides visibility into reasoning traces (thoughts), tool calls and arguments, retrieved memories, and response generation. The skill supports analysis in multi-agent conversations via the "inspect" action targeting a specific agent.