Browser Tool

The Browser Tool enables AI-driven browser interactions. Launch browser sessions, click elements, type text, scroll pages, and capture screenshots through natural language commands.

What You’ll Learn

Session lifecycle: launch → interact → close
Browser actions: click, type, scroll
Use cases: UI testing, screenshots, navigation

Session Lifecycle

Every browser automation workflow follows a strict sequence:

Launch - Start a browser session at a target URL
Interact - Perform actions (click, type, scroll)
Close - End the session to release resources

Browser state persists across actions within a session. You must close the browser before using other Verdent tools.

Each action returns a screenshot showing the current browser state. Review screenshots between actions to verify success before proceeding.
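The ordering rule above can be checked before a workflow is run. The following is a minimal sketch, not part of Verdent: `validate_workflow` is a hypothetical helper, and the only names taken from this page are the five action names.

```python
from typing import Iterable

# Interaction names come from this page; the validator is a hypothetical helper.
INTERACTIONS = {"click", "type", "scroll"}


def validate_workflow(actions: Iterable[str]) -> None:
    """Raise ValueError unless actions follow launch -> interact* -> close."""
    steps = list(actions)
    if not steps or steps[0] != "launch":
        raise ValueError("workflow must start with launch")
    if len(steps) < 2 or steps[-1] != "close":
        raise ValueError("workflow must end with close")
    # Everything between launch and close must be an interaction.
    for name in steps[1:-1]:
        if name not in INTERACTIONS:
            raise ValueError(f"unexpected action inside session: {name}")


validate_workflow(["launch", "click", "type", "close"])  # passes silently
```

A sequence such as `["click", "close"]` raises `ValueError` because no session was launched first.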

Browser Actions

launch - Start a new browser session

Required: target URL
Opens browser at 1920x1080 resolution
Always the first action in any workflow

Launch browser at https://example.com

click - Click at specific coordinates

Required: x,y coordinates
Coordinates are viewport-relative
Target element centers for reliability

Click coordinates 450,300

type - Type text via keyboard

Required: text to type
Types into currently focused element
Often follows a click on an input field

Type "user@example.com"

scroll - Scroll the page

scroll_down - Scroll one page height down
scroll_up - Scroll one page height up
Reveals off-screen content

Scroll down to load more content

close - End browser session

Always the last action in any workflow
Required before using other tools
Releases browser resources

Close browser

Coordinates are relative to the 1920x1080 viewport. Center is approximately (960, 540). Use screenshots to estimate element positions.
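When estimating positions from a screenshot, the arithmetic is simple enough to sketch. The helpers below are hypothetical, not part of the tool. They assume an element's bounding box has been read off the screenshot, and that each scroll moves exactly one viewport height (1080 px), which may not hold near the end of a page.

```python
from typing import Tuple

VIEWPORT_WIDTH, VIEWPORT_HEIGHT = 1920, 1080  # fixed viewport stated on this page


def element_center(left: int, top: int, width: int, height: int) -> Tuple[int, int]:
    """Return the viewport-relative center of a box estimated from a screenshot."""
    x, y = left + width // 2, top + height // 2
    if not (0 <= x < VIEWPORT_WIDTH and 0 <= y < VIEWPORT_HEIGHT):
        raise ValueError(f"({x}, {y}) is outside the viewport; scroll first")
    return x, y


def scrolls_needed(page_y: int) -> int:
    """Count the scroll_down actions needed to bring page offset page_y into view.

    Assumes every scroll moves exactly one viewport height.
    """
    return page_y // VIEWPORT_HEIGHT


print(element_center(0, 0, 1920, 1080))   # (960, 540), the viewport center
print(element_center(400, 280, 100, 40))  # (450, 300)
print(scrolls_needed(2500))               # 2
```

Clicking the computed center rather than an edge leaves some margin for error in the estimated box.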

Common Use Cases

UI Testing - Test form submissions and navigation flows

Launch at a login page, click input fields, type credentials, submit forms, and verify results through screenshots.

Launch browser at https://app.example.com/login
Click coordinates 450,280
Type "testuser@example.com"
Click coordinates 450,340
Type "password123"
Click coordinates 500,420
Close browser

Screenshots - Capture pages for documentation

Screenshots are captured automatically after each action. Navigate to target pages and sections to build visual documentation.

Launch browser at https://docs.example.com
Scroll down to API section
Close browser

Navigation - Navigate to target content

Use browser automation to reach content that requires interaction (clicking menus, loading lazy content) before extraction.

Launch browser at https://store.example.com
Scroll down three times
Click "Next Page" at 960,800
Close browser
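Form-testing flows like the UI Testing example above follow a regular pattern: launch, then a click and a type per field, then a submit click and a close. As an illustration only, the hypothetical helper below generates those command lines from structured data. The wording is a convention borrowed from the examples on this page, since commands are given in natural language rather than a fixed syntax.

```python
from typing import List, Tuple


def form_test_commands(
    url: str,
    fields: List[Tuple[int, int, str]],
    submit: Tuple[int, int],
) -> List[str]:
    """Build command lines for a form test: click each field, type its value, submit."""
    lines = [f"Launch browser at {url}"]
    for x, y, text in fields:
        lines.append(f"Click coordinates {x},{y}")  # focus the input field
        lines.append(f'Type "{text}"')              # type into the focused field
    lines.append(f"Click coordinates {submit[0]},{submit[1]}")
    lines.append("Close browser")                   # always end the session
    return lines


commands = form_test_commands(
    "https://app.example.com/login",
    [(450, 280, "testuser@example.com"), (450, 340, "password123")],
    submit=(500, 420),
)
print("\n".join(commands))
```

With these inputs the output matches the seven lines of the UI Testing example.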

Limitations

Tool exclusivity - Only browser_action can be used during active sessions
Coordinate-based - Requires x,y coordinates, not CSS selectors
Fixed resolution - Browser viewport locked at 1920x1080
Chrome only - The Browser Tool runs on Puppeteer and supports Chrome/Chromium browsers only
No persistence - Sessions don’t survive Verdent restarts
No WSL support - Browser Tool does not work in WSL environments
No saved state - Each session starts fresh without cookies or authentication
Single session - Only one browser session can be active at a time

Always close the browser session before using file operations, search tools, or bash commands. The browser locks other tools during active sessions.
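One way to make sure the session is always closed, even when a step fails midway, is a try/finally wrapper. This is a minimal sketch under stated assumptions: `browser_session` is a hypothetical helper, and `send` stands in for whatever mechanism delivers a command to Verdent. Here it simply appends to a list.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List


@contextmanager
def browser_session(
    send: Callable[[str], None], url: str
) -> Iterator[Callable[[str], None]]:
    """Launch on entry and always close on exit, even if a step raises."""
    send(f"Launch browser at {url}")
    try:
        yield send
    finally:
        # Closing is what lets other tools be used again.
        send("Close browser")


log: List[str] = []  # hypothetical stand-in for the real command channel
with browser_session(log.append, "https://example.com") as act:
    act("Click coordinates 450,300")

print(log)
```

After the block exits, `log` ends with `Close browser`, whether or not the steps inside succeeded.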
