The Browser Tool enables AI-driven browser interactions. Launch browser sessions, click elements, type text, scroll pages, and capture screenshots through natural language commands.
What You’ll Learn
- Session lifecycle: launch → interact → close
- Browser actions: click, type, scroll
- Use cases: UI testing, screenshots, navigation
Session Lifecycle
Every browser automation workflow follows a strict sequence:
- Launch - Start a browser session at a target URL
- Interact - Perform actions (click, type, scroll)
- Close - End the session to release resources
Browser state persists across actions within a session. You must close the browser before using other Verdent tools.
Each action returns a screenshot showing the current browser state. Review screenshots between actions to verify success before proceeding.
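A workflow script can be checked against these rules before it runs. The sketch below is a hypothetical linter, not part of Verdent: it assumes each step is a plain string phrased like the examples on this page ("Launch browser at <url>", "Close browser") and reports steps that act outside a session, a second launch while a session is already open, or a missing close.

```python
LAUNCH_PREFIX = "Launch browser at "  # assumed phrasing, copied from this page's examples
CLOSE_COMMAND = "Close browser"


def check_workflow(commands):
    """Return lifecycle problems in a command script; an empty list means it passes.

    Hypothetical helper: it only models the launch -> interact -> close order.
    """
    problems = []
    active = False  # True while a browser session is open
    for lineno, cmd in enumerate(commands, start=1):
        if cmd.startswith(LAUNCH_PREFIX):
            if active:
                problems.append(f"line {lineno}: a session is already active")
            active = True
        elif cmd == CLOSE_COMMAND:
            if not active:
                problems.append(f"line {lineno}: no session to close")
            active = False
        elif not active:
            # click, type, and scroll are only valid inside an open session
            problems.append(f"line {lineno}: {cmd!r} runs outside a session")
    if active:
        problems.append("script ends without closing the browser")
    return problems


script = [
    "Launch browser at https://example.com",
    "Click coordinates 450,300",
    "Close browser",
]
print(check_workflow(script))  # []
print(check_workflow(script[1:]))  # reports the stray click and the unmatched close
```

Any command other than a launch or close is treated as an interaction, so this checker does not validate coordinates or typed text.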
Browser Actions
The tool supports five actions: launch, click, type, scroll, and close.
launch - Start a new browser session
- Required: target URL
- Opens the browser at 1920x1080 resolution
- Always the first action in any workflow
Example: Launch browser at https://example.com
click - Click at specific coordinates
- Required: x,y coordinates
- Coordinates are relative to the viewport
- Click the center of the target element for reliability
Example: Click coordinates 450,300
type - Type text via the keyboard
- Required: text to type
- Types into the currently focused element
- Often follows a click on an input field
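Because type sends keystrokes to whichever element has focus, filling a field is normally a click followed by a type. The hypothetical helper below (its name is illustrative, not a Verdent API) emits that pair using the command phrasing from this page's login example.

```python
def fill_field(x, y, text):
    """Return the click-then-type command pair for one input field.

    The click gives the field keyboard focus; the type command then sends
    the text to that focused element. Double quotes inside text are not
    escaped in this sketch.
    """
    return [f"Click coordinates {x},{y}", f'Type "{text}"']


login_steps = fill_field(450, 280, "testuser@example.com") + fill_field(450, 340, "password123")
for step in login_steps:
    print(step)
# Click coordinates 450,280
# Type "testuser@example.com"
# Click coordinates 450,340
# Type "password123"
```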
scroll - Scroll the page
- scroll_down: scroll down one page height
- scroll_up: scroll up one page height
- Reveals off-screen content
Example: Scroll down to load more content
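To estimate where a sequence of scrolls leaves the viewport, here is a small model. It assumes that "one page height" equals the viewport height (1080 px), that the page starts scrolled to the top, and that it cannot scroll past its top or bottom. Real pages can grow as lazy content loads, so treat the numbers as estimates and confirm position with the screenshot.

```python
VIEWPORT_HEIGHT = 1080  # assumption: one "page height" equals the viewport height


def scroll_position(doc_height, steps):
    """Return the vertical scroll offset (px) after a sequence of scroll steps.

    doc_height is the full page height in pixels; steps holds "scroll_down"
    and "scroll_up" strings. The offset is clamped so the viewport never
    moves past the top or bottom of the page.
    """
    max_offset = max(0, doc_height - VIEWPORT_HEIGHT)
    offset = 0  # assumes the page starts scrolled to the top
    for step in steps:
        if step == "scroll_down":
            offset = min(offset + VIEWPORT_HEIGHT, max_offset)
        elif step == "scroll_up":
            offset = max(offset - VIEWPORT_HEIGHT, 0)
        else:
            raise ValueError(f"unknown scroll action: {step!r}")
    return offset


print(scroll_position(5000, ["scroll_down"] * 3))  # 3240
print(scroll_position(5000, ["scroll_down"] * 4))  # 3920: the fourth scroll stops at the bottom
```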
close - End the browser session
- Always the last action in any workflow
- Required before using other tools
- Releases browser resources
Coordinates are relative to the 1920x1080 viewport, whose center is (960, 540). Use screenshots to estimate element positions.
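Turning a box estimated from a screenshot into a click point is simple arithmetic: take the box's center and check that it lies inside the 1920x1080 viewport. The function below is a hypothetical helper; the box values are whatever you read off the screenshot.

```python
VIEWPORT_WIDTH = 1920
VIEWPORT_HEIGHT = 1080


def click_target(left, top, width, height):
    """Return the (x, y) center of a box given in viewport pixels.

    Raises ValueError if the center falls outside the viewport, which
    usually means the element must be scrolled into view first.
    """
    x = left + width // 2
    y = top + height // 2
    if not (0 <= x < VIEWPORT_WIDTH and 0 <= y < VIEWPORT_HEIGHT):
        raise ValueError(f"({x}, {y}) is outside the {VIEWPORT_WIDTH}x{VIEWPORT_HEIGHT} viewport")
    return x, y


# An input field spanning x 300-600 and y 260-300 on the screenshot:
x, y = click_target(300, 260, 300, 40)
print(f"Click coordinates {x},{y}")  # Click coordinates 450,280
print(click_target(0, 0, VIEWPORT_WIDTH, VIEWPORT_HEIGHT))  # (960, 540), the viewport center
```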
Common Use Cases
The examples below cover three common scenarios: UI testing, screenshots, and navigation.
UI Testing
Test form submissions and navigation flows. Launch at a login page, click the input fields, type credentials, submit the form, and verify the result through screenshots.
Launch browser at https://app.example.com/login
Click coordinates 450,280
Type "testuser@example.com"
Click coordinates 450,340
Type "password123"
Click coordinates 500,420
Close browser
Screenshots
Capture pages for documentation. Screenshots are captured automatically after each action. Navigate to the target pages and sections to build visual documentation.
Launch browser at https://docs.example.com
Scroll down to API section
Close browser
Navigation
Navigate to target content. Use browser automation to reach content that requires interaction (clicking menus, loading lazy content) before extraction.
Launch browser at https://store.example.com
Scroll down three times
Click "Next Page" at 960,800
Close browser
Limitations
- Tool exclusivity - Only browser_action can be used during active sessions
- Coordinate-based - Requires x,y coordinates, not CSS selectors
- Fixed resolution - Browser viewport locked at 1920x1080
- Chrome only - Sessions run in Chrome/Chromium through Puppeteer
- No persistence - Sessions don’t survive Verdent restarts
- No WSL support - Browser Tool does not work in WSL environments
- No saved state - Each session starts fresh without cookies or authentication
- Single session - Only one browser session can be active at a time
Always close the browser session before using file operations, search tools, or bash commands. The browser locks other tools during active sessions.
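The session limits above can be summarized as a small state model. This is an illustrative sketch of the documented rules (tool exclusivity and a single active session), not Verdent's implementation: the class and method names are hypothetical, and tool names such as "bash" are only labels.

```python
class ToolLockedError(RuntimeError):
    """Raised when a step breaks one of the browser session rules."""


class BrowserGate:
    """Hypothetical model of the Browser Tool's session limits."""

    def __init__(self):
        self.active = False  # True while a browser session is open

    def launch(self, url):
        # Single session: a second launch while one is open is rejected.
        if self.active:
            raise ToolLockedError("a browser session is already active")
        self.active = True
        return f"launched {url}"

    def close(self):
        # Closing releases the session and unlocks the other tools.
        self.active = False

    def run_tool(self, name):
        # Tool exclusivity: file, search, and bash tools are locked during a session.
        if self.active:
            raise ToolLockedError(f"close the browser before using {name}")
        return f"{name} ran"


gate = BrowserGate()
gate.launch("https://example.com")
try:
    gate.run_tool("bash")
except ToolLockedError as err:
    print(err)  # close the browser before using bash
gate.close()
print(gate.run_tool("bash"))  # bash ran
```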
See Also