Desktop Automation
Automate native desktop applications on Windows, macOS, and Linux using karate-robot. Combine native mouse and keyboard events with Windows UI Automation, image recognition, and OCR text detection in a single test framework that integrates with API and web testing.
On this page:
- Quick Start - Maven, Gradle, standalone JAR setup
- robot Keyword - Configuration and options
- Window Management - Finding and switching windows
- Locator Strategies - Windows UI, image, OCR locators
- Mouse Actions - Click, move, drag operations
- Keyboard Input - Text and key combinations
- Element API - Locate, wait, tree walking
- Robot API - Clipboard, screenshots, debugging
- Cross-Platform - Windows, macOS, Linux specifics
- Advanced Patterns - Conditional start, web + desktop
- Troubleshooting - Common issues and solutions
Quick Start
Maven Setup
Add the karate-robot dependency:
<dependency>
<groupId>io.karatelabs</groupId>
<artifactId>karate-robot</artifactId>
<version>${karate.version}</version>
<scope>test</scope>
</dependency>
The karate-robot module includes JavaCPP presets for OpenCV and Tesseract, which bundle native binaries for several platforms. Follow the JavaCPP guidance on platform-specific dependencies to limit downloads to your OS.
Gradle Setup
testImplementation "io.karatelabs:karate-robot:${karateVersion}"
Standalone JAR
For non-developers or quick scripts, use the standalone JAR with VS Code. The karate-robot JAR for Windows is approximately 150 MB and is downloaded separately. See the Windows Install Guide for setup instructions.
Watch the debugging demo to see VS Code integration in action.
robot Keyword
The robot keyword activates desktop automation. Karate Robot only initializes when you use this keyword and the karate-robot dependency is present.
Basic Usage
Feature: Basic robot usage
Scenario: Target a window
# Find window with exact name
* robot { window: 'Calculator' }
# Find window where name contains 'Chrome'
* robot { window: '^Chrome' }
# Find window with regex match
* robot { window: '~MyApp|MYAPP' }
Configuration Options
| Option | Default | Description |
|---|---|---|
| window | - | Window name to focus. Use ^ prefix for contains, ~ for regex |
| fork | - | OS command to launch application (string, array, or JSON) |
| autoClose | true | Close window when test ends if fork was used |
| attach | true | Skip fork if window already exists |
| basePath | null | Base path for image locators (e.g., classpath:images) |
| highlight | false | Highlight matched elements visually |
| highlightDuration | 3000 | Highlight duration in milliseconds |
| retryCount | 3 | Retry attempts for finding window after fork |
| retryInterval | 3000 | Milliseconds between retries |
| autoDelay | 0 | Delay after native actions (ms), use if OS is too slow |
| tessData | tessdata | Path to Tesseract OCR data files |
| tessLang | eng | Default OCR language |
configure robot Pattern
Set global defaults in karate-config.js and override per test:
Feature: Configure robot pattern
Scenario: Use global config with local override
* configure robot = { highlight: true, highlightDuration: 500 }
* robot { window: '^My App' }
# Or shorthand when only window name needed
* robot '^My App'
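The feature above sets the defaults inline. To make them global, one option is to call karate.configure('robot', ...) from karate-config.js. The sketch below assumes that the 'robot' key accepts the same option map as the configure robot step; the returned mainWindowName variable is hypothetical and only illustrates sharing a window pattern across features.

```javascript
// karate-config.js (minimal sketch, not a complete project config)
function fn() {
  // Assumption: 'robot' takes the same options as `* configure robot = {...}`
  karate.configure('robot', { highlight: true, highlightDuration: 500 });

  // Hypothetical shared variable that features could pass to the robot keyword
  return { mainWindowName: '^My App' };
}
```

Individual features can still override any of these options in their own robot call.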
Window Management
Finding Windows
Windows can be matched by exact name, by substring (the ^ prefix), or by regular expression (the ~ prefix):
Feature: Window matching
Scenario: Different matching strategies
# Exact match
* robot { window: 'Calculator' }
# Contains match (window title contains 'Chrome')
* robot { window: '^Chrome' }
# Regex match (matches 'MyApp' or 'MYAPP')
* robot { window: '~MyApp|MYAPP' }
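The three matching modes can be pictured as a small function. This is only an illustration of the documented prefix rules, not Karate's source code; the function name matchesWindowName is hypothetical, and the sketch assumes the regex is applied as an unanchored search.

```javascript
// Illustrative only: mirrors the documented prefix rules, not Karate's implementation.
function matchesWindowName(pattern, title) {
  if (pattern.startsWith('^')) {
    // '^' prefix: the title must contain the remaining text
    return title.includes(pattern.slice(1));
  }
  if (pattern.startsWith('~')) {
    // '~' prefix: the remaining text is a regular expression
    // Assumption: unanchored search; Karate may anchor the match differently
    return new RegExp(pattern.slice(1)).test(title);
  }
  // No prefix: the title must equal the pattern exactly
  return title === pattern;
}

console.log(matchesWindowName('Calculator', 'Calculator'));        // true
console.log(matchesWindowName('^Chrome', 'Docs - Google Chrome')); // true
console.log(matchesWindowName('~MyApp|MYAPP', 'MYAPP v2'));        // true
console.log(matchesWindowName('Chrome', 'Docs - Google Chrome'));  // false
```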
Window Methods
| Method | Description |
|---|---|
| window(name) | Activate a window by name (supports ^ and ~ prefixes) |
| windowExists(name) | Returns a boolean; does not activate the window |
| windowOptional(name) | Returns the Window object, or a "fake" placeholder if not found |
| waitForWindowOptional(name) | Like windowOptional, but retries before giving up |
Feature: Window methods
Scenario: Conditional window handling
* robot { highlight: true }
# Check if window exists without switching
* def exists = windowExists('My Dialog')
# Handle optional modal dialog
* windowOptional('Tips on Startup').locate('Close').click()
# Wait for window that may appear
* retry(3).waitForWindowOptional('^Loading')
Launching Applications
Use fork to launch applications, with automatic detection of existing windows:
Feature: Launch application
Scenario: Start app if not running
# fork only executes if window not found (attach: true is default)
* robot { window: 'Calculator', fork: 'calc' }
# With full path and extended retry for slow apps
* robot { window: '^MyApp', fork: 'C:/Program Files/MyApp/app.exe', retryCount: 10 }
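When tuning retryCount for a slow application, it can help to estimate how long Karate will keep looking for the window. The sketch below is a rough calculation that assumes one pause of retryInterval per attempt and ignores the time each lookup takes; retryBudgetMs is a hypothetical helper, not part of Karate.

```javascript
// Rough upper bound on time spent pausing between window lookups.
// Assumption: one pause of retryInterval per attempt; lookup time is ignored.
function retryBudgetMs(options) {
  const retryCount = options.retryCount ?? 3;          // documented default
  const retryInterval = options.retryInterval ?? 3000; // documented default (ms)
  return retryCount * retryInterval;
}

console.log(retryBudgetMs({}));                 // 9000
console.log(retryBudgetMs({ retryCount: 10 })); // 30000
```

Under these assumptions, raising retryCount to 10 with the default interval allows roughly 30 seconds for the window to appear.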
karate.fork()
For more control, use karate.fork() directly:
Feature: Conditional fork
Scenario: Start app only if needed
* robot { highlight: true }
* if (!windowExists('^Main Window')) karate.fork('C:/MyApp/app.exe')
* retry(10).window('^Main Window')
Locator Strategies
Karate Robot supports three locator strategies: Windows UI Automation (Windows only), image matching, and OCR text recognition.
Windows Locators
Windows UI Automation provides precise element access using XPath-like syntax:
| Locator | Description |
|---|---|
| 'Click Me' | First element with exact name "Click Me" |
| '^Click' | First element where name contains "Click" |
| '~Click\|Submit' | First element matching regex |
| '#AutomationId' | Element by Automation ID |
| '//button{Click Me}' | Button with exact name |
| '//button{^Click}' | Button where name contains "Click" |
| '/pane[2]/button' | Absolute path: second pane, first button |
| '//pane/*/button' | Wildcard depth matching |
| '//button.TButton{^Click}' | Button with class name "TButton" |
| '/root//window' | Search from desktop root |
Feature: Windows locators
Scenario: Use Windows UI Automation
* robot { window: 'Calculator', fork: 'calc' }
* click('Clear')
* click('One')
* click('Plus')
* click('Two')
* click('Equals')
* match locate('#CalculatorResults').name == 'Display is 3'
* screenshot()
* click('Close Calculator')
Use Inspect.exe to discover element properties for automation.
Image Locators
Match elements by image. Images must be in PNG format and the file name must end with the .png extension:
Feature: Image locators
Scenario: Click by image
* robot { window: '^Chrome', basePath: 'classpath:images' }
* click('submit-button.png')
* waitFor('success-message.png')
Strictness factor: Prefix the file name with a number and a colon to adjust matching sensitivity. Lower values match more strictly:
Feature: Image strictness
Scenario: Adjust image matching
# Default strictness (10)
* click('button.png')
# Strict matching (1 = most strict)
* click('1:button.png')
# Lenient matching (values > 10)
* click('15:button.png')
- Capture images at the same resolution as the target display
- Use the debugger with highlight() to troubleshoot matching
- Store images in a dedicated directory and set basePath
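The strictness prefix described above can be read as a simple "number, colon, file name" pattern. The parser below is a sketch of that reading, not Karate's own code; parseImageLocator is a hypothetical name, and the default of 10 comes from the example comments in this section.

```javascript
// Illustrative parser for the documented 'N:file.png' form; not Karate's implementation.
function parseImageLocator(locator) {
  const match = /^(\d+):(.+\.png)$/.exec(locator);
  if (match) {
    return { strictness: Number(match[1]), file: match[2] };
  }
  // No numeric prefix: fall back to the documented default strictness of 10
  return { strictness: 10, file: locator };
}

console.log(parseImageLocator('button.png'));   // { strictness: 10, file: 'button.png' }
console.log(parseImageLocator('1:button.png')); // { strictness: 1, file: 'button.png' }
```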
OCR Locators
Find elements by visible text using Tesseract OCR. Prefix the text with a {lang} pattern, where lang is a Tesseract language code:
Feature: OCR locators
Scenario: Click by text
# English text
* click('{eng}Submit')
# Light text on dark background (negative)
* click('{-eng}Dark Mode')
# Use default tessLang
* click('{}Click Here')
Setup: Download language data files from Tesseract and place them in the tessdata folder. Choose between tessdata, tessdata-fast, or tessdata-best depending on whether you need accuracy or speed.
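The three prefix forms shown above ({eng}, {-eng}, and {}) can be broken down as in the following sketch. It illustrates the documented syntax only and is not Karate's own code; parseOcrLocator and its defaultLang parameter are hypothetical, with defaultLang standing in for the tessLang option.

```javascript
// Illustrative parser for the documented '{lang}text' form; not Karate's implementation.
function parseOcrLocator(locator, defaultLang = 'eng') {
  const match = /^\{(-?)([^}]*)\}(.*)$/.exec(locator);
  if (!match) return null; // not an OCR locator
  return {
    negative: match[1] === '-',    // '{-eng}': light text on a dark background
    lang: match[2] || defaultLang, // '{}': fall back to the default language
    text: match[3]
  };
}

console.log(parseOcrLocator('{-eng}Dark Mode')); // { negative: true, lang: 'eng', text: 'Dark Mode' }
console.log(parseOcrLocator('Submit'));          // null
```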
Text extraction:
Feature: OCR extraction
Scenario: Extract text from element
* robot { window: '^My App' }
* def text = locate('//pane{Results}').extract('eng')
* match text contains 'Search Results'
# Extract from entire screen
* def screenText = robot.root.extract()
# Debug: highlight all found words
* locate('//pane{Results}').debugExtract()
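The locator forms covered in this section can be told apart by their prefix or suffix. The classifier below summarizes them as a sketch; the order of the checks is an assumption, and the returned labels are hypothetical names chosen for illustration rather than Karate identifiers.

```javascript
// Illustrative classifier based on the locator tables and examples in this section.
// Assumption: checks run in this order; the labels are not Karate identifiers.
function locatorStrategy(locator) {
  if (locator.endsWith('.png')) return 'image';        // 'button.png', '15:button.png'
  if (locator.startsWith('{')) return 'ocr';           // '{eng}Submit', '{}Click Here'
  if (locator.startsWith('#')) return 'automation-id'; // '#CalculatorResults'
  if (locator.startsWith('/')) return 'path';          // '//button{Click Me}', '/pane[2]/button'
  if (locator.startsWith('^')) return 'name-contains'; // '^Click'
  if (locator.startsWith('~')) return 'name-regex';    // '~Click|Submit'
  return 'name-exact';                                 // 'Click Me'
}

console.log(locatorStrategy('15:button.png'));      // image
console.log(locatorStrategy('{-eng}Dark Mode'));    // ocr
console.log(locatorStrategy('#CalculatorResults')); // automation-id
console.log(locatorStrategy('Click Me'));           // name-exact
```

Only the image and OCR forms are available on every platform; the remaining forms rely on Windows UI Automation.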
Mouse Actions
click()
Click elements by locator or by screen coordinates, optionally with a specific mouse button:
Feature: Click actions
Scenario: Various click operations
* robot { window: '^My App' }
# Click by locator
* click('Submit')
# Click at coordinates (0,0 is top-left of screen)
* click(100, 200)
# Click with button: 1=left, 2=middle, 3=right
* click('Options', 3)
# Click at offset within element
* locate('Taxpayer').click(20, 40)
Other Mouse Actions
Feature: Mouse actions
Scenario: Mouse operations
* robot { window: '^My App' }
* doubleClick('file.txt')
* rightClick('context-menu-trigger')
# Move to coordinates or image
* move(500, 300)
* move('target.png')
# Drag and drop
* move('drag-source.png')
* press()
* move('drop-target.png')
* release()
Keyboard Input
Basic Input
Feature: Keyboard input
Scenario: Text and keys
* robot { window: '^Notepad' }
* input('Hello World')
* input(Key.ENTER)
* input('Second line')
Key Combinations
Modifier keys (Key.CONTROL, Key.ALT, Key.META, Key.SHIFT) are automatically released:
Feature: Key combinations
Scenario: Keyboard shortcuts
* robot { window: '^Chrome' }
# Open new tab (Mac: Key.META, Windows: Key.CONTROL)
* input(Key.META + 't')
# Select all and copy
* input(Key.CONTROL + 'a')
* input(Key.CONTROL + 'c')
# Multiple keys as array
* input([Key.DOWN, Key.RIGHT, Key.ENTER])
# Array with delay between keys (ms)
* input([Key.DOWN, Key.DOWN, Key.ENTER], 100)
# Slow typing (delay per character)
* input('type slowly', 50)
Available keys: Key.ENTER, Key.TAB, Key.ESCAPE, Key.BACKSPACE, Key.DELETE, Key.UP, Key.DOWN, Key.LEFT, Key.RIGHT, Key.HOME, Key.END, Key.PAGE_UP, Key.PAGE_DOWN, Key.F1 through Key.F12, and more.
Element API
Finding Elements
Feature: Element finding
Scenario: Locate elements
* robot { window: '^My App' }
# locate() fails if not found
* def btn = locate('Submit')
* btn.click()
# optional() returns "fake" element if not found
* optional('//pane{Warning}').locate('Close').click()
# exists() returns boolean
* assert exists('//pane{Main}')
# locateAll() returns array
* def buttons = locateAll('//button')
* buttons[1].click()
Wait Methods
Feature: Wait methods
Scenario: Wait for elements
* robot { window: '^My App' }
# Wait for element to appear
* waitFor('Loading Complete').click()
# Wait but don't fail if not found
* retry(2).waitForOptional('Optional Dialog')
# Wait until condition is true
* def checkEnabled = function(){ return optional('Submit').enabled }
* waitUntil(checkEnabled)
# Simple delay (avoid if possible)
* delay(1000)
Tree Walking
Navigate the element hierarchy:
Feature: Tree walking
Scenario: Navigate element tree
* robot { window: '^My App' }
# Access parent
* locate('Child Element').parent.click('Close')
# Access children
* def pane = waitFor('//pane{Info}')
* pane.children[3].click()
# Available properties: parent, children, firstChild, lastChild, nextSibling, previousSibling
* def first = locate('Container').firstChild
* def next = first.nextSibling
Element Properties
Feature: Element properties
Scenario: Read element properties
* robot { window: '^My App' }
* def btn = locate('Submit')
# Common properties
* def name = btn.name
* def enabled = btn.enabled
* def present = btn.present
# Windows-specific property by name or ID
* def isOffScreen = btn.property('IsOffscreen')
Robot API
Access global robot state and utilities:
Feature: Robot API
Scenario: Robot properties and methods
* robot { window: '^My App' }
# Desktop root element
* def allWindows = robot.root.locateAll('//window')
# Currently active element
* robot.active.highlight()
# Element with keyboard focus
* def focused = robot.focused
# Mouse position
* def pos = robot.location
* robot.location.highlight()
# Construct a location
* robot.location(885, 406).highlight()
# Construct a region for debugging
* def region = robot.region({ x: 100, y: 100, width: 100, height: 100 })
* region.debugCapture()
# List all windows
* print robot.allWindows
# Clipboard contents
* input(Key.CONTROL + 'a')
* input(Key.CONTROL + 'c')
* match robot.clipboard == 'expected text'
Screenshots
Feature: Screenshots
Scenario: Capture screenshots
* robot { window: '^My App' }
# Full desktop screenshot
* screenshot()
# Active window only
* screenshotActive()
# Specific element
* locate('//pane{Results}').screenshot()
Debugging
Feature: Debugging
Scenario: Visual debugging
* robot { window: '^My App', highlight: true }
# Highlight specific element
* highlight('Submit')
# Highlight all matching elements
* highlightAll('//button')
# Debug OCR results
* locate('//pane{Content}').debugExtract()
Cross-Platform
Windows
Windows provides the richest automation via UI Automation:
- Full XPath-like selector support
- Access to Automation IDs
- Control type and class name matching
- Use Inspect.exe to discover element properties
Feature: Windows automation
Scenario: Windows-specific patterns
* robot { window: 'Calculator', fork: 'calc' }
* click('#num7Button')
* click('//button{Plus}')
macOS
Requirements:
- Enable Accessibility permissions for your terminal or IDE in System Preferences > Security & Privacy (System Settings > Privacy & Security on recent macOS versions)
- Grant screen recording permissions if needed
Feature: macOS automation
Scenario: macOS patterns
* robot { window: '^Safari' }
# Use Key.META for Command key
* input(Key.META + 't')
Linux
Requirements:
- X11 display server (Wayland not fully supported)
- Set the DISPLAY environment variable if needed
export DISPLAY=:0
Advanced Patterns
Conditional Start
Handle applications that may or may not be running, with optional sign-in:
Feature: Conditional start
Scenario: Start app with sign-in if needed
* def mainWindowName = '^MyApp'
* robot {}
* def mainWindow = windowOptional(mainWindowName)
* if (mainWindow.present) { mainWindow.activate(); karate.abort() }
# App not running, start it
* karate.fork('C:/Program Files/MyApp/app.exe')
* retry(10).window('Sign In')
* waitFor('#userid').input('user@example.com')
* input('#password', 'Test@123')
* click('#submit-btn')
* retry(10).window(mainWindowName)
Mixing Web and Desktop
Handle native file dialogs in web applications:
Feature: Web and desktop integration
Scenario: Upload file via native dialog
# Web automation
* configure driver = { type: 'chrome' }
* driver 'https://example.com/upload'
* click('input[type="file"]')
# Switch to desktop for file dialog
* robot { window: 'Open' }
* def filePath = karate.toAbsolutePath('file:target/test-file.pdf')
* input(filePath)
* input(Key.ENTER)
# Back to web
* waitFor('.upload-complete')
Watch the native file upload demo for a complete walkthrough.
Utility Functions
Feature: Utility functions
Scenario: Path and command utilities
# Get OS-specific absolute path
* def absPath = karate.toAbsolutePath('file:target')
# Execute OS command and get output
* def result = karate.exec('dir')
# Conditional logic by OS
* if (karate.os.type == 'windows') karate.set('cmd', 'calc')
* if (karate.os.type == 'macosx') karate.set('cmd', 'open -a Calculator')
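As the number of supported platforms grows, the per-OS if lines can be collected into one lookup. The helper below is a sketch of that idea; calculatorCommand is a hypothetical function, and it only knows the two OS type values used in the example above.

```javascript
// Hypothetical helper: picks the launch command for a given karate.os.type value.
function calculatorCommand(osType) {
  const commands = new Map([
    ['windows', 'calc'],
    ['macosx', 'open -a Calculator']
  ]);
  const cmd = commands.get(osType);
  if (cmd === undefined) {
    // Fail early instead of forking an undefined command
    throw new Error('No calculator command configured for OS type: ' + osType);
  }
  return cmd;
}

console.log(calculatorCommand('windows')); // calc
console.log(calculatorCommand('macosx'));  // open -a Calculator
```

Passing karate.os.type as the argument would select the command for the machine running the test, and an unsupported OS fails with a clear message rather than leaving cmd undefined.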
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Window not found | Wrong name or timing | Use ^ for contains, increase retryCount |
| Element not found | Wrong locator or timing | Use highlight() to debug, add waitFor() |
| Image not matching | Resolution or scaling | Recapture at target resolution, adjust strictness |
| OCR not working | Missing language files | Download tessdata for your language |
| Actions too fast | OS can't keep up | Set autoDelay: 40 in robot options |
| Permissions error (macOS) | Accessibility not enabled | Enable in System Preferences > Security & Privacy > Accessibility |
| Display error (Linux) | DISPLAY not set | Export DISPLAY=:0 |
Resources
Demo Videos:
- Native File Upload - Clicking the native file upload button in a web page
- Windows UI Automation - Accessing native window controls
- iOS Emulator - Mobile emulator automation
Documentation:
- Example Project - Complete working Maven project
- Windows Install Guide - Setup and debugging
- Robot.java API - All available methods
Next Steps
- UI Testing - Web browser automation
- Performance Testing - Load test desktop workflows
- Calling Features - Create reusable automation flows