Desktop Automation

Automate native desktop applications on Windows, macOS, and Linux using karate-robot. It combines native mouse and keyboard events with Windows UI Automation, image recognition, and OCR text detection in a single test framework that integrates with API and web testing.

On this page:

Quick Start - Maven, Gradle, standalone JAR setup
robot Keyword - Configuration and options
Window Management - Finding and switching windows
Locator Strategies - Windows UI, image, OCR locators
Mouse Actions - Click, move, drag operations
Keyboard Input - Text and key combinations
Element API - Locate, wait, tree walking
Robot API - Clipboard, screenshots, debugging
Cross-Platform - Windows, macOS, Linux specifics
Advanced Patterns - Conditional start, web + desktop
Troubleshooting - Common issues and solutions

Quick Start

Maven Setup

Add the karate-robot dependency:

pom.xml
<dependency>
    <groupId>io.karatelabs</groupId>
    <artifactId>karate-robot</artifactId>
    <version>${karate.version}</version>
    <scope>test</scope>
</dependency>

Large Dependencies

The karate-robot module includes JavaCPP presets for OpenCV and Tesseract. Follow JavaCPP guidance to reduce downloads for your specific OS.

Gradle Setup

build.gradle
testImplementation "io.karatelabs:karate-robot:${karateVersion}"

Standalone JAR

For non-developers or quick scripts, use the standalone JAR with VS Code. The karate-robot JAR for Windows is approximately 150 MB and is downloaded separately. See the Windows Install Guide for setup instructions.

Demo Video

Watch the debugging demo to see VS Code integration in action.

robot Keyword

The robot keyword activates desktop automation. Karate Robot only initializes when you use this keyword and the karate-robot dependency is present.

Basic Usage

Gherkin
Feature: Basic robot usage

Scenario: Target a window
  # Find window with exact name
  * robot { window: 'Calculator' }

  # Find window where name contains 'Chrome'
  * robot { window: '^Chrome' }

  # Find window with regex match
  * robot { window: '~MyApp|MYAPP' }

Configuration Options

Option	Default	Description
`window`	-	Window name to focus. Use `^` prefix for contains, `~` for regex
`fork`	-	OS command to launch application (string, array, or JSON)
`autoClose`	`true`	Close window when test ends if `fork` was used
`attach`	`true`	Skip `fork` if window already exists
`basePath`	`null`	Base path for image locators (e.g., `classpath:images`)
`highlight`	`false`	Highlight matched elements visually
`highlightDuration`	`3000`	Highlight duration in milliseconds
`retryCount`	`3`	Retry attempts for finding window after fork
`retryInterval`	`3000`	Milliseconds between retries
`autoDelay`	`0`	Delay in milliseconds after native actions; increase it if the OS cannot keep up
`tessData`	`tessdata`	Path to Tesseract OCR data files
`tessLang`	`eng`	Default OCR language
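
As a rough illustration of how several of these options combine, the sketch below passes `fork` as an array and changes the `autoClose` and `attach` defaults. The executable path and the `--safe-mode` argument are hypothetical, and the comments describe the behavior implied by the table above rather than a verified guarantee.

```gherkin
Feature: Launch options

Scenario: Fork with arguments and leave the app open
  # fork as an array: executable first, then its arguments (both hypothetical)
  # autoClose: false should leave the window open when the test ends
  * robot { window: '^MyApp', fork: ['C:/Program Files/MyApp/app.exe', '--safe-mode'], autoClose: false }

Scenario: Always start a fresh instance
  # attach: false should run fork even if a matching window already exists
  * robot { window: 'Calculator', fork: 'calc', attach: false }
```

Leaving `attach` at its default is usually the better choice for local debugging, since re-running a scenario then reuses the window that is already open.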

configure robot Pattern

Set global defaults in karate-config.js and override per test:

Gherkin
Feature: Configure robot pattern

Scenario: Use global config with local override
  * configure robot = { highlight: true, highlightDuration: 500 }
  * robot { window: '^My App' }
  # Or shorthand when only window name needed
  * robot '^My App'

Window Management

Finding Windows

Windows can be matched by exact name, contains pattern, or regex:

Gherkin
Feature: Window matching

Scenario: Different matching strategies
  # Exact match
  * robot { window: 'Calculator' }

  # Contains match (window title contains 'Chrome')
  * robot { window: '^Chrome' }

  # Regex match (matches 'MyApp' or 'MYAPP')
  * robot { window: '~MyApp|MYAPP' }

Window Methods

Method	Description
`window(name)`	Activate window by name (supports `^` and `~` prefixes)
`windowExists(name)`	Returns boolean, does not activate
`windowOptional(name)`	Returns Window object or "fake" if not found
`waitForWindowOptional(name)`	Like `windowOptional` but with retry

Gherkin
Feature: Window methods

Scenario: Conditional window handling
  * robot { highlight: true }

  # Check if window exists without switching
  * def exists = windowExists('My Dialog')

  # Handle optional modal dialog
  * windowOptional('Tips on Startup').locate('Close').click()

  # Wait for window that may appear
  * retry(3).waitForWindowOptional('^Loading')

Launching Applications

Use fork to launch applications, with automatic detection of existing windows:

Gherkin
Feature: Launch application

Scenario: Start app if not running
  # fork only executes if window not found (attach: true is default)
  * robot { window: 'Calculator', fork: 'calc' }

  # With full path and extended retry for slow apps
  * robot { window: '^MyApp', fork: 'C:/Program Files/MyApp/app.exe', retryCount: 10 }

karate.fork()

For more control, use karate.fork() directly:

Gherkin
Feature: Conditional fork

Scenario: Start app only if needed
  * robot { highlight: true }
  * if (!windowExists('^Main Window')) karate.fork('C:/MyApp/app.exe')
  * retry(10).window('^Main Window')

Locator Strategies

Karate Robot supports three locator strategies: Windows UI Automation (Windows only), image matching, and OCR text recognition.

Windows Locators

Windows UI Automation provides precise element access using XPath-like syntax:

Locator	Description
`'Click Me'`	First element with exact name "Click Me"
`'^Click'`	First element where name contains "Click"
`'~Click|Submit'`	First element matching regex
`'#AutomationId'`	Element by Automation ID
`'//button{Click Me}'`	Button with exact name
`'//button{^Click}'`	Button where name contains "Click"
`'/pane[2]/button'`	Absolute path: second pane, first button
`'//pane/*/button'`	Wildcard depth matching
`'//button.TButton{^Click}'`	Button with class name "TButton"
`'/root//window'`	Search from desktop root

Gherkin
Feature: Windows locators

Scenario: Use Windows UI Automation
  * robot { window: 'Calculator', fork: 'calc' }
  * click('Clear')
  * click('One')
  * click('Plus')
  * click('Two')
  * click('Equals')
  * match locate('#CalculatorResults').name == 'Display is 3'
  * screenshot()
  * click('Close Calculator')

Use Inspect.exe to discover element properties for automation.

Image Locators

Match elements by image. Images must be in PNG format with a .png extension:

Gherkin
Feature: Image locators

Scenario: Click by image
  * robot { window: '^Chrome', basePath: 'classpath:images' }
  * click('submit-button.png')
  * waitFor('success-message.png')

Strictness factor: Prefix the image name with a number and a colon to adjust matching strictness. The default is 10, and lower values match more strictly:

Gherkin
Feature: Image strictness

Scenario: Adjust image matching
  # Default strictness (10)
  * click('button.png')

  # Strict matching (1 = most strict)
  * click('1:button.png')

  # Lenient matching (values > 10)
  * click('15:button.png')

Image Best Practices

Capture images at the same resolution as the target display
Use the debugger with highlight() to troubleshoot matching
Store images in a dedicated directory and set basePath
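
The practices above can be combined in one scenario. This is a minimal sketch: the image name is a placeholder, and it assumes that `highlight()` accepts an image locator in the same way it accepts the name locators shown elsewhere on this page.

```gherkin
Feature: Troubleshoot image matching

Scenario: See what an image locator matched
  # Images are resolved against basePath; highlight: true outlines each match
  * robot { window: '^My App', basePath: 'classpath:images', highlight: true, highlightDuration: 1000 }

  # Show the match before acting on it (submit-button.png is a placeholder)
  * highlight('submit-button.png')

  # Tighten matching if the wrong region was highlighted
  * click('5:submit-button.png')
```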

OCR Locators

Find elements by visible text using Tesseract OCR. Prefix the text with a {lang} pattern, where lang is the OCR language to use:

Gherkin
Feature: OCR locators

Scenario: Click by text
  # English text
  * click('{eng}Submit')

  # Light text on dark background (negative)
  * click('{-eng}Dark Mode')

  # Use default tessLang
  * click('{}Click Here')

Setup: Download the language data files from Tesseract and place them in the tessdata folder. Choose between tessdata, tessdata-fast, and tessdata-best depending on whether you need quality or speed.
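
If the data files live somewhere other than the default `tessdata` folder, the `tessData` and `tessLang` options from the configuration table can point Karate at them. The sketch below is illustrative only: the folder path, the `fra` language, and the button labels are assumptions.

```gherkin
Feature: OCR configuration

Scenario: Use a custom tessdata folder and default language
  # Path and language are placeholders; fra.traineddata must exist in that folder
  * robot { window: '^My App', tessData: 'C:/ocr/tessdata', tessLang: 'fra' }

  # '{}' falls back to the tessLang configured above
  * click('{}Envoyer')

  # An explicit language prefix is used for this locator instead
  * click('{eng}Cancel')
```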

Text extraction:

Gherkin
Feature: OCR extraction

Scenario: Extract text from element
  * robot { window: '^My App' }
  * def text = locate('//pane{Results}').extract('eng')
  * match text contains 'Search Results'

  # Extract from entire screen
  * def screenText = robot.root.extract()

  # Debug: highlight all found words
  * locate('//pane{Results}').debugExtract()

Mouse Actions

click()

Click elements by locator, coordinates, or with specific button:

Gherkin
Feature: Click actions

Scenario: Various click operations
  * robot { window: '^My App' }

  # Click by locator
  * click('Submit')

  # Click at coordinates (0,0 is top-left of screen)
  * click(100, 200)

  # Click with button: 1=left, 2=middle, 3=right
  * click('Options', 3)

  # Click at offset within element
  * locate('Taxpayer').click(20, 40)

Other Mouse Actions

Gherkin
Feature: Mouse actions

Scenario: Mouse operations
  * robot { window: '^My App' }

  * doubleClick('file.txt')
  * rightClick('context-menu-trigger')

  # Move to coordinates or image
  * move(500, 300)
  * move('target.png')

  # Drag and drop
  * move('drag-source.png')
  * press()
  * move('drop-target.png')
  * release()

Keyboard Input

Basic Input

Gherkin
Feature: Keyboard input

Scenario: Text and keys
  * robot { window: '^Notepad' }
  * input('Hello World')
  * input(Key.ENTER)
  * input('Second line')

Key Combinations

Modifier keys (Key.CONTROL, Key.ALT, Key.META, Key.SHIFT) are automatically released:

Gherkin
Feature: Key combinations

Scenario: Keyboard shortcuts
  * robot { window: '^Chrome' }

  # Open new tab (Mac: Key.META, Windows: Key.CONTROL)
  * input(Key.META + 't')

  # Select all and copy
  * input(Key.CONTROL + 'a')
  * input(Key.CONTROL + 'c')

  # Multiple keys as array
  * input([Key.DOWN, Key.RIGHT, Key.ENTER])

  # Array with delay between keys (ms)
  * input([Key.DOWN, Key.DOWN, Key.ENTER], 100)

  # Slow typing (delay per character)
  * input('type slowly', 50)

Available keys: Key.ENTER, Key.TAB, Key.ESCAPE, Key.BACKSPACE, Key.DELETE, Key.UP, Key.DOWN, Key.LEFT, Key.RIGHT, Key.HOME, Key.END, Key.PAGE_UP, Key.PAGE_DOWN, Key.F1 through Key.F12, and more.

Element API

Finding Elements

Gherkin
Feature: Element finding

Scenario: Locate elements
  * robot { window: '^My App' }

  # locate() fails if not found
  * def btn = locate('Submit')
  * btn.click()

  # optional() returns "fake" element if not found
  * optional('//pane{Warning}').locate('Close').click()

  # exists() returns boolean
  * assert exists('//pane{Main}')

  # locateAll() returns array
  * def buttons = locateAll('//button')
  * buttons[1].click()

Wait Methods

Gherkin
Feature: Wait methods

Scenario: Wait for elements
  * robot { window: '^My App' }

  # Wait for element to appear
  * waitFor('Loading Complete').click()

  # Wait but don't fail if not found
  * retry(2).waitForOptional('Optional Dialog')

  # Wait until condition is true
  * def checkEnabled = function(){ return optional('Submit').enabled }
  * waitUntil(checkEnabled)

  # Simple delay (avoid if possible)
  * delay(1000)

Tree Walking

Navigate the element hierarchy:

Gherkin
Feature: Tree walking

Scenario: Navigate element tree
  * robot { window: '^My App' }

  # Access parent
  * locate('Child Element').parent.click('Close')

  # Access children
  * def pane = waitFor('//pane{Info}')
  * pane.children[3].click()

  # Available properties: parent, children, firstChild, lastChild, nextSibling, previousSibling
  * def first = locate('Container').firstChild
  * def next = first.nextSibling

Element Properties

Gherkin
Feature: Element properties

Scenario: Read element properties
  * robot { window: '^My App' }
  * def btn = locate('Submit')

  # Common properties
  * def name = btn.name
  * def enabled = btn.enabled
  * def present = btn.present

  # Windows-specific property by name or ID
  * def isOffScreen = btn.property('IsOffscreen')

Robot API

Access global robot state and utilities:

Gherkin
Feature: Robot API

Scenario: Robot properties and methods
  * robot { window: '^My App' }

  # Desktop root element
  * def allWindows = robot.root.locateAll('//window')

  # Currently active element
  * robot.active.highlight()

  # Element with keyboard focus
  * def focused = robot.focused

  # Mouse position
  * def pos = robot.location
  * robot.location.highlight()

  # Construct a location
  * robot.location(885, 406).highlight()

  # Construct a region for debugging
  * def region = robot.region({ x: 100, y: 100, width: 100, height: 100 })
  * region.debugCapture()

  # List all windows
  * print robot.allWindows

  # Clipboard contents
  * input(Key.CONTROL + 'a')
  * input(Key.CONTROL + 'c')
  * match robot.clipboard == 'expected text'

Screenshots

Gherkin
Feature: Screenshots

Scenario: Capture screenshots
  * robot { window: '^My App' }

  # Full desktop screenshot
  * screenshot()

  # Active window only
  * screenshotActive()

  # Specific element
  * locate('//pane{Results}').screenshot()

Debugging

Gherkin
Feature: Debugging

Scenario: Visual debugging
  * robot { window: '^My App', highlight: true }

  # Highlight specific element
  * highlight('Submit')

  # Highlight all matching elements
  * highlightAll('//button')

  # Debug OCR results
  * locate('//pane{Content}').debugExtract()

Cross-Platform

Windows

Windows provides the richest automation via UI Automation:

Full XPath-like selector support
Access to Automation IDs
Control type and class name matching
Use Inspect.exe to discover element properties

Gherkin
Feature: Windows automation

Scenario: Windows-specific patterns
  * robot { window: 'Calculator', fork: 'calc' }
  * click('#num7Button')
  * click('//button{Plus}')

macOS

Requirements:

Enable Accessibility permissions for Terminal/IDE in System Preferences
Grant screen recording permissions if needed

Gherkin
Feature: macOS automation

Scenario: macOS patterns
  * robot { window: '^Safari' }
  # Use Key.META for Command key
  * input(Key.META + 't')

Linux

Requirements:

X11 display server (Wayland not fully supported)
Set DISPLAY environment variable if needed

Shell
export DISPLAY=:0

Advanced Patterns

Conditional Start

Handle applications that may or may not be running, with optional sign-in:

Gherkin
Feature: Conditional start

Scenario: Start app with sign-in if needed
  * def mainWindowName = '^MyApp'
  * robot {}
  * def mainWindow = windowOptional(mainWindowName)
  * if (mainWindow.present) { mainWindow.activate(); karate.abort() }

  # App not running, start it
  * karate.fork('C:/Program Files/MyApp/app.exe')
  * retry(10).window('Sign In')
  * waitFor('#userid').input('user@example.com')
  * input('#password', 'Test@123')
  * click('#submit-btn')
  * retry(10).window(mainWindowName)

Mixing Web and Desktop

Handle native file dialogs in web applications:

Gherkin
Feature: Web and desktop integration

Scenario: Upload file via native dialog
  # Web automation
  * configure driver = { type: 'chrome' }
  * driver 'https://example.com/upload'
  * click('input[type="file"]')

  # Switch to desktop for file dialog
  * robot { window: 'Open' }
  * def filePath = karate.toAbsolutePath('file:target/test-file.pdf')
  * input(filePath)
  * input(Key.ENTER)

  # Back to web
  * waitFor('.upload-complete')

Demo Video

Watch the native file upload demo for a complete walkthrough.

Utility Functions

Gherkin
Feature: Utility functions

Scenario: Path and command utilities
  # Get OS-specific absolute path
  * def absPath = karate.toAbsolutePath('file:target')

  # Execute OS command and get output
  * def result = karate.exec('dir')

  # Conditional logic by OS
  * if (karate.os.type == 'windows') karate.set('cmd', 'calc')
  * if (karate.os.type == 'macosx') karate.set('cmd', 'open -a Calculator')
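
One way to put the OS check to work is to choose the launch command first and then reuse the conditional fork pattern from the Window Management section. This is a sketch under assumptions: it covers only Windows and macOS, and it assumes the window is titled 'Calculator' on both.

```gherkin
Feature: OS-specific launch

Scenario: Start Calculator on Windows or macOS
  # Pick the launch command for the current OS (Linux is not handled here)
  * def cmd = karate.os.type == 'windows' ? 'calc' : 'open -a Calculator'
  * robot { highlight: true }

  # Launch only if the window is not already open, then wait for it
  * if (!windowExists('Calculator')) karate.fork(cmd)
  * retry(10).window('Calculator')
```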

Troubleshooting

Problem	Cause	Solution
Window not found	Wrong name or timing	Use `^` for contains, increase `retryCount`
Element not found	Wrong locator or timing	Use `highlight()` to debug, add `waitFor()`
Image not matching	Resolution or scaling	Recapture at target resolution, adjust strictness
OCR not working	Missing language files	Download tessdata for your language
Actions too fast	OS can't keep up	Set `autoDelay: 40` in robot options
Permissions error (macOS)	Accessibility not enabled	Enable in System Preferences > Security & Privacy > Privacy > Accessibility
Display error (Linux)	DISPLAY not set	Export `DISPLAY=:0`
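
Several of these fixes are plain robot options, so they can be tried together while diagnosing a flaky test. The values below are starting points for illustration, not recommended settings, and the window and element names are placeholders.

```gherkin
Feature: Troubleshooting settings

Scenario: Slow application with timing problems
  # More window retries, a small delay after native actions, and visual feedback
  * robot { window: '^My App', retryCount: 10, retryInterval: 1000, autoDelay: 40, highlight: true }

  # Wait for the element instead of assuming it is already there
  * waitFor('Submit').click()
```

Once the test is stable, consider removing `highlight` and lowering `autoDelay`, since both slow down every action.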

Resources

Demo Videos:

Native File Upload - Clicking the native file upload button in a web page
Windows UI Automation - Accessing native window controls
iOS Emulator - Mobile emulator automation

Documentation:

Example Project - Complete working Maven project
Windows Install Guide - Setup and debugging
Robot.java API - All available methods

Next Steps

UI Testing - Web browser automation
Performance Testing - Load test desktop workflows
Calling Features - Create reusable automation flows
