Desktop Automation

Automate native desktop applications on Windows, macOS, and Linux using karate-robot. It combines native mouse and keyboard events with Windows UI Automation, image recognition, and OCR text detection in a single test framework that integrates with API and web testing.

Quick Start

Maven Setup

Add the karate-robot dependency:

pom.xml
<dependency>
  <groupId>io.karatelabs</groupId>
  <artifactId>karate-robot</artifactId>
  <version>${karate.version}</version>
  <scope>test</scope>
</dependency>

Large Dependencies

The karate-robot module includes JavaCPP presets for OpenCV and Tesseract, which are large downloads. Follow the JavaCPP guidance to limit the download to the binaries for your specific OS.

Gradle Setup

build.gradle
testImplementation "io.karatelabs:karate-robot:${karateVersion}"

Standalone JAR

For non-developers or quick scripts, use the standalone JAR with VS Code. The karate-robot JAR for Windows is approximately 150 MB and is downloaded separately. See the Windows Install Guide for setup instructions.

Demo Video

Watch the debugging demo to see VS Code integration in action.

robot Keyword

The robot keyword activates desktop automation. Karate Robot only initializes when you use this keyword and the karate-robot dependency is present.

Basic Usage

Gherkin
Feature: Basic robot usage

Scenario: Target a window
# Find window with exact name
* robot { window: 'Calculator' }

# Find window where name contains 'Chrome'
* robot { window: '^Chrome' }

# Find window with regex match
* robot { window: '~MyApp|MYAPP' }

Configuration Options

Option             Default   Description
window             -         Window name to focus. Use ^ prefix for contains, ~ for regex
fork               -         OS command to launch application (string, array, or JSON)
autoClose          true      Close window when test ends if fork was used
attach             true      Skip fork if window already exists
basePath           null      Base path for image locators (e.g., classpath:images)
highlight          false     Highlight matched elements visually
highlightDuration  3000      Highlight duration in milliseconds
retryCount         3         Retry attempts for finding window after fork
retryInterval      3000      Milliseconds between retries
autoDelay          0         Delay after native actions (ms), use if OS is too slow
tessData           tessdata  Path to Tesseract OCR data files
tessLang           eng       Default OCR language

configure robot Pattern

Set global defaults in karate-config.js and override per test:

Gherkin
Feature: Configure robot pattern

Scenario: Use global config with local override
* configure robot = { highlight: true, highlightDuration: 500 }
* robot { window: '^My App' }
# Or shorthand when only window name needed
* robot '^My App'
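
The scenario above sets the defaults inside the feature itself. A minimal sketch of the same defaults in karate-config.js follows, assuming the standard karate.configure(key, value) call accepts the robot key the same way `* configure robot = ...` does. The option values are only illustrative.

```javascript
// karate-config.js (sketch): global robot defaults applied before each scenario.
// Assumption: karate.configure('robot', ...) mirrors `* configure robot = ...`.
function fn() {
  karate.configure('robot', {
    highlight: true,
    highlightDuration: 500
  });
  // Values returned here become variables in every feature; none are needed for this sketch.
  return {};
}
```

A scenario should then be able to override a default for a single test by passing the option in its own robot step, for example `* robot { window: '^My App', highlight: false }`.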

Window Management

Finding Windows

Windows can be matched by exact name, contains pattern, or regex:

Gherkin
Feature: Window matching

Scenario: Different matching strategies
# Exact match
* robot { window: 'Calculator' }

# Contains match (window title contains 'Chrome')
* robot { window: '^Chrome' }

# Regex match (matches 'MyApp' or 'MYAPP')
* robot { window: '~MyApp|MYAPP' }
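
The three matching modes can be summarized in a few lines of JavaScript. This is only an illustration of the prefix rules, not Karate's implementation: matchesWindow is a hypothetical helper, and it assumes that a ~ regex may match anywhere in the window title.

```javascript
// Sketch of the window-name prefix rules; NOT Karate's actual implementation.
// matchesWindow is a hypothetical helper used only for illustration.
function matchesWindow(pattern, title) {
  if (pattern.startsWith('^')) {
    // '^' prefix: the title must contain the remaining text
    return title.includes(pattern.slice(1));
  }
  if (pattern.startsWith('~')) {
    // '~' prefix: the remaining text is a regular expression
    // (assumption: it may match anywhere in the title)
    return new RegExp(pattern.slice(1)).test(title);
  }
  // No prefix: the whole title must be equal to the pattern
  return title === pattern;
}

console.log(matchesWindow('Calculator', 'Calculator'));        // true
console.log(matchesWindow('^Chrome', 'Docs - Google Chrome')); // true
console.log(matchesWindow('Chrome', 'Docs - Google Chrome'));  // false
console.log(matchesWindow('~MyApp|MYAPP', 'MYAPP 2.1'));       // true
```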

Window Methods

Method                       Description
window(name)                 Activate window by name (supports ^ and ~ prefixes)
windowExists(name)           Returns boolean, does not activate
windowOptional(name)         Returns Window object or "fake" if not found
waitForWindowOptional(name)  Like windowOptional but with retry

Gherkin
Feature: Window methods

Scenario: Conditional window handling
* robot { highlight: true }

# Check if window exists without switching
* def exists = windowExists('My Dialog')

# Handle optional modal dialog
* windowOptional('Tips on Startup').locate('Close').click()

# Wait for window that may appear
* retry(3).waitForWindowOptional('^Loading')

Launching Applications

Use fork to launch applications, with automatic detection of existing windows:

Gherkin
Feature: Launch application

Scenario: Start app if not running
# fork only executes if window not found (attach: true is default)
* robot { window: 'Calculator', fork: 'calc' }

# With full path and extended retry for slow apps
* robot { window: '^MyApp', fork: 'C:/Program Files/MyApp/app.exe', retryCount: 10 }

karate.fork()

For more control, use karate.fork() directly:

Gherkin
Feature: Conditional fork

Scenario: Start app only if needed
* robot { highlight: true }
* if (!windowExists('^Main Window')) karate.fork('C:/MyApp/app.exe')
* retry(10).window('^Main Window')

Locator Strategies

Karate Robot supports three locator strategies: Windows UI Automation (Windows only), image matching, and OCR text recognition.

Windows Locators

Windows UI Automation provides precise element access using XPath-like syntax:

Locator                     Description
'Click Me'                  First element with exact name "Click Me"
'^Click'                    First element where name contains "Click"
'~Click|Submit'             First element matching regex
'#AutomationId'             Element by Automation ID
'//button{Click Me}'        Button with exact name
'//button{^Click}'          Button where name contains "Click"
'/pane[2]/button'           Absolute path: second pane, first button
'//pane/*/button'           Wildcard depth matching
'//button.TButton{^Click}'  Button with class name "TButton"
'/root//window'             Search from desktop root

Gherkin
Feature: Windows locators

Scenario: Use Windows UI Automation
* robot { window: 'Calculator', fork: 'calc' }
* click('Clear')
* click('One')
* click('Plus')
* click('Two')
* click('Equals')
* match locate('#CalculatorResults').name == 'Display is 3'
* screenshot()
* click('Close Calculator')

Use Inspect.exe to discover element properties for automation.

Image Locators

Match elements by PNG image. Images must be PNG format with .png extension:

Gherkin
Feature: Image locators

Scenario: Click by image
* robot { window: '^Chrome', basePath: 'classpath:images' }
* click('submit-button.png')
* waitFor('success-message.png')

Strictness factor: Prefix with number and colon to adjust matching sensitivity:

Gherkin
Feature: Image strictness

Scenario: Adjust image matching
# Default strictness (10)
* click('button.png')

# Strict matching (1 = most strict)
* click('1:button.png')

# Lenient matching (values > 10)
* click('15:button.png')

Image Best Practices
  • Capture images at the same resolution as the target display
  • Use the debugger with highlight() to troubleshoot matching
  • Store images in a dedicated directory and set basePath

OCR Locators

Find elements by visible text using Tesseract OCR. Prefix the text with a {lang} pattern:

Gherkin
Feature: OCR locators

Scenario: Click by text
# English text
* click('{eng}Submit')

# Light text on dark background (negative)
* click('{-eng}Dark Mode')

# Use default tessLang
* click('{}Click Here')

Setup: Download the language data files from Tesseract and place them in the tessdata folder. Choose between tessdata, tessdata-fast, or tessdata-best depending on whether you need quality or speed.

Text extraction:

Gherkin
Feature: OCR extraction

Scenario: Extract text from element
* robot { window: '^My App' }
* def text = locate('//pane{Results}').extract('eng')
* match text contains 'Search Results'

# Extract from entire screen
* def screenText = robot.root.extract()

# Debug: highlight all found words
* locate('//pane{Results}').debugExtract()
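
The locator forms in this section differ only in their prefix and suffix, which the sketch below spells out. describeLocator is a hypothetical helper, not part of Karate, and it only illustrates the syntax described above: a {lang} prefix for OCR, a .png suffix with an optional strictness prefix for images, and everything else read as a Windows locator.

```javascript
// Hypothetical helper (not part of Karate): shows how a locator string reads
// under the syntax rules described in this section.
function describeLocator(locator) {
  // OCR: '{lang}text'; a '-' before the language means light text on a dark background
  const ocr = /^\{(-?)([a-z_]*)\}(.*)$/.exec(locator);
  if (ocr) {
    return {
      strategy: 'ocr',
      negative: ocr[1] === '-',
      lang: ocr[2] || null, // null means "use the tessLang default"
      text: ocr[3]
    };
  }
  // Image: optional 'N:' strictness prefix, then a path ending in .png
  const image = /^(?:(\d+):)?(.+\.png)$/.exec(locator);
  if (image) {
    return {
      strategy: 'image',
      strictness: image[1] ? Number(image[1]) : 10, // 10 is the documented default
      path: image[2]
    };
  }
  // Anything else is read as a Windows UI Automation locator
  return { strategy: 'windows', locator: locator };
}

console.log(describeLocator('15:button.png').strictness);   // 15
console.log(describeLocator('{-eng}Dark Mode').negative);   // true
console.log(describeLocator('//button{^Click}').strategy);  // windows
```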

Mouse Actions

click()

Click elements by locator, coordinates, or with specific button:

Gherkin
Feature: Click actions

Scenario: Various click operations
* robot { window: '^My App' }

# Click by locator
* click('Submit')

# Click at coordinates (0,0 is top-left of screen)
* click(100, 200)

# Click with button: 1=left, 2=middle, 3=right
* click('Options', 3)

# Click at offset within element
* locate('Taxpayer').click(20, 40)

Other Mouse Actions

Gherkin
Feature: Mouse actions

Scenario: Mouse operations
* robot { window: '^My App' }

* doubleClick('file.txt')
* rightClick('context-menu-trigger')

# Move to coordinates or image
* move(500, 300)
* move('target.png')

# Drag and drop
* move('drag-source.png')
* press()
* move('drop-target.png')
* release()
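
The four drag-and-drop steps above can be wrapped in a reusable JavaScript function. This is a sketch, assuming the move(), press(), and release() shortcuts are also reachable as methods on the robot object (as robot.root and robot.location are). dragAndDrop is a hypothetical name, not a Karate built-in.

```javascript
// Hypothetical helper (not a Karate built-in): drags from one locator to another.
// `robot` is passed in so the function does not depend on a global variable.
function dragAndDrop(robot, source, target) {
  robot.move(source);  // position the pointer over the drag source
  robot.press();       // hold the mouse button down
  robot.move(target);  // move to the drop target while the button is held
  robot.release();     // let go to complete the drop
}
```

In a feature, a function like this could be defined once with `* def` and then called as `* dragAndDrop(robot, 'drag-source.png', 'drop-target.png')`.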

Keyboard Input

Basic Input

Gherkin
Feature: Keyboard input

Scenario: Text and keys
* robot { window: '^Notepad' }
* input('Hello World')
* input(Key.ENTER)
* input('Second line')

Key Combinations

Modifier keys (Key.CONTROL, Key.ALT, Key.META, Key.SHIFT) are automatically released:

Gherkin
Feature: Key combinations

Scenario: Keyboard shortcuts
* robot { window: '^Chrome' }

# Open new tab (Mac: Key.META, Windows: Key.CONTROL)
* input(Key.META + 't')

# Select all and copy
* input(Key.CONTROL + 'a')
* input(Key.CONTROL + 'c')

# Multiple keys as array
* input([Key.DOWN, Key.RIGHT, Key.ENTER])

# Array with delay between keys (ms)
* input([Key.DOWN, Key.DOWN, Key.ENTER], 100)

# Slow typing (delay per character)
* input('type slowly', 50)

Available keys: Key.ENTER, Key.TAB, Key.ESCAPE, Key.BACKSPACE, Key.DELETE, Key.UP, Key.DOWN, Key.LEFT, Key.RIGHT, Key.HOME, Key.END, Key.PAGE_UP, Key.PAGE_DOWN, Key.F1 through Key.F12, and more.

Element API

Finding Elements

Gherkin
Feature: Element finding

Scenario: Locate elements
* robot { window: '^My App' }

# locate() fails if not found
* def btn = locate('Submit')
* btn.click()

# optional() returns "fake" element if not found
* optional('//pane{Warning}').locate('Close').click()

# exists() returns boolean
* assert exists('//pane{Main}')

# locateAll() returns array
* def buttons = locateAll('//button')
* buttons[1].click()

Wait Methods

Gherkin
Feature: Wait methods

Scenario: Wait for elements
* robot { window: '^My App' }

# Wait for element to appear
* waitFor('Loading Complete').click()

# Wait but don't fail if not found
* retry(2).waitForOptional('Optional Dialog')

# Wait until condition is true
* def checkEnabled = function(){ return optional('Submit').enabled }
* waitUntil(checkEnabled)

# Simple delay (avoid if possible)
* delay(1000)

Tree Walking

Navigate the element hierarchy:

Gherkin
Feature: Tree walking

Scenario: Navigate element tree
* robot { window: '^My App' }

# Access parent
* locate('Child Element').parent.click('Close')

# Access children
* def pane = waitFor('//pane{Info}')
* pane.children[3].click()

# Available properties: parent, children, firstChild, lastChild, nextSibling, previousSibling
* def first = locate('Container').firstChild
* def next = first.nextSibling

Element Properties

Gherkin
Feature: Element properties

Scenario: Read element properties
* robot { window: '^My App' }
* def btn = locate('Submit')

# Common properties
* def name = btn.name
* def enabled = btn.enabled
* def present = btn.present

# Windows-specific property by name or ID
* def isOffScreen = btn.property('IsOffscreen')

Robot API

Access global robot state and utilities:

Gherkin
Feature: Robot API

Scenario: Robot properties and methods
* robot { window: '^My App' }

# Desktop root element
* def allWindows = robot.root.locateAll('//window')

# Currently active element
* robot.active.highlight()

# Element with keyboard focus
* def focused = robot.focused

# Mouse position
* def pos = robot.location
* robot.location.highlight()

# Construct a location
* robot.location(885, 406).highlight()

# Construct a region for debugging
* def region = robot.region({ x: 100, y: 100, width: 100, height: 100 })
* region.debugCapture()

# List all windows
* print robot.allWindows

# Clipboard contents
* input(Key.CONTROL + 'a')
* input(Key.CONTROL + 'c')
* match robot.clipboard == 'expected text'

Screenshots

Gherkin
Feature: Screenshots

Scenario: Capture screenshots
* robot { window: '^My App' }

# Full desktop screenshot
* screenshot()

# Active window only
* screenshotActive()

# Specific element
* locate('//pane{Results}').screenshot()

Debugging

Gherkin
Feature: Debugging

Scenario: Visual debugging
* robot { window: '^My App', highlight: true }

# Highlight specific element
* highlight('Submit')

# Highlight all matching elements
* highlightAll('//button')

# Debug OCR results
* locate('//pane{Content}').debugExtract()

Cross-Platform

Windows

Windows provides the richest automation via UI Automation:

  • Full XPath-like selector support
  • Access to Automation IDs
  • Control type and class name matching
  • Use Inspect.exe to discover element properties

Gherkin
Feature: Windows automation

Scenario: Windows-specific patterns
* robot { window: 'Calculator', fork: 'calc' }
* click('#num7Button')
* click('//button{Plus}')

macOS

Requirements:

  • Enable Accessibility permissions for Terminal/IDE in System Preferences
  • Grant screen recording permissions if needed

Gherkin
Feature: macOS automation

Scenario: macOS patterns
* robot { window: '^Safari' }
# Use Key.META for Command key
* input(Key.META + 't')

Linux

Requirements:

  • X11 display server (Wayland not fully supported)
  • Set DISPLAY environment variable if needed

Shell
export DISPLAY=:0

Advanced Patterns

Conditional Start

Handle applications that may or may not be running, with optional sign-in:

Gherkin
Feature: Conditional start

Scenario: Start app with sign-in if needed
* def mainWindowName = '^MyApp'
* robot {}
* def mainWindow = windowOptional(mainWindowName)
* if (mainWindow.present) { mainWindow.activate(); karate.abort() }

# App not running, start it
* karate.fork('C:/Program Files/MyApp/app.exe')
* retry(10).window('Sign In')
* waitFor('#userid').input('user@example.com')
* input('#password', 'Test@123')
* click('#submit-btn')
* retry(10).window(mainWindowName)

Mixing Web and Desktop

Handle native file dialogs in web applications:

Gherkin
Feature: Web and desktop integration

Scenario: Upload file via native dialog
# Web automation
* configure driver = { type: 'chrome' }
* driver 'https://example.com/upload'
* driver.click('input[type="file"]')

# Switch to desktop for file dialog
* robot { window: 'Open' }
* def filePath = karate.toAbsolutePath('file:target/test-file.pdf')
# With both driver and robot active, prefix each call so the target is explicit
* robot.input(filePath)
* robot.input(Key.ENTER)

# Back to web
* driver.waitFor('.upload-complete')

Demo Video

Watch the native file upload demo for a complete walkthrough.

Utility Functions

Gherkin
Feature: Utility functions

Scenario: Path and command utilities
# Get OS-specific absolute path
* def absPath = karate.toAbsolutePath('file:target')

# Execute an OS command and get its output (Windows example: dir is a cmd built-in, so it runs through cmd)
* def result = karate.exec('cmd /c dir')

# Conditional logic by OS
* if (karate.os.type == 'windows') karate.set('cmd', 'calc')
* if (karate.os.type == 'macosx') karate.set('cmd', 'open -a Calculator')
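
The per-OS `if` steps above can also be collapsed into one lookup. The sketch below is a hypothetical helper, not a Karate built-in, that maps the two karate.os.type values shown above to their commands and fails loudly for any other value instead of leaving cmd undefined.

```javascript
// Hypothetical helper (not a Karate built-in): maps karate.os.type to the
// command that opens a calculator. Only the two values used above are covered.
function calculatorCommand(osType) {
  const commands = {
    windows: 'calc',
    macosx: 'open -a Calculator'
  };
  if (!Object.prototype.hasOwnProperty.call(commands, osType)) {
    throw new Error('No calculator command configured for OS type: ' + osType);
  }
  return commands[osType];
}

console.log(calculatorCommand('windows')); // calc
console.log(calculatorCommand('macosx'));  // open -a Calculator
```

The returned string could then be used as the fork option of the robot step.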

Troubleshooting

Problem                  Cause                      Solution
Window not found         Wrong name or timing       Use ^ for contains, increase retryCount
Element not found        Wrong locator or timing    Use highlight() to debug, add waitFor()
Image not matching       Resolution or scaling      Recapture at target resolution, adjust strictness
OCR not working          Missing language files     Download tessdata for your language
Actions too fast         OS can't keep up           Set autoDelay: 40 in robot options
Permissions error (Mac)  Accessibility not enabled  Enable in System Preferences > Security
Display error (Linux)    DISPLAY not set            Export DISPLAY=:0
