Xolani Dube
Automation, C++, C#, .NET, UI Testing, Windows

AutoVision: Building a Multi-Modal UI Automation Framework

How I built a UI automation framework with 11 spy modes, 80+ scripting functions, and advanced techniques like memory scanning, ETW tracing, and computer vision for detecting any UI element.

Traditional UI automation tools hit walls. Legacy apps ignore accessibility APIs. Games render custom UIs. Some applications actively block automation. This led me to build AutoVision, a UI automation framework that combines 11 different detection techniques to find any element on screen.

The Problem: One Size Doesn't Fit All

Each UI automation approach has blind spots:

Technique           | Limitation
--------------------|----------------------------------------
UI Automation (UIA) | Modern apps only, no games
MSAA                | Legacy, incomplete on new apps
Win32               | Window-level only, no internal controls
Image matching      | Breaks on resolution/theme changes

What if you could combine them all?

The Solution: Multi-Modal Detection

AutoVision implements 11 spy modes, each optimized for different scenarios:

Core Spy Modes

Mode  | Technology           | Best For
------|----------------------|--------------------
UIA   | UI Automation API    | Modern WPF/UWP apps
MSAA  | Active Accessibility | Legacy Win32 apps
Win32 | Window messages      | Native controls
JAB   | Java Access Bridge   | Java applications

Advanced Spy Modes

Mode         | Technology         | Best For
-------------|--------------------|-------------------------------
WM_HOOK      | SetWindowsHookEx   | Real-time message interception
MEM_SCAN     | ReadProcessMemory  | Bypassing automation blockers
HID_DEVICE   | Raw Input API      | Hardware-level input capture
RENDER_HOOK  | DirectX/GDI hooks  | Games, custom renderers
KERNEL_TRACE | ETW tracing        | Kernel-level visibility
VISION       | OpenCV + OCR       | Visual element detection
FUSION       | All modes combined | Intelligent auto-selection

Architecture

Native Core (C++)

Performance-critical element detection runs in native C++:

class AutomationElement {
public:
    ActionResult Initialize();
    std::shared_ptr<AutomationElement> FindElementByPoint(int x, int y);
    std::vector<std::shared_ptr<AutomationElement>> GetChildren();
    ElementProperties GetProperties() const;
    ActionResult PerformAction(ActionType action);

private:
    IUIAutomationElement* m_uiaElement;
    IAccessible* m_msaaElement;
    HWND m_hwnd;
};

Managed Interop (C#)

Business logic and scripting in C#:

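// Public surface only; method bodies omitted
public class NativeAutomationElement : IDisposable {
    public NativeElementProperties GetProperties();
    public List<NativeAutomationElement> GetChildren();
    public void Click();
    public void SetText(string value);
    public string GetText();
    public void Dispose();  // required by IDisposable; releases the native handle
}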

Scripting Engine

80+ built-in functions for automation scripts:

// Math: Sum, Average, Power, Sqrt, Sin, Cos...
// String: Trim, Replace, Substring, RegexMatch...
// DateTime: Today, AddDays, DaysBetween, FormatDate...
// Collections: Count, First, Last, Reverse...

// Example script
Set totalPrice = Sum(data.prices)
Set formattedDate = FormatDate(Today(), "yyyy-MM-dd")
Set isValid = RegexMatch(email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+$")
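
To illustrate how a script call such as Sum(data.prices) could be dispatched, here is a minimal sketch of a name-to-function registry. It is not AutoVision's actual engine: the names FunctionRegistry, Builtin, Args, and Call are hypothetical, and the sketch assumes every built-in takes a list of numbers and returns one number.

```cpp
#include <cmath>
#include <functional>
#include <map>
#include <numeric>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch, not AutoVision's real scripting engine.
// Assumption: every built-in maps a list of numbers to one number.
using Args = std::vector<double>;
using Builtin = std::function<double(const Args&)>;

class FunctionRegistry {
public:
    FunctionRegistry() {
        m_functions["Sum"] = [](const Args& a) {
            return std::accumulate(a.begin(), a.end(), 0.0);
        };
        m_functions["Average"] = [](const Args& a) {
            if (a.empty()) throw std::invalid_argument("Average: no values");
            return std::accumulate(a.begin(), a.end(), 0.0) /
                   static_cast<double>(a.size());
        };
        m_functions["Power"] = [](const Args& a) {
            if (a.size() != 2) throw std::invalid_argument("Power: expected 2 arguments");
            return std::pow(a[0], a[1]);
        };
    }

    // Look up a built-in by its script name and invoke it.
    // Throws std::out_of_range when the script uses an unknown function.
    double Call(const std::string& name, const Args& args) const {
        auto it = m_functions.find(name);
        if (it == m_functions.end()) {
            throw std::out_of_range("Unknown function: " + name);
        }
        return it->second(args);
    }

private:
    std::map<std::string, Builtin> m_functions;
};
```

With this shape, the interpreter only needs to evaluate the arguments and call registry.Call("Sum", values). Adding a new built-in becomes a single map entry and leaves the parser untouched.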

Key Features

1. Intelligent Fallback

When the primary mode fails, AutoVision automatically tries alternatives and scores each result to decide which one to trust:

float CalculateConfidence(ModeResult result) {
    float score = 0.0f;

    // Has valid properties
    if (!string.IsNullOrEmpty(result.Element.Name)) score += 0.3f;
    if (result.Element.BoundingRectangle.IsValid()) score += 0.2f;

    // Response time (faster = better)
    if (result.ResponseTime < 50) score += 0.2f;

    // Mode-specific bonuses
    if (result.Mode == SpyMode.UIA) score += 0.2f;

    return Math.Min(score, 1.0f);
}
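
The scoring function only rates a single result. The fallback itself could look roughly like the C++ sketch below: try each mode in priority order, accept the first result that clears a threshold, and otherwise keep the best one seen. ModeResult here is a simplified stand-in, and Detector, FindWithFallback, and the 0.7 threshold are assumptions made for illustration, not AutoVision's real API. It needs C++17 for std::optional.

```cpp
#include <functional>
#include <optional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not AutoVision's real API.
struct ModeResult {
    std::string mode;   // e.g. "UIA", "MSAA", "VISION"
    std::string name;   // element name reported by that mode
    float confidence;   // score in [0, 1], e.g. from a CalculateConfidence-style function
};

// A detector returns a result, or std::nullopt when its mode finds nothing.
using Detector = std::function<std::optional<ModeResult>(int x, int y)>;

// Try detectors in priority order. Return the first result that reaches the
// threshold; otherwise return the best-scoring result seen, if any.
std::optional<ModeResult> FindWithFallback(
        const std::vector<Detector>& detectors, int x, int y,
        float threshold = 0.7f) {  // threshold value is an assumption
    std::optional<ModeResult> best;
    for (const auto& detect : detectors) {
        std::optional<ModeResult> result = detect(x, y);
        if (!result) continue;  // this mode failed: fall through to the next
        if (result->confidence >= threshold) return result;
        if (!best || result->confidence > best->confidence) best = result;
    }
    return best;
}
```

Stopping at the first good-enough result keeps the common case fast, because cheaper modes can sit at the front of the list and expensive ones only run when those fail.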

2. Real-Time Element Highlighting

Color-coded highlights show which spy mode found each element:

Color   | Mode
--------|--------
Red     | UIA
Green   | MSAA
Blue    | Win32
Orange  | WM_HOOK
Gold    | Vision
Rainbow | Fusion

3. Session Recording & Replay

Record automation sessions with intelligent element re-finding:

  • Multiple locator strategies per element (AutomationId, XPath, visual)
  • Handles UI changes between recordings
  • Exports to C# or Python test code
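
As a rough illustration of re-finding with multiple locators, the C++ sketch below stores several locators per recorded element and, at replay time, tries them in order against the current UI. All names (UiElement, Locator, LocatorKind, Refind) are hypothetical, and matching is reduced to exact string comparison. A real implementation would also need the visual strategy and fuzzier matching.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch; not AutoVision's recorder format.
struct UiElement {
    std::string automationId;
    std::string path;  // tree path, e.g. "/Window/Pane/Button[2]"
};

enum class LocatorKind { AutomationId, Path };

// One recorded way of identifying an element.
struct Locator {
    LocatorKind kind;
    std::string value;
};

// Check a single locator against a candidate element.
bool Matches(const Locator& locator, const UiElement& element) {
    switch (locator.kind) {
        case LocatorKind::AutomationId: return element.automationId == locator.value;
        case LocatorKind::Path:         return element.path == locator.value;
    }
    return false;
}

// Replay step: try each recorded locator in order and return the index of
// the first element in the current UI that it matches, if any.
std::optional<std::size_t> Refind(const std::vector<Locator>& locators,
                                  const std::vector<UiElement>& currentUi) {
    for (const auto& locator : locators) {
        for (std::size_t i = 0; i < currentUi.size(); ++i) {
            if (Matches(locator, currentUi[i])) return i;
        }
    }
    return std::nullopt;
}
```

If a button's AutomationId was renamed between recording and replay, the first locator misses and the path locator can still find it, which is the point of recording more than one strategy.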

4. Memory-Safe Native Code

RAII patterns prevent memory leaks in long-running sessions:

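// Smart pointers for automatic cleanup
std::shared_ptr<AutomationElement> GetParent() {
    return std::make_shared<AutomationElement>(m_parentHandle);
}

// RAII wrapper for COM interfaces (simplified; copying is disabled
// so the same pointer is never released twice)
template <typename T>
class ComPtr {
public:
    explicit ComPtr(T* ptr = nullptr) : m_ptr(ptr) {}
    ComPtr(const ComPtr&) = delete;
    ComPtr& operator=(const ComPtr&) = delete;
    ~ComPtr() { if (m_ptr) m_ptr->Release(); }

private:
    T* m_ptr;
};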
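
To make the ownership rules concrete, here is a self-contained sketch of a move-only RAII owner. FakeComObject is a stand-in so the example compiles without Windows headers, and ScopedComPtr is a hypothetical name. On Windows, Microsoft::WRL::ComPtr or ATL's CComPtr would normally fill this role.

```cpp
#include <utility>

// Stand-in for a COM interface so the sketch builds without Windows headers.
struct FakeComObject {
    int refCount = 1;
    void AddRef() { ++refCount; }
    void Release() { --refCount; }  // a real COM object would delete itself at 0
};

// Move-only owner: releases its pointer exactly once, and a moved-from
// instance releases nothing.
template <typename T>
class ScopedComPtr {
public:
    explicit ScopedComPtr(T* ptr = nullptr) : m_ptr(ptr) {}

    ScopedComPtr(const ScopedComPtr&) = delete;
    ScopedComPtr& operator=(const ScopedComPtr&) = delete;

    ScopedComPtr(ScopedComPtr&& other) noexcept
        : m_ptr(std::exchange(other.m_ptr, nullptr)) {}

    ScopedComPtr& operator=(ScopedComPtr&& other) noexcept {
        if (this != &other) {
            Reset();  // drop what we currently own before taking over
            m_ptr = std::exchange(other.m_ptr, nullptr);
        }
        return *this;
    }

    ~ScopedComPtr() { Reset(); }

    T* Get() const { return m_ptr; }

    // Release the owned pointer (if any) and become empty.
    void Reset() {
        if (m_ptr) {
            m_ptr->Release();
            m_ptr = nullptr;
        }
    }

private:
    T* m_ptr;
};
```

Deleting the copy operations is what prevents a double Release. Ownership can only be transferred by std::move, after which the source holds nullptr.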

Performance Metrics

Metric                     | Value
---------------------------|--------------------
Element detection          | < 100 ms
Script parsing             | 24/24 tests passing
Fuzzing inputs tested      | 1000+
Crash rate                 | 0%
Standard library functions | 80+

Real-World Applications

Test Automation

  • UI testing across application technologies (WPF/UWP, Win32, Java)
  • Legacy application validation
  • Accessibility compliance checking

RPA Integration

  • Element detection for robotic process automation
  • Visual verification of automation steps
  • Handling non-standard UI controls

Quality Assurance

  • Visual regression testing
  • Performance profiling
  • Automated accessibility audits

What I Learned

1. Native Code Still Matters

Performance-critical paths benefit enormously from C++: element detection dropped from 200 ms to 20 ms after moving to native code.

2. Fallback Chains Beat Single Solutions

No single API covers all scenarios. The fusion approach provides 99%+ coverage.

3. Scripting Enables Adoption

A powerful scripting engine lets users customize without rebuilding. The 80+ functions cover most business logic needs.

4. Memory Safety Requires Discipline

COM interop and native handles need careful lifecycle management. RAII patterns made this manageable.

Future Directions

  • Machine learning for fusion mode confidence scoring
  • Cross-platform support via libui or Qt
  • Cloud integration for distributed test execution
  • Visual debugging with element tree visualization

UI automation shouldn't require knowing which API works for each application. AutoVision makes finding elements automatic.