Skip to main content

Command Palette

Search for a command to run...

RTS – AST Based Test Selection: A Deep Dive into Building Production-Grade Regression Test Selection

Building a language-agnostic regression test selector using Clang LibTooling for C++ and Python's AST module

Updated
8 min read
RTS – AST Based Test Selection: A Deep Dive into Building Production-Grade Regression Test Selection
H

Senior SDET Engineer experienced in Testing, DevOps, Automation, and Tooling. I build and deploy internal tools and testing frameworks, maintain infrastructure, Implement and maintain CI/CD and drive scalable deployments. I also work on defining test strategies, developing validation processes, and driving coverage to ensure software quality, performance, and reliability.


Introduction

At scale, running your entire test suite after every commit is a luxury you can't afford. With 40,000+ tests averaging 4 minutes each, that's 2,700+ machine-hours per run. But skipping tests blindly risks missing regressions.

The question isn't whether to select tests, it's how to do it accurately. Most approaches treat code as text: diff the files, grep for changes, hope for the best. We took a different path: treat code as structure. Parse it. Understand it. Use the same tools compilers use.

This article is about the ideology behind AST-based regression test selection. Why we chose this approach, what it enables, and how thinking like a compiler changes the game.


The Core Problem: What Actually Changed?

When you modify a source file, Git tells you which lines changed. But lines aren't what tests exercise. Methods & Functions are.

Consider this scenario:

// Before: line 42-87
void processData(Input& data) {
    validate(data);
    transform(data);
}

// After: added a log statement on line 45
void processData(Input& data) {
    validate(data);
    log("Processing started");  // NEW
    transform(data);
}

Git says "line 45 changed." But what you need to know is: processData changed. Then you can ask: "Which tests call processData?"

This is the fundamental shift: from lines to methods and functions.


The Ideology: Code as Structure

Most RTS tools treat source code as text. They diff files, look for changed lines, maybe grep for function names. This works until it doesn't:

  • Overloaded functions: process(int) and process(double) have the same name

  • Namespaces: utils::process() vs internal::process()

  • Templates: Which instantiation changed?

  • Macros: The actual code differs from what you see

Text-based approaches break on real C++ codebases. The solution? Don't treat code as text. Treat it as structure.

This means parsing, real parsing, like a compiler does. Lexical analysis, syntax analysis, semantic analysis. Build an Abstract Syntax Tree (AST). Understand what each symbol means, not just what it looks like.

The architecture follows from this ideology:

  1. Extract changes from Git (lines)

  2. Parse code into AST to understand structure (methods & functions)

  3. Map functions to tests via coverage database


Why a Compiler Front End?

We use LLVM Clang for C++ and Python's ast module for Python. Not because we're building a compiler, but because we need what compilers already know how to do.

The Compiler-as-Frontend Approach

A compiler does much more than generate machine code:

For test selection, we only need the first three stages:

  • Lexical Analysis: Turn source into tokens

  • Syntax Analysis: Build the AST

  • Semantic Analysis: Resolve symbols and types

We stop before IR generation. No optimization, no code generation. Just understanding.

This is the compiler-as-frontend pattern: leverage industrial-strength parsing without the cost of a full build.

What This Gives Us

Correct parsing of real C++ templates, macros, overloads, namespaces. Every edge case the language has, AST parsing handles.

Unique function identity in C++ via mangled names:

namespace MyLib {
    class Parser {
        void process(int x); // Mangled: _ZN5MyLib6Parser7processEi - Qualified: MyLib::Parser::process(int)
        void process(double x); // Mangled: _ZN5MyLib6Parser7processEd - Qualified: MyLib::Parser::process(double)
        void process(string& x); // Mangled: _ZN5MyLib6Parser7processERNSt - Qualified: MyLib::Parser::process(std::string&)
    };
}

The mangled name or the full qualified name encodes the full signature. It's what the linker sees, what coverage tools record, and what we use to uniquely identify functions.

Precise source ranges: exact start and end lines for every function declaration.


The Visitor Pattern: Language-Agnostic Design

Both C++ (Clang) and Python (ast module) use the same pattern: a visitor that walks the syntax tree and extracts function metadata.

Clang LibTooling Architecture

The visitor does one thing: for every function-like node (functions, methods, constructors, destructors), extract:

  • Function name (and qualified/mangled name)

  • Start line and end line

  • Node type

The procedure is simple yet precise:

class FunctionVisitor : public RecursiveASTVisitor<FunctionVisitor> {
public:
    explicit FunctionVisitor(ASTContext* context) : context(context) {}

    // Visit methods for different function-like declarations
    // All delegate to visitFunctionLikeDecl for uniform processing
    bool VisitFunctionDecl(FunctionDecl* func) { 
        return visitFunctionLikeDecl(func); 
    }
    bool VisitCXXMethodDecl(CXXMethodDecl* method) { 
        return visitFunctionLikeDecl(method); 
    }
    bool VisitCXXConstructorDecl(CXXConstructorDecl* ctor) { 
        return visitFunctionLikeDecl(ctor); 
    }
    bool VisitCXXDestructorDecl(CXXDestructorDecl* dtor) { 
        return visitFunctionLikeDecl(dtor); 
    }
/* ----------------------------------------------------------------------  */
    bool visitFunctionLikeDecl(FunctionDecl* func) {
        SourceManager &SM = context->getSourceManager();
        SourceLocation loc = func->getLocation();

        // CRITICAL: Only process functions in the main file being analyzed
        if (SM.isInMainFile(loc)) {
            // Get mangled name to uniquely identify overloaded functions
            std::string mangledName = getMangledName(func);
            
            // Deduplication: skip if already processed (declaration vs definition)
            if (seenFunctions.insert(mangledName).second) {
                // Generate full qualified name by demangling source functions mangled names
                std::string funcName = llvm::demangle(mangledName);
                
                // Extract precise source range
                SourceRange range = func->getSourceRange();
                FullSourceLoc startLoc = context->getFullLoc(range.getBegin());
                FullSourceLoc endLoc = context->getFullLoc(range.getEnd());

                std::string fileName = SM.getFilename(loc).str();
                bool isHeader = isHeaderFile(fileName);
                std::string functionType = determineFunctionType(func, isHeader);

                // Output structured function metadata
                printFunctionInfo(funcName, mangledName, functionType, fileName,
                                  startLoc.getSpellingLineNumber(), 
                                  endLoc.getSpellingLineNumber());
            }
        }
        return true;  // Continue traversal
    }

Three key decisions:

  1. Main file only: Don't re-analyze headers for every translation unit

  2. Mangled/Qualified names: Distinguish overloads; match coverage data

  3. Deduplication: Declaration vs definition appear as separate nodes

Python ASTVisitor

Python follows a similar implementation too:

def _process_function(self, node: ast.FunctionDef | ast.AsyncFunctionDef) -> None:
    """
    Core processing logic for function-like nodes.
    Analogous to visitFunctionLikeDecl() in the C++ matcher.
    """
    # Build qualified name from scope stack (e.g., "MyClass.my_method")
    qualified_name = self._get_qualified_name(node.name)

    # Deduplication check (same as seenFunctions.insert() in C++)
    if qualified_name in self.seen_functions:
        return
    self.seen_functions.add(qualified_name)

    # Determine start line: include decorators if present
    if node.decorator_list:
        start_line = node.decorator_list[0].lineno
    else:
        start_line = node.lineno

    metadata = FunctionMetadata(
        name=node.name,
        qualified_name=qualified_name,
        node_type=self._determine_node_type(node),
        start_line=start_line,
        end_line=self._get_end_line(node),
        file_path=self.file_path
    )
    self.functions.append(metadata)

This produces identical output regardless of language. The rest of the pipeline: change intersection, coverage lookup, is completely language-agnostic.

Given the changed lines from Git, and the “Shared Output Format” from either PyAST or CLANG, we can identify changed functions accordingly:

def find_changed_functions(self, function_metadata: list, changes: list) -> list[str]:
    """Find functions affected by Git changes."""
    affected_functions = []

    for name, func_start, func_end in function_metadata:
        for change_start, change_count in changes:
            # Calculate change bounds
            change_start = int(change_start)
            change_end = change_start + int(change_count) - 1

            # Check for overlap (handles partial, inside, and enveloping changes)
            if change_start <= func_end and change_end >= func_start:
                affected_functions.append(name)
                break
    return affected_functions

From Functions to Tests: The Coverage Link

Once we know which functions changed, and their source range, we need to know which tests exercise them.

The coverage database is built offline - run each test with coverage instrumentation, record which functions it calls, store the mapping, and keeping high frequency of coverage updates:

At selection time, we query: "Which tests have coverage entries for any affected function?"

Function-level coverage is the sweet spot:

  • More precise than file-level (fewer false positives)

  • Cheaper to collect than line-level

  • Stable across minor refactorings


Results: Does It Work?

Tested on a production codebase:

  • 40,000+ tests, ~4 minutes average runtime

  • 10,000+ commits across multiple release cycles

  • C/C++ codebase with complex implementation and templates

Scalability

Analysis time scales sublinearly with change size:

Test Reduction

Small changes select few tests. Large changes asymptotically approach the full suite. This matches intuition: surgical patches are surgical; massive refactors eventually touch everything.

  • Average: 50%+ reduction in tests run

  • Best case: Up to 80% reduction

  • Safety: 94% of failures captured in selected set


The Bigger Picture

AST-based test selection is one application of a broader idea: use compiler infrastructure as a library.

Compilers already solve hard problems, parsing complex languages, resolving symbols, tracking source locations. These capabilities have traditionally been locked inside opaque toolchains. Modern compiler projects like LLVM expose them as libraries.

This changes what's possible:

  • Static analysis tools that understand code semantics

  • Refactoring tools that transform code safely

  • Test selection that knows what functions exist and where

The ideology is simple: don't reinvent parsing. Don't treat code as text when you need structure. Use the tools that already understand your language.


Summary

The problem: Selecting which tests to run after a code change.

The insight: Tests exercise functions, not lines. To select tests accurately, you need to know which functions changed, not just which lines.

The approach: Parse code into an AST using a real compiler front end (Clang for C++, ast module for Python). Extract function boundaries. Intersect with changed lines. Map affected functions to tests via coverage.

The ideology: Treat code as structure, not text. Leverage compiler infrastructure. Build language-agnostic pipelines with language-specific front ends.

The result: 50%+ test reduction on average, analysis in seconds to minutes, 94% safety in capturing failures.