Introduction
In the first part of my blog series on compilers and LLVM, I provided a brief introduction to compiler fundamentals and LLVM. We also wrote a simple LLVM analysis pass to print function names. In this post, we will explore more concepts and create additional analysis passes to perform specific tasks.
Module
In LLVM, a Module
is a top-level container that encapsulates all the information related to an entire program or a significant portion of a program. It is the top-level container for LLVM Intermediate Representation (IR) objects. Each Module
contains a list of global variables, a list of functions, a list of libraries (or other modules) this module depends on, a symbol table, and metadata about the target’s characteristics.
Basic Block
A basic block is a straight-line sequence of instructions with no branches, meaning that execution starts at a single entry point and proceeds sequentially to a single exit point, where it then continues to the next basic block. Basic blocks belong to functions and cannot have jumps into their middle, ensuring that once execution starts, it will proceed through all instructions in the block. The first instruction of a basic block is known as the leader.
Control Flow Graph (CFG)
A CFG is a directed graph whose nodes represent basic blocks. The edges between the nodes represent control flow paths, indicating how execution can proceed from one basic block to another.
Writing Passes
Note: In the previous blog post, we used registerPipelineStartEPCallback
which registers a callback for a default optimizer pipeline extension point, and thus requires optimization levels -O1 / -O2 / -O3
in order to run. Now, we’re gonna use registerPipelineParsingCallback
which will be helpful later on. Please note that I’ll not be writing the entire program again and again. Instead I’ll only show the actual implementation of the pass. Complete programs can be found here.
1 | PassPluginLibraryInfo getPassPluginInfo() |
To run it, we can generate a .ll
file containing the LLVM IR and then use opt
to execute it.
1 | clang-16 -S -emit-llvm test.c -o test.ll |
Also, we need to add the isRequired
function; otherwise, the run()
function will not get called.
1 | struct SomePass: public PassInfoMixin<SomePass>{ |
Writing a pass for printing global variables and their type
Here’s a simple LLVM pass that prints out all the global variables in a program alongwith their types. The code loops through all the globals, grabs their names and types, and prints them out.
1 | PreservedAnalyses run(Module &M, ModuleAnalysisManager &MPM) |
Writing a pass for detecting unused global variables
This code iterates through all the globals and calls the use_empty
function. This function returns true if there are no users of the value.
1 | PreservedAnalyses run(Module &M, ModuleAnalysisManager &MPM) |
Writing a pass for counting and printing all the basic blocks within functions
We go through every function in the module, then look at each basic block inside those functions. It’s crucial to check if a function is just a declaration. This is because modules often includes declarations of library functions that are used in the code, but their full implementations aren’t part of this module. Checking for declarations helps us avoid trying to analyze functions that don’t have any actual code in this module.
1 | PreservedAnalyses run(Module &M, ModuleAnalysisManager &MPM) |
Detecting recursion
To determine if a function is recursive, we need to iterate through all the instructions in its basic blocks and check for call instructions. When an instruction with a call opcode (Instruction::Call in LLVM)
is found, we can extract the called function by first casting the instruction to CallInst
using dyn_cast
(defined in llvm/Support/Casting.h) and then invoking getCalledFunction
.
1 | PreservedAnalyses run(Module &M, ModuleAnalysisManager &MPM) |
Depth-First Search on the Control Flow Graph
For each function, we can grab the first basic block, also known as the entry block, using F.getEntryBlock()
. Then we call the Dfs
function mentioned below.
1 | void Dfs(BasicBlock *currentBlock) |
1 | > opt-16 -load-pass-plugin ./lib.so -passes=run-pass -disable-output test.ll |
There are four basic blocks in the main
function. Let’s generate a visual representation of the control flow graph.
1 | # Generate the LLVM IR |
This is how the control flow graph for the main
function looks like, which matches with the graph that we printed using Depth-First Search.
That’s all for this post. In the next one, we’ll explore Transform Passes
and various compiler optimization techniques. Source code for all the passes, with Dockerfile can be found here