Welcome to my blog, where I will dive into the fascinating world of low-level software analysis with
Practical Binary Analysis by Dennis Andriesse.
As I work through this essential guide, I'll be summarizing key concepts, sharing insights, and tackling hands-on
exercises to break down the often complex and technical aspects of binary analysis.
Whether you're a seasoned reverse engineer or a curious newcomer, join me on this journey as we explore the
tools and techniques used to dissect software at the binary level and uncover the hidden mechanics behind computer
programs.
gcc -E -P
Stops compilation after preprocessing: -E tells gcc to stop once the preprocessing phase is done, and -P omits the linemarkers (the debugging annotations that record file names and line numbers), so the output is cleaner.
gcc -S -masm=intel
The -S flag tells gcc to stop after the compilation stage and store the assembly files to disk (.s is the conventional extension for assembly files). -masm=intel emits assembly in Intel syntax rather than the default AT&T syntax.
Preprocessing is the first phase in the compilation process, where all the macros and directives are expanded.
During this phase, any #include directives, which bring in external header files, and #define macros, which
define constants or shortcuts, are fully expanded. What remains is pure C code with no preprocessor
directives left, ready for compilation.
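To make the expansion concrete, here is a small sketch to run through gcc -E -P. The macro and function names (FORMAT_STRING, SQUARE, square_plus_one, print_line) are made up for illustration and do not come from the book. The comments describe what each line should turn into once the preprocessor has done its work.

```c
#include <stdio.h>   /* replaced by the full contents of the header */

#define FORMAT_STRING "%s\n"        /* object-like macro (hypothetical name) */
#define SQUARE(x) ((x) * (x))       /* function-like macro (hypothetical name) */

/* After preprocessing, the return statement reads: return ((n) * (n)) + 1; */
int square_plus_one(int n) {
    return SQUARE(n) + 1;
}

/* After preprocessing, the call reads: printf("%s\n", text); */
int print_line(const char *text) {
    return printf(FORMAT_STRING, text);
}
```

The output of gcc -E -P on this file is long, because everything from stdio.h is pasted in ahead of the two functions, which sit at the very end with the macros already substituted.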
The next step in compiling a C program is the compilation phase, where the preprocessed C code is
translated into assembly language.
Compilers emit assembly, which is close to machine code but still human-readable, rather than producing
machine code directly. A dedicated assembler then does the final conversion into machine code for the
processor. This modularity makes life easier for compiler developers: each compiler only has to target
assembly, and a single assembler handles the rest.
The compilation phase isn't just about translation; it’s also where heavy optimization takes place. Tools
like GCC let you control this with optimization levels ranging from -O0 (no optimization) to -O3 (heavy
optimization). Optimizations change the final assembly output, making your program faster and more efficient,
though at the cost of readability when you inspect the disassembled code.
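A small loop is a handy test subject for seeing this. The sketch below uses a made-up function, sum_below, that is not from the book. Compile it with gcc -S -masm=intel at -O0 and again at -O2, then compare the two .s files. At -O0, gcc typically keeps total and i on the stack and reloads them on every iteration. At higher levels they usually live in registers and the loop may be restructured. The exact output depends on the gcc version and target, so treat this as something to try rather than a guaranteed result.

```c
/* Hypothetical example function: adds up the integers 0 .. n-1. */
unsigned int sum_below(unsigned int n) {
    unsigned int total = 0;
    for (unsigned int i = 0; i < n; i++) {
        total += i;   /* at -O0 this is usually a load, add, store sequence */
    }
    return total;
}
```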
Another neat aspect of the assembly code generated during this phase is that it still preserves symbolic
information: the names of functions and constants appear as readable labels instead of bare addresses.
However, later on, when you deal with stripped binaries, where symbols are removed, you won’t have this
luxury. In such cases, making sense of the disassembled code becomes significantly harder!
Write a C program that contains several functions and compile it into an assembly file, an object file, and an executable binary, respectively. Try to locate the functions you wrote in the assembly file and in the disassembled object file and executable. Can you see the correspondence between the C code and the assembly code? Finally, strip the executable and try to identify the functions again.
As you've seen, ELF binaries (and other types of binaries) are divided into sections. Some sections contain code, and others contain data. Why do you think the distinction between code and data sections exists? How do you think the loading process differs for code and data sections? Is it necessary to copy all sections into memory when a binary is loaded for execution?
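One way to start exploring this question is to compile a tiny file with gcc -c and list its sections with readelf -S or objdump -h. The sketch below uses made-up names (banner, initialized_counter, zeroed_buffer, bump_counter). The section names in the comments are where gcc typically places each item on Linux, not a guarantee.

```c
/* Read-only data: typically placed in .rodata, loaded non-writable. */
const char banner[] = "section demo";

/* Initialized, writable data: typically placed in .data. */
int initialized_counter = 42;

/* Zero-initialized data: typically placed in .bss, which takes up no
 * space in the file; the loader just sets up zeroed memory for it. */
int zeroed_buffer[1024];

/* Machine code: typically placed in .text, loaded executable and
 * non-writable. */
int bump_counter(void) {
    return ++initialized_counter;
}
```

Seeing that code ends up read-only and executable while data ends up writable and non-executable hints at why the split exists. Sections such as the symbol table are only useful to tools like the linker and debugger, so they do not need to be loaded at all.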