Linking basics
February 11, 2017
In my journey to master the most obscure and secrets arts of computer systems, I started by mastering the basics.
One thing I have never worried to learn properly was linking, since most of the projects I worked on had everything set, Makefile was already written.
But the time to grow up becomes.
Let’s start then.a
I took the examples from
Computer Systems: A Programmer’s Perspective
ELF object file
The ELF file contains a header that give us information that will help the linker to parse and interpret the object file.
Here are some information of an ELF header, I used the readelf command.
ELF Header:
Class: ELF64
Data: 2's complement, little endian
Version 1 (current)
OS/ABI: UNIX - System V
Type: EXEC (Executable file)
Machine Advanced Micro Devices X86-64
Entry point address: 0x4003b0
Start of program headers: 64 (bytes into file)
Start of section headers: 6568 (bytes into file)
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers:9
Size of section headers: 64 (bytes)
Number of section headers:28
The basic structure of an ELF file is as follow:
Image obtained from this site
.initis the initialization code before themainin C such as set to zero global variables or defined the interrupt vector table..textis the machine code of the compiled program.rodataread-only data, such asconst char.dataglobal variables that have been initialized.bssuninitialized global variables.symtabtable symbol with information about functions and global variables that are defined and referenced in the program.debugdebugging symbol table, only generated when compiling with-g.linemapping between actual C code and compiled code, only generated when compiling with-g.strtaba string table for symbol in.symtaband.debug
Example
Let’s analyze the following example
// elf.c
#include<stdio.h>
int A = 5;
const char a[] = "hallo danke";
int main()
{
printf("hello world\n");
printf("%i\n",A);
printf("%s\n",a);
return 0;
}
I compiled it with gcc -c since I just wanted to focus on the code itself.
Let’s check the .rodata and see what it contains, I used objdump -D to get the assembly code of that section.
Disassembly of section .rodata:
0000000000000000 <a>:
0: 68 61 6c 6c 6f h a l l o
5: 20 64 61 6e d a n
9: 6b 65 00 68 k e 0 h
d: 65 6c e l
f: 6c l
10: 6f o
11: 20 77 6f w o
14: 72 6c r l
16: 64 d
17: 00 0
18: 25 %
19: 69 i
1a: 0a 00 ⤶ 0
In case it is no obvious, I added the letters using the ASCII code. As I mentioned before .rodata only contains read only data.
Now let’s focus on the .symtab.
To do so, let’s compile the code without the -c option.
I used objdump -x to visualize the .symtab. Some entries of the .symtab are shown below.
400546 g F .text 0000036 main
400608 g O .rodata 000000c a
000000 F *UND* 0000000 printf@@GLIBC_2.2.5
Description of the columns from left to right.
- virtual memory offset
- unit size, in the case of
mainandais g, it stands for giant words (8 bytes) - type,
FandOstand for function and object respectively - section where it belongs, for example
mainbelongs to.text - size in hex
- name, in case of
printf, it is also mentioned the shared library where it is defined.
So these are the basics of the ELF files. If want to read more just google it or read the man pages of the commands I used, I am not kidding. You can also read the chapter 7 of book it mentioned above.
Symbol Resolution
The compiler associates each symbol reference with one symbol definition in the ELF file. The compiler allows one only ones definition of each local symbol per module. It also ensures that static local variables have unique names. In case of global symbols such as variables or functions the compiler has to decide which reference to use, and thus symbol resolution maybe be tricky.
Example 1
In symbol1.c, the symbol x has been defined as int and it has a reference inside the printf.
// symbol1.c
#include <stdio.h>
int x = 15213;
int y = 15212;
int main(){
printf("x = %i \t y = %i \n",
x,y);
return 0;
}
x = 15213 y = 15212
But what would happen if we defined the symbol x somewhere else:
// symbol2.c
double x;
void f(){
x = -0.0;
}
// symbol3.c
#include <stdio.h>
void f(void);
int x = 15213;
int y = 15212;
int main(){
f();
printf("x = 0x%x\t y = 0x%x \n",
x,y);
return 0;
}
I compiled it as follow gcc -o test symbol3.c symbol2.c, and this is the result.
x = 0x0 y = 0x80000000
What had happened?
In symbol3 both x and y have been declared as int, but in symbol2.c x has been declared as double. So in memory is something like this.
// symbol3.c
0000 X X X X // 4 bytes
0004 Y Y Y Y // 4 bytes
0008 ...
// symbol2.c
0000 X X X X X X X X // 8 bytes
0 0 0 0 0 0 0 8 // -0.0
0008 ...
Thus, the assignment of x as double overwrites the memory location of int x and int y. And that’s a nasty bug.
Example 2
Let’s defined a function p() and a char main in symbol4.c.
//symbol4.c
#include <stdio.h>
char main;
void p(){
printf("0x%x\n",*(&main+1));
printf("0x%x\n",main);
}
And call p() in another file symbol5.c
//symbol5.c
void p(void);
int main(){
p();
return 0;
}
The result of compiling as follow gcc -o test symbol5.c symbol4.c is:
// output
0x48
0x55
Why did p() print something even though main has not been initialized?
With the help of objdump -x test let’s see if main is defined in first place:
4004f6 g F .text 000010 main
It is defined as function, this is because char main is a weak symbol (it has been declared but not defined) whereas int main() {...} is a strong symbol. Thus, function main overrides char main.
Now it’s time to explain the output. I use objdump -D to see the opcode of main.
00000000004004f6 <main>:
4004f6: 55 push %rbp
4004f7: 48 89 e5 mov %rsp,%rbp
4004fa: e8 07 00 00 00 callq 400506 <p2>
4004ff: b8 00 00 00 00 mov $0x0,%eax
400504: 5d pop %rbp
400505: c3
It is easy to observed that printf("0x%x\n",*(&main+1)); prints what is in address 4004f7 (0x48), and printf("0x%x\n",main); prints what is in address 4004f6 (0x55).
Final thoughts
- Always compile your code with
-Wand-Wallflags to avoids these types of “unexpected” behaviors. objdumpis a very useful tool to understand what is going on under the hood, but it can be a pain in the ass if the code is very large.- Even though I didn’t use
gdbin this post, I highly recommend to learn it.