rss feed
Qries

Hacking function main


I found this post in which a guy, called James Rowe, successfully compiled a program in which he defined a variable called main and made it work as if there was an actually int main(){...}. So, before I finished to read the whole post I decided to try it by myself.

Last month I solved a buffer bomb similar to this one described here. Basically I used a buffer overflow to write directly on specific memory locations. The buffer overflow and my knowledge in linking, assembly and virtual memory gave me the confidence to try the something similar to James’s post.

The problem

I want to print on my screen holi boli by using a code in which I only declare and define a variable called main. Note this code only works on a x64 Linux machine

You can find the programs I coded here.

The result

I ended up with this code:

const char main[] ={
  0x55,                   	
  0x48, 0x89, 0xe5,
  0xb8, 0x01, 0x00, 0x00 , 0x00,
  0xbb, 0x01, 0x00, 0x00 , 0x00,
  0xbe, 0x81, 0x05, 0x40 , 0x00,    	
  0xba, 0x0a, 0x00, 0x00 , 0x00,	
  0x0f, 0x05,
  0xb8, 0x3c, 0x00, 0x00 , 0x00,
  0x0f, 0x05
  // <message>
  0x68, 0x6f, 0x6c, 0x69, 0x20,
  0x62,
  0x6f,             
  0x6c,              
  0x69, 0x0a, 0xb8, 0x00, 0x00, 0x00,
  0x00, 0x5d, 0xc3,
  0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 	
  0x00 
};

Of course the output is holi boli.

The process

First attempt

The first thing I did was to code what I wanted to achieve.

// baseline.c
#include <stdio.h>

int main(){
	printf("holi boli \n");
}

This code is straightforward. I compiled gcc -W -Wall test1.c -o run and I run objdump -D run to see what was the assembly code related to main.

00000000004004f6 <main>:
  4004f6: 55                push   %rbp
  4004f7: 48 89 e5          mov    %rsp,%rbp
  4004fa: bf 94 05 40 00    mov    $0x400594,%edi
  4004ff: e8 ec fe ff ff    callq  4003f0 <puts@plt>
  400504: b8 00 00 00 00    mov    $0x0,%eax
  400509: 5d                pop    %rbp
  40050a: c3                retq   
  40050b: 0f 1f 44 00 00    nopl   0x0(%rax,%rax,1)

From now on I will only show the memory locations and opcode not the assembly instructions given by objdump -D, unless it is necessary.

Based on the concept behind buffer overflow, I can use char arrays to write directly into memory. Thus, I naively did the following:

//test1.c
char main[] = { 0x55,
	0x48, 0x89, 0xe5,
	0xbf, 0x94, 0x05, 0x40, 0x00,
	0xe8, 0xec, 0xfe, 0xff, 0xff,
	0xb8, 0x00, 0x00, 0x00, 0x00,
	0x5d,
	0xc3,
	0x0f, 0x1f, 0x44, 0x00, 0x00,
};

I compiled and this happened Illegal instruction (core dumped), a segmentation fault. It was kind of expected something similar so I used objdump -D to see what was under the hood.

Disassembly of section .data:

0000000000601020 <__data_start>:
	...

0000000000601028 <__dso_handle>:
	...

0000000000601030 <main>:
  601030: 55            
  601031: 48 89 e5      
  601034: bf 94 05 40 00
  601039: e8 ec fe ff ff  callq  4003f0 <puts@plt>
  60103e: b8 00 00 00 00
  601043: 5d            
  601044: c3            
  601045: 0f 1f 44 00 00

The key word here is .data. Yes my friend, main in the .data section. For more information about ELF sections I wrote a post about ELF files. But if we wanted char main to work as if it were “true” code, main should be in .text section or .rodata section.

Second attempt

I added const before char main to move from .data to .rodata.

//test2.c
const char main[] = { 0x55,
	0x48, 0x89, 0xe5,
	0xbf, 0x94, 0x05, 0x40, 0x00,
	0xe8, 0xec, 0xfe, 0xff, 0xff,
	0xb8, 0x00, 0x00, 0x00, 0x00,
	0x5d,
	0xc3,
	0x0f, 0x1f, 0x44, 0x00, 0x00,
};

I run the code and I got Illegal instruction (core dumped). Another infamous segmentation fault. And of course, as you was expected, I used objdump -D:

Disassembly of section .rodata:

0000000000400530 <_IO_stdin_used>:
  400530:	01 00             
  400532:	02 00             
	...

0000000000400540 <main>:
  400540:	55                
  400541:	48 89 e5          
  400544:	bf 94 05 40 00    
  400549:	e8 ec fe ff ff   
  40054e:	b8 00 00 00 00    
  400553:	5d                
  400554:	c3                
  400555:	0f 1f 44 00 00  

Nothing seems to be wrong. So I used baseline.c to see what could happen, and I noticed this 4004ff: e8 ec fe ff ff callq 4003f0 <puts@plt>. Let’s check in which address is the GOT .plt for printf (puts@plt)… Oohhh shit!, obviously the linker didn’t link printf since there is no reference to that function. Let’s try another approach :)

The GOT or global offset table is used to link functions defined in shared libraries in such a way that instead of loading the whole library, the compiler only uses the function that have references in the code. To learn more you can google it. Maybe I will write a post about it, when it happens I will update this post.

The GOT .plt for printf in baseline.c is as follow:

00000000004003f0 <puts@plt>:
  4003f0: ff 25 22 0c 20 00    jmpq   *0x200c22(%rip)     
                    ; 601018 <_GLOBAL_OFFSET_TABLE_+0x18>
  4003f6: 68 00 00 00 00       pushq  $0x0
  4003fb: e9 e0 ff ff ff       jmpq   4003e0 <_init+0x18>

Third attempt

I got stuck here, so I supposed that James had the same problem. I read a little more his post and he used a syscall to write on screen. I read this post, suggested by James, and followed the instructions.

System call used assembly instructions, I wrote an assembly program to see whether a syscall would help me. I ended up with the code test3.c. I added comments to make the code easy to understand.

//test3.c
int main(){
    __asm__ (
        /* 1 is the syscall number to
        write on 64bit */
    "movl $1, %eax;\n"  
        /* 1 is stdout and is the 
        first argument */
    "movl $1, %ebx;\n"  
        /* load the address of string 
        into the second argument*/
    "movl $message, %esi;\n" 
        /* third argument is the length 
        of the string to print*/
    "movl $10, %edx;\n"  
    "syscall;\n"
    "movl $60,%eax;\n" // call exit
    "syscall;\n"
        /* Store the "holi boli\n" 
        inside the main function */
    "message: .ascii \"holi boli\\n\";"
    );
    return 0;
}

I compiled the code and guess what? it worked!!! It printed holi boli on the screen.

Fourth attempt:

I used objdump -D to get the opcode:

00000000004004a6 <main>:
  4004a6:    55                push   %rbp
  4004a7:    48 89 e5          mov    %rsp,%rbp
  4004aa:    b8 01 00 00 00    mov    $0x1,%eax
  4004af:    bb 01 00 00 00    mov    $0x1,%ebx
  4004b4:    be c7 04 40 00    mov    $0x4004c7,%esi
  4004b9:    ba 0a 00 00 00    mov    $0xa,%edx
  4004be:    0f 05             syscall 
  4004c0:    b8 3c 00 00 00    mov    $0x3c,%eax
  4004c5:    0f 05             syscall 

If you don’t notice, the line "movl $message, %esi;\n" in test3.c has been replaced by mov $0x4004c7,%esi. This means that the message holi boli is somewhere else, in this case the message starts at 0x4004c7.

00000000004004c7 <message>:
  4004c7:    68 6f 6c 69 20      
  4004cc:    62                  
  4004cd:    6f                  
  4004ce:    6c                  
  4004cf:    69 0a b8 00 00 00  
  4004d5:    00 5d c3            
  4004d8:    0f 1f 84 00 00 00 00
  4004df:    00 

I naively joined both parts as follows to see the new memory address where the message will be allocate.

//test4.c
const char main[] ={
  0x55,                   	
  0x48, 0x89, 0xe5,
  0xb8, 0x01, 0x00, 0x00, 0x00,
  0xbb, 0x01, 0x00, 0x00, 0x00,
  0xbe, 0xc7, 0x04, 0x40, 0x00,    	
  0xba, 0x0a, 0x00, 0x00, 0x00,	
  0x0f, 0x05,
  0xb8, 0x3c, 0x00, 0x00, 0x00,
  0x0f, 0x05
  // <message>
  0x68, 0x6f, 0x6c, 0x69, 0x20,
  0x62,
  0x6f,               
  0x6c,                
  0x69, 0x0a, 0xb8, 0x00, 0x00, 0x00,
  0x00, 0x5d, 0xc3,
  0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 	
  0x00
};

The output was SI��I��. This means two things. First, I am doing a syscall and second, as I expected, the program is reading the wrong section.

Success

With the help of objdump I get the correct memory address for the message.

0000000000400560 <main>:
  400560:	55                  
  400561:	48 89 e5            
  400564:	b8 01 00 00 00      
  400569:	bb 01 00 00 00      
  40056e:	be c7 04 40 00      
  400573:	ba 0a 00 00 00      
  400578:	0f 05               
  40057a:	b8 3c 00 00 00     
    ; syscall  
  40057f:	0f 05           
    ; message
  400581:	68 6f 6c 69 20      
  400586:	62                  
  400587:	6f                  
  400588:	6c                  
  400589:	69 0a b8 00 00 00   
  40058f:	00 5d c3            
  400592:	0f 1f 84 00 00 00 00
  400599:	00 

So the new address is 0x400581. I updated the code.

const char main[] ={
  0x55,                   	
  0x48, 0x89, 0xe5,
  0xb8 , 0x01 , 0x00 , 0x00 , 0x00,
  0xbb , 0x01 , 0x00 , 0x00 , 0x00,
  // update of the address
  0xbe , 0x81 , 0x05 , 0x40 , 0x00,    	
  0xba , 0x0a , 0x00 , 0x00 , 0x00,	
  0x0f , 0x05 ,
  0xb8 , 0x3c , 0x00 , 0x00 , 0x00,
  0x0f , 0x05,
  // <message>
  0x68, 0x6f , 0x6c , 0x69 , 0x20 ,
  0x62,
  0x6f,               
  0x6c,                
  0x69 , 0x0a , 0xb8 , 0x00 , 0x00 , 0x00,
  0x00 , 0x5d , 0xc3 ,
  0x0f , 0x1f , 0x84 , 0x00 , 0x00 , 0x00, 0x00, 	
  0x00 
};

I run the last code and it was a success! Woo-hoo!

Final thoughts

  • It was tricky, but the basics will never betray you.
  • I could use gdb to perform all the disassembles, but the code wasn’t very long so objdump and vim made the magic for me.

Comments powered by Talkyard.


Share it!
Similar Posts