Machine-Level Programming V - Advanced Topics
Class: CSCE-312
Notes:
Today
- Memory Layout
- Buffer Overflow
- Vulnerability
- Protection
- Unions
- Like the set operation? No!
- We wish they would have chosen a different keyword for this.
- Union is just like a struct, except everything starts at the beginning, they all occupy the same element, which means you can only refer to one of them, you need to specify what the type is. It is a way of saving space by being more specific.
Memory Layout
x86-64 Linux Memory Layout
- Stack
- Runtime stack (8MB limit)
- E. g., local variables, and parameters beyond the 6 parameters of a function
- High address (towards the top) and grows down
- There is probably an environment variable you can change to make that stack bigger
- Heap
- Dynamically allocated as needed
- When call
malloc(),calloc(),new() - Starts at a low address and grows up as we need more heap
- Runtime programming manages the heap (allocations/deallocations)
- If we run out of heap space, we can start a new heap somewhere else, the runtime system takes care about this.
- Data
- Statically allocated data
- E.g., global vars,
staticvars, string constants
- Text / Shared Libraries
- Executable machine instructions
- Machine language program
- Shared libraries are also executable machine instructions
- Read-only
- Executable machine instructions
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023125029.png)
Memory Allocation Example
char big_array[1L<<24]; /* 16 MB */
char huge_array[1L<<31]; /* 2 GB */
int global = 0;
int useless() { return 0; }
int main ()
{
void *p1, *p2, *p3, *p4;
int local = 0;
p1 = malloc(1L << 28); /* 256 MB */
p2 = malloc(1L << 8); /* 256 B */
p3 = malloc(1L << 32); /* 4 GB */
p4 = malloc(1L << 8); /* 256 B */
/* Some print statements ... */
}
Where does everything go?
- Data:
char big_array[1L<<24];-> 16 Million characters = 16 MBchar huge_array[1L<<31];-> 2 Billion characters = 2 GB- Note that 1L means 1 as a
long(a big number) - It actually does not have to allocate all of those 0s, we can just set a flag or a thing that represents that this arrays are full of 0s.
- Note that 1L means 1 as a
int global = 0;
- Text:
int usless() { return 0; }int main() {...}
- Stack:
int local = 0*p1,*p2,*p3,*p4
- Heap:
malloc()= "get me these many bytes on the heap".p1,p2,p3,p4- Might allocate virtual memory instead of right away using physical memory
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023131056.png)
x86-64 Example Addresses
_address range ~2^47_
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023133812.png)
- Down heap is of certain size, and our system coordinates to create another heap if we run out of it.
- Also enables shared libraries to be isolated from our own instructions
- Garbage collection:
- C and C++ rely on the programmer to do allocation and deallocation
- Java, python, etc, automatically manage these for you
- It is a safer way but may include many more instructions than necessary.
Buffer Overflow
Recall: Memory Referencing Bug Example
typedef struct {
int a[2];
double d;
} struct_t;
double fun(int i) {
volatile struct_t s;
s.d = 3.14;
s.a[i] = 1073741824; /* Possibly out of bounds */
return s.d;
}
- This makes
struct_tinto a kind of class. volatilemeans the elements ofscan be change asynchronously with respect to the code that you see.- "Keep this in memory (do not keep it in registers) because it might change"
ibetter be 0 or 1, otherwise we go out of bounds of the array.
fun(0) ➙ 3.14
fun(1) ➙ 3.14
fun(2) ➙ 3.1399998664856
fun(3) ➙ 2.00000061035156
fun(4) ➙ 3.14
fun(6) ➙ Segmentation fault
- Result is system specific
- Remember x86 is a little-endian system.
fun(4)is changing whatever is beyond the struct, beyond thedouble.- What is a segmentation fault?
- When you access memory that you are not allowed to access?
- 4 KB between pages, so there is plenty of room for you to access other stuff beyond your scope.
- Where are local variables stored? on the Stack.
- A stack frame has many things, local variables, calle-caller saved registers, etc. there is extra memory allocated to the stack.
- This is called a Stack-Smashing Attack.
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023135202.png)
Such problems are a BIG deal
- Generally called a “buffer overflow”
- when exceeding the memory size allocated for an array
- Usually array indexing that causes these overflows
- Why a big deal?
- Most common form
- Unchecked lengths on string inputs
- A program reads a string/file and reads it into an array that isn't big enough to contain that file.
- Think: Animal Crossing Buffer Overflow Vulnerability
- Particularly for bounded character arrays on the stack
- sometimes referred to as stack smashing
- Unchecked lengths on string inputs
String Library Code
Implementation of Unix function gets()
/* Get string from stdin */
char *gets(char *dest)
{
int c = getchar();
char *p = dest;
while (c != EOF && c != '\n') {
*p++ = c;
c = getchar();
}
*p = '\0';
return dest;
}
- No way to specify limit on number of characters to read
- You should not be using
gets(), it is just available for backward compatibility.- The reason: whatever the size of the array that is passed, it is too small, because the hacker can make a string longer than that and input that, writing over part of the stack!
Similar problems with other library functions
strcpy,strcat: Copy strings of arbitrary lengthscanf,fscanf,sscanf, when given%sconversion specification
Vulnerable Buffer Code
/* Echo Line */
void echo()
{
char buf[4]; /* Way too small! */
gets(buf);
puts(buf);
}
- btw, how big is big enough?
- Risk: attacker overwrites the return address with characters that specify a point in the program that you do not want to go.
- A hacker can construct a string that says: return to the part of the string, etc.
- There are things to protect against that, the obvious thing is to not allow input to execute from the stack.
void call_echo() {
echo();
}
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023135950.png)
- First it actually works, why? because of alignment concerns the compiler is able to allocate more than 4 bytes on the stack frame (there are a few bytes for caller/callee saved register that maybe not necessarily crash the program)
- But if we go one more character further there is a Segmentation Fault.
Buffer Overflow Disassembly
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023161809.png)
- Note: 0x18 is 24 in decimal.
sub $0x18, %rsoallocates 24 bytes on the stack- Points to
bufat this point.
rdiissn't callee saved, it is caller saved.- Make the parameter
rdiforputs
Buffer Overflow Stack
C code:
/* Echo Line */
void echo()
{
char buf[4]; /* Way too small! */
gets(buf);
puts(buf);
}
Assembly code:
echo:
subq $24, %rsp
movq %rsp, %rdi
call gets
. . .
/CSCE-312/Visual%20Aids/Pasted%20image%2020251023162041.png)
- 24 bytes in the stack
- First 4 bytes in the stack are the variable
bufthe rest are determined by the compiler.
Buffer Overflow Example
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131153.png)
Buffer Overflow Example #1
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131217.png)
- Overflowed buffer, but did not corrupt state
Buffer Overflow Example #2
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131405.png)
- Overflowed and corrupted return pointer
Buffer Overflow Example #3
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131430.png)
- Overflowed buffer, corrupted return pointer, but program seems to work!
- The return address is modified, instead of f6, it is 00.
- There is code that does something.
Buffer Overflow Example #3 Explained
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131539.png)
-
“Returns” to unrelated code
-
Lots of things happen, without modifying critical state
-
Eventually executes retq back to main
-
You really don't want this kind of bug because you will not notice what is going wrong.
Code Injection Attacks
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028131726.png)
- Input string contains byte representation of executable code
- Overwrite return address A with address of buffer B
- When Q executes
ret, will jump to exploit code
Notes:
- P calls Q
- Exploit code is code that is carefully crafted by the malicious user and does something.
- Delete/copy files
- Spawn processes, etc.
- If this program is running on root, this can be very dangerous
- We have pushed all the way the last 8 bytes of this carefully constructed message right to the return address of B and B points exactly to the exploit code now.
Exploits Based on Buffer Overflows
- Buffer overflow bugs can allow remote machines to execute arbitrary code on victim machines
- Distressingly common in real progams
- Programmers keep making the same mistakes ☹
- Recent measures make these attacks much more difficult
- Examples across the decades
- Original “Internet worm” (1988)
- “IM wars” (1999)
- Twilight hack on Wii (2000s)
- … and many, many more
- You will learn some of the tricks in attacklab
- Hopefully to convince you to never leave such holes in your programs!!
Example: the original Internet worm (1988)
- Exploited a few vulnerabilities to spread
- Early versions of the finger server (fingerd) used
gets()to read the argument sent by the client:finger droh@cs.cmu.edu
- Worm attacked fingerd server by sending phony argument:
finger “exploit-code padding new-return-address”- exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker.
- Early versions of the finger server (fingerd) used
- Once on a machine, scanned for other machines to attack
- invaded ~6000 computers in hours (10% of the Internet ☺ )
- see June 1989 article in Comm. of the ACM
- the young author of the worm was prosecuted…
- and CERT was formed… still homed at CMU
- invaded ~6000 computers in hours (10% of the Internet ☺ )
Notes:
- The worm would just type finger and a carefully crafted email address looking thing that actually contain malicious code
- Smash Stack -> Inject code -> propagate itself
- Back then those were a lot of computers.
- He did put rate limits in his code to not reach the whole internet but he had some kind of bug that enabled the worm to have a massive extensity.
Example 2: IM War
-
July, 1999
- Microsoft launches MSN Messenger (instant messaging system).
- Messenger clients can access popular AOL (America Online) Instant Messaging Service (AIM) servers
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028132833.png)
- At some point Microsoft people were not able to access AOL servers
- Microsoft literally hacked into AOL
- Then AOL will fix it and the process repeats again.
-
August 1999
- Mysteriously, Messenger clients can no longer access AIM servers
- Microsoft and AOL begin the IM war:
- AOL changes server to disallow Messenger clients
- Microsoft makes changes to clients to defeat AOL changes
- At least 13 such skirmishes
- What was really happening?
- AOL had discovered a buffer overflow bug in their own AIM clients
- They exploited it to detect and block Microsoft: the exploit code returned a 4-byte signature (the bytes at some location in the AIM client) to server
- When Microsoft changed code to match signature, AOL changed signature location
Aside: Worms and Viruses
-
Worm: A program that
- Can run by itself
- Can propagate a fully working version of itself to other computers
-
Virus: Code that
- Adds itself to other programs
- Does not run independently
-
Both are (usually) designed to spread among computers and to wreak havoc
Ok, what to do about buffer overflow attacks
- Avoid overflow vulnerabilities
- You code defensively
- Do not use vulnerable things like
gets() - You want array bounds checking!
- Employ system-level protections
- There are ways to manipulate user input such that it is safe
- Think Input Validation.
- There are ways to manipulate user input such that it is safe
- Have compiler use “stack canaries”
- Lets talk about each…
1. Avoid Overflow Vulnerabilities in Code (!)
/* Echo Line */
void echo()
{
char buf[4]; /* Way too small! */
fgets(buf, 4, stdin);
puts(buf);
}
- For example, use library routines that limit string lengths
- fgets instead of gets
- strncpy instead of strcpy
- Don’t use scanf with %s conversion specification
- Use fgets to read the string
- Or use %ns where n is a suitable integer
2. System-Level Protections can help (Randomized offsets)
- Randomized stack offsets
- At start of program, allocate random amount of space on stack
- Shifts stack addresses for entire program
- Makes it difficult for hacker to predict beginning of inserted code
- E.g.: 5 executions of memory allocation code
- Stack repositioned each time program executes
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028133618.png)
- Notes:
- Now the hacker does not know exactly what the return address may be
- The hacker cannot say "go to this return address" because that would be the wrong place
- This is not solving the problem, but this definitely makes it harder for the hacker to measure return addresses.
- There are still ways to eventually reach a malicious code by maybe brute forcing different offsets.
- Not really a great solution, it just slows down the hacker.
2. System-Level Protections can help (Nonexecutable code segments)
- Nonexecutable code segments
- In traditional x86, can mark region of memory as either “read-only” or “writeable”
- Can execute anything readable
- X86-64 added explicit “execute” permission
- Stack marked as non-executable
- In traditional x86, can mark region of memory as either “read-only” or “writeable”
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028133929.png)
- No matter what the malicious user puts into the user buffer overflow, it will not be executed
- It will just probably give a Segmentation Fault
- This completely solves the problem, at first
- Hackers actually got they way around it too.
3. Stack Canaries can help
-
Idea
- Place special value (“canary”) on stack just beyond buffer
- Check for corruption before exiting function
- Check the special value again to see if it has been "tempered with"
-
GCC Implementation
-fstack-protector- Now the default (disabled earlier)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028134224.png)
- Costs a tiny little bit of extra performance but it has turned a good security measure for stack smashing attack
- (Unless the hacker is very very smart)
Protected Buffer Disassembly (skip)
echo:
40072f: sub $0x18,%rsp
400733: mov %fs:0x28,%rax
40073c: mov %rax,0x8(%rsp)
400741: xor %eax,%eax
400743: mov %rsp,%rdi
400746: callq 4006e0 <gets>
40074b: mov %rsp,%rdi
40074e: callq 400570 <puts@plt>
400753: mov 0x8(%rsp),%rax
400758: xor %fs:0x28,%rax
400761: je 400768 <echo+0x39>
400763: callq 400580 <__stack_chk_fail@plt>
400768: add $0x18,%rsp
40076c: retq
Setting Up canary
C code:
/* Echo Line */
void echo()
{
char buf[4]; /* Way too small! */
gets(buf);
puts(buf);
}
Assembly code:
echo:
. . .
movq %fs:40, %rax # Get canary
movq %rax, 8(%rsp) # Place on stack
xorl %eax, %eax # Erase canary
. . .
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028134456.png)
Checking Canary
Assembly code:
echo:
. . .
movq 8(%rsp), %rax # Retrieve from stack
xorq %fs:40, %rax # Compare to canary
je .L6 # If same, OK
call __stack_chk_fail # FAIL
.L6: . . .
Input: 0123456
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103201253.png)
Return-Oriented Programming Attacks
- Challenge (for hackers)
- Stack randomization makes it hard to predict buffer location
- Marking stack nonexecutable makes it hard to insert binary code
- Alternative Strategy
- Use existing code
- E.g., library code from stdlib
- String together fragments to achieve overall desired outcome
- Does not overcome stack canaries
- Use existing code
- Construct program from gadgets
- Sequence of instructions ending in
ret- Encoded by single byte
0xc3
- Encoded by single byte
- Code positions fixed from run to run
- Code is executable
- Sequence of instructions ending in
Notes:
- Challenge for attackers
- Stack randomization (ASLR) — the starting address of the stack (and other regions) is randomized each run, so an attacker can’t reliably point the saved return address at attacker-supplied data on the stack.
- Non-executable stack (NX / DEP) — even if an attacker can place machine code on the stack, the CPU will refuse to execute it because the page containing the stack is marked non-executable.
- Those two together stop the classic “inject code into buffer and jump to it” attack. ROP is the attacker’s workaround.
- Alternative strategy: reuse existing code
- Instead of injecting new code, reuse code already present in the program or libraries (for example, libc) — code that is already mapped and executable.
- The attacker finds short instruction sequences in that existing code that do useful tiny operations, then arranges to execute them in a particular order to perform a larger task.
- Gadgets and their properties
- A gadget is a short sequence of existing instructions that ends in a control-transfer instruction (commonly
ret).- Example:
pop rdi; ret— useful for loading a register from the stack.
- Example:
retis encoded as one byte (0xC3) on x86, which means many instruction sequences will naturally end at aretinstruction somewhere in existing code; attackers search for these.- When the function
rets, it pops the first address and jumps to gadget #1; gadget #1 runs some instructions and ends withret, which pops the next address and continues to gadget #2 — and so on. - Using sequences like
pop rdi; ret,pop rsi; ret,pop rdx; ret, then a gadget that contains acall systemor asyscall, the attacker can set up registers/stack and invoke privileged operations — all by chaining gadgets.
- When the function
- Because the program/library code locations are fixed in memory (unless randomized), gadgets have stable offsets — but ASLR can randomize base addresses (see mitigations below).
- Gadgets execute in the program’s executable memory, so NX/DEP does not prevent them.
- A gadget is a short sequence of existing instructions that ends in a control-transfer instruction (commonly
Why ROP defeats NX but not always other defenses
- Bypasses NX/DEP: ROP runs existing executable code pages, so it doesn’t need to make the stack executable.
- ASLR: If ASLR randomizes the address space (including libraries and binaries), gadget addresses are no longer predictable. ROP requires knowing gadget addresses. Attackers can:
- Use info leaks to learn addresses at runtime (then compute gadget addresses).
- Target binaries/libraries that are not randomized (e.g., non-PIE executables or fixed libraries).
- Stack canaries: A typical stack canary sits between local buffers and the saved return address. If the overflow that overwrites the return address also corrupts the canary, the function detects this and aborts. That prevents a naive overwrite of the saved return address. To bypass canaries an attacker must either leak the canary and restore it or find an overflow that avoids overwriting the canary — both are nontrivial. So we can say that ROP does not overcome stack canaries (unless the attacker can also defeat/learn the canary).
Gadget Example #1
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028134635.png)
- Use tail end of existing functions
Gadget Example #2
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103203338.png)
- Repurpose byte codes
ROP Execution
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103203614.png)
- Trigger with ret instruction
- Will start executing Gadget 1
- Final
retin each gadget will start next one
Unions
"Like structs, except that all the elements occupy the same space"
Union Allocation
- Allocate according to largest element
- Can only use one field at a time
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028135101.png)
- Useful when you want to have data that sometimes you want to interpret in different types or look differently depending on some condition.
- Also you do not want to have extra storage for the things you are not going to store.
Using Union to Access Bit Patterns
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028135350.png)
- Takes unsigned u and returns it as a float
- Is this the same as just casting u as a float?
- No, this is not the floating representation, it is something else
- It is reinterpreting bits to be of different type with the same bit pattern.
Byte Ordering Revisited
- Idea
- Short/long/quad words stored in memory as 2/4/8 consecutive bytes
- Which byte is most (least) significant?
- Can cause problems when exchanging binary data between machines
- Big Endian
- Most significant byte has lowest address
- Sparc
- Little Endian
- Least significant byte has lowest address
- Intel x86, ARM Android and IOS
- Bi Endian
- Can be configured either way
- ARM
Byte Ordering Example
union {
unsigned char c[8];
unsigned short s[4];
unsigned int i[2];
unsigned long l[1];
} dw;
/CSCE-312/Visual%20Aids/Pasted%20image%2020251028135625.png)
Print function:
int j;
for (j = 0; j < 8; j++)
dw.c[j] = 0xf0 + j;
printf("Characters 0-7 == [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
dw.c[0], dw.c[1], dw.c[2], dw.c[3],
dw.c[4], dw.c[5], dw.c[6], dw.c[7]);
printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
dw.s[0], dw.s[1], dw.s[2], dw.s[3]);
printf("Ints 0-1 == [0x%x,0x%x]\n",
dw.i[0], dw.i[1]);
printf("Long 0 == [0x%lx]\n",
dw.l[0]);
Byte Ordering on IA32 (Little Endian)
Little Endian
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103210713.png)
Output:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf3f2f1f0]
Byte Ordering on Sun (Big Endian)
Big Endian
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103210845.png)
Output:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints 0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long 0 == [0xf0f1f2f3]
Byte Ordering on x86-64 (Little Endian 64-bit)
Little Endian
/CSCE-312/Visual%20Aids/Pasted%20image%2020251103210944.png)
Output:
Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts 0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints 0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long 0 == [0xf7f6f5f4f3f2f1f0]
Summary of Compound Types in C
-
Arrays
- Contiguous allocation of memory
- Aligned to satisfy every element’s alignment requirement
- Pointer to first element
- No bounds checking
-
Structures
- Allocate bytes in order declared
- Pad in middle and at end to satisfy alignment
-
Unions
- Overlay declarations
- Way to circumvent type system