Machine-Level Programming V - Advanced Topics

#memory #programming #assembly

Class: CSCE-312

Notes:

Today

Memory Layout
Buffer Overflow
- Vulnerability
- Protection
Unions
- Like the set operation? No!
- We wish they would have chosen a different keyword for this.
- Union is just like a struct, except everything starts at the beginning, they all occupy the same element, which means you can only refer to one of them, you need to specify what the type is. It is a way of saving space by being more specific.

Memory Layout

x86-64 Linux Memory Layout

Stack
- Runtime stack (8MB limit)
- E. g., local variables, and parameters beyond the 6 parameters of a function
- High address (towards the top) and grows down
- There is probably an environment variable you can change to make that stack bigger
Heap
- Dynamically allocated as needed
- When call malloc(), calloc(), new()
- Starts at a low address and grows up as we need more heap
- Runtime programming manages the heap (allocations/deallocations)
- If we run out of heap space, we can start a new heap somewhere else, the runtime system takes care about this.
Data
- Statically allocated data
- E.g., global vars, static vars, string constants
Text / Shared Libraries
- Executable machine instructions
  - Machine language program
  - Shared libraries are also executable machine instructions
- Read-only

Pasted image 20251023125029.png|300

Memory Allocation Example

char big_array[1L<<24];  /* 16 MB */
char huge_array[1L<<31]; /*  2 GB */

int global = 0;

int useless() { return 0; }

int main ()
{
    void *p1, *p2, *p3, *p4;
    int local = 0;
    p1 = malloc(1L << 28); /* 256 MB */
    p2 = malloc(1L << 8);  /* 256  B */
    p3 = malloc(1L << 32); /*   4 GB */
    p4 = malloc(1L << 8);  /* 256  B */
 /* Some print statements ... */
}

Where does everything go?

Data:
- char big_array[1L<<24]; -> 16 Million characters = 16 MB
- char huge_array[1L<<31]; -> 2 Billion characters = 2 GB
  - Note that 1L means 1 as a long (a big number)
  - It actually does not have to allocate all of those 0s, we can just set a flag or a thing that represents that this arrays are full of 0s.
- int global = 0;
Text:
- int usless() { return 0; }
- int main() {...}
Stack:
- int local = 0
- *p1, *p2, *p3, *p4
Heap:
- malloc() = "get me these many bytes on the heap".
- p1, p2, p3, p4
- Might allocate virtual memory instead of right away using physical memory

Pasted image 20251023131056.png|75

x86-64 Example Addresses

_address range ~2^47_
Pasted image 20251023133812.png|500

Down heap is of certain size, and our system coordinates to create another heap if we run out of it.
- Also enables shared libraries to be isolated from our own instructions
Garbage collection:
- C and C++ rely on the programmer to do allocation and deallocation
- Java, python, etc, automatically manage these for you
  - It is a safer way but may include many more instructions than necessary.

Buffer Overflow

Recall: Memory Referencing Bug Example

typedef struct {
  int a[2];
  double d;
} struct_t;

double fun(int i) {
  volatile struct_t s;
  s.d = 3.14;
  s.a[i] = 1073741824; /* Possibly out of bounds */
  return s.d;
}

This makes struct_t into a kind of class.
volatile means the elements of s can be change asynchronously with respect to the code that you see.
- "Keep this in memory (do not keep it in registers) because it might change"
i better be 0 or 1, otherwise we go out of bounds of the array.

fun(0)  ➙	3.14
fun(1)  ➙	3.14
fun(2)  ➙	3.1399998664856
fun(3)  ➙	2.00000061035156
fun(4)  ➙	3.14
fun(6)  ➙	Segmentation fault

Result is system specific
- Remember x86 is a little-endian system.
fun(4) is changing whatever is beyond the struct, beyond the double.
What is a segmentation fault?
- When you access memory that you are not allowed to access?
- 4 KB between pages, so there is plenty of room for you to access other stuff beyond your scope.
- Where are local variables stored? on the Stack.
  - A stack frame has many things, local variables, calle-caller saved registers, etc. there is extra memory allocated to the stack.
This is called a Stack-Smashing Attack.

Pasted image 20251023135202.png|500

Such problems are a BIG deal

Generally called a “buffer overflow”
- when exceeding the memory size allocated for an array
- Usually array indexing that causes these overflows
Why a big deal?
- It’s the #1 technical cause of security vulnerabilities
  - #1 overall cause is social engineering / user ignorance
Most common form
- Unchecked lengths on string inputs
  - A program reads a string/file and reads it into an array that isn't big enough to contain that file.
  - Think: Animal Crossing Buffer Overflow Vulnerability
- Particularly for bounded character arrays on the stack
  - sometimes referred to as stack smashing

String Library Code

Implementation of Unix function gets()

/* Get string from stdin */
char *gets(char *dest) {     
	int c = getchar();
    char *p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;
        c = getchar();
    }
    *p = '\0';
    return dest;
}

No way to specify limit on number of characters to read
You should not be using gets(), it is just available for backward compatibility.
- The reason: whatever the size of the array that is passed, it is too small, because the hacker can make a string longer than that and input that, writing over part of the stack!

Similar problems with other library functions

strcpy, strcat: Copy strings of arbitrary length
scanf, fscanf, sscanf, when given %s conversion specification

Vulnerable Buffer Code

/* Echo Line */  
void echo()  
{  
    char buf[4];  /* Way too small! */  
    gets(buf);  
    puts(buf);  
}

btw, how big is big enough?
Risk: attacker overwrites the return address with characters that specify a point in the program that you do not want to go.
- A hacker can construct a string that says: return to the part of the string, etc.
There are things to protect against that, the obvious thing is to not allow input to execute from the stack.

void call_echo() {
    echo();
}

Pasted image 20251023135950.png|350

First it actually works, why? because of alignment concerns the compiler is able to allocate more than 4 bytes on the stack frame (there are a few bytes for caller/callee saved register that maybe not necessarily crash the program)
But if we go one more character further there is a Segmentation Fault.

Buffer Overflow Disassembly

Pasted image 20251023161809.png|550

Note: 0x18 is 24 in decimal.
- sub $0x18, %rso allocates 24 bytes on the stack
- Points to buf at this point.
rdi issn't callee saved, it is caller saved.
Make the parameter rdi for puts

Buffer Overflow Stack

C code:

/* Echo Line */  
void echo()  
{  
    char buf[4];  /* Way too small! */  
    gets(buf);  
    puts(buf);  
}

Assembly code:

echo:
  subq  $24, %rsp
  movq  %rsp, %rdi
  call  gets
  . . .

Pasted image 20251023162041.png|250

24 bytes in the stack
First 4 bytes in the stack are the variable buf the rest are determined by the compiler.

Buffer Overflow Example

Pasted image 20251028131153.png|500

Buffer Overflow Example #1

Pasted image 20251028131217.png|500

Overflowed buffer, but did not corrupt state

Buffer Overflow Example #2

Pasted image 20251028131405.png|500

Overflowed and corrupted return pointer

Buffer Overflow Example #3

Pasted image 20251028131430.png|500

Overflowed buffer, corrupted return pointer, but program seems to work!
The return address is modified, instead of f6, it is 00.
There is code that does something.

Buffer Overflow Example #3 Explained

Pasted image 20251028131539.png|450

“Returns” to unrelated code
Lots of things happen, without modifying critical state
Eventually executes retq back to main
You really don't want this kind of bug because you will not notice what is going wrong.

Code Injection Attacks

Pasted image 20251028131726.png|500

Input string contains byte representation of executable code
Overwrite return address A with address of buffer B
When Q executes ret, will jump to exploit code

Notes:

P calls Q
Exploit code is code that is carefully crafted by the malicious user and does something.
- Delete/copy files
- Spawn processes, etc.
- If this program is running on root, this can be very dangerous
We have pushed all the way the last 8 bytes of this carefully constructed message right to the return address of B and B points exactly to the exploit code now.

Exploits Based on Buffer Overflows

Buffer overflow bugs can allow remote machines to execute arbitrary code on victim machines
Distressingly common in real progams
- Programmers keep making the same mistakes ☹
- Recent measures make these attacks much more difficult
Examples across the decades
- Original “Internet worm” (1988)
- “IM wars” (1999)
- Twilight hack on Wii (2000s)
- … and many, many more
You will learn some of the tricks in attacklab
- Hopefully to convince you to never leave such holes in your programs!!

Example: the original Internet worm (1988)

Exploited a few vulnerabilities to spread
- Early versions of the finger server (fingerd) used gets() to read the argument sent by the client:
  - finger droh@cs.cmu.edu
- Worm attacked fingerd server by sending phony argument:
  - finger “exploit-code padding new-return-address”
  - exploit code: executed a root shell on the victim machine with a direct TCP connection to the attacker.
Once on a machine, scanned for other machines to attack
- invaded ~6000 computers in hours (10% of the Internet ☺ )
  - see June 1989 article in Comm. of the ACM
- the young author of the worm was prosecuted…
- and CERT was formed… still homed at CMU

Notes:

The worm would just type finger and a carefully crafted email address looking thing that actually contain malicious code
Smash Stack -> Inject code -> propagate itself
Back then those were a lot of computers.
He did put rate limits in his code to not reach the whole internet but he had some kind of bug that enabled the worm to have a massive extensity.

Example 2: IM War

July, 1999
- Microsoft launches MSN Messenger (instant messaging system).
- Messenger clients can access popular AOL (America Online) Instant Messaging Service (AIM) servers
- At some point Microsoft people were not able to access AOL servers
- Microsoft literally hacked into AOL
- Then AOL will fix it and the process repeats again.
August 1999
- Mysteriously, Messenger clients can no longer access AIM servers
- Microsoft and AOL begin the IM war:
  - AOL changes server to disallow Messenger clients
  - Microsoft makes changes to clients to defeat AOL changes
  - At least 13 such skirmishes
- What was really happening?
  - AOL had discovered a buffer overflow bug in their own AIM clients
  - They exploited it to detect and block Microsoft: the exploit code returned a 4-byte signature (the bytes at some location in the AIM client) to server
  - When Microsoft changed code to match signature, AOL changed signature location

Aside: Worms and Viruses

Worm: A program that
- Can run by itself
- Can propagate a fully working version of itself to other computers
Virus: Code that
- Adds itself to other programs
- Does not run independently
Both are (usually) designed to spread among computers and to wreak havoc

Ok, what to do about buffer overflow attacks

Avoid overflow vulnerabilities
- You code defensively
- Do not use vulnerable things like gets()
- You want array bounds checking!
Employ system-level protections
- There are ways to manipulate user input such that it is safe
  - Think Input Validation.
Have compiler use “stack canaries”
Lets talk about each…

1. Avoid Overflow Vulnerabilities in Code (!)

/* Echo Line */  
void echo()  
{  
    char buf[4];  /* Way too small! */  
    fgets(buf, 4, stdin);  
    puts(buf);  
}

For example, use library routines that limit string lengths
- fgets instead of gets
- strncpy instead of strcpy
- Don’t use scanf with %s conversion specification
  - Use fgets to read the string
  - Or use %ns where n is a suitable integer

2. System-Level Protections can help (Randomized offsets)

Randomized stack offsets
- At start of program, allocate random amount of space on stack
- Shifts stack addresses for entire program
- Makes it difficult for hacker to predict beginning of inserted code
- E.g.: 5 executions of memory allocation code
  - Stack repositioned each time program executes

Pasted image 20251028133618.png|150

Notes:
- Now the hacker does not know exactly what the return address may be
- The hacker cannot say "go to this return address" because that would be the wrong place
- This is not solving the problem, but this definitely makes it harder for the hacker to measure return addresses.
- There are still ways to eventually reach a malicious code by maybe brute forcing different offsets.
- Not really a great solution, it just slows down the hacker.

2. System-Level Protections can help (Nonexecutable code segments)

Nonexecutable code segments
- In traditional x86, can mark region of memory as either “read-only” or “writeable”
  - Can execute anything readable
- X86-64 added explicit “execute” permission
- Stack marked as non-executable

Pasted image 20251028133929.png|300

No matter what the malicious user puts into the user buffer overflow, it will not be executed
It will just probably give a Segmentation Fault
This completely solves the problem, at first
Hackers actually got they way around it too.

3. Stack Canaries can help

Idea
- Place special value (“canary”) on stack just beyond buffer
- Check for corruption before exiting function
  - Check the special value again to see if it has been "tempered with"
GCC Implementation
- -fstack-protector
- Now the default (disabled earlier)
- Costs a tiny little bit of extra performance but it has turned a good security measure for stack smashing attack
  - (Unless the hacker is very very smart)

Protected Buffer Disassembly (skip)

echo:

  40072f:	sub    $0x18,%rsp
  400733:	mov    %fs:0x28,%rax
  40073c:	mov    %rax,0x8(%rsp)
  400741:	xor    %eax,%eax
  400743:	mov    %rsp,%rdi
  400746:	callq  4006e0 <gets>
  40074b:	mov    %rsp,%rdi
  40074e:	callq  400570 <puts@plt>
  400753:	mov    0x8(%rsp),%rax
  400758:	xor    %fs:0x28,%rax
  400761:	je     400768 <echo+0x39>
  400763:	callq  400580 <__stack_chk_fail@plt>
  400768:	add    $0x18,%rsp
  40076c:	retq

Setting Up canary

C code:

/* Echo Line */  
void echo()  
{  
    char buf[4];  /* Way too small! */  
    gets(buf);  
    puts(buf);  
}

Assembly code:

echo:
	. . .
	movq	%fs:40, %rax  # Get canary
	movq	%rax, 8(%rsp) # Place on stack
	xorl	%eax, %eax    # Erase canary
	. . .

Pasted image 20251028134456.png|200

Checking Canary

Assembly code:

echo:
	. . .
	movq	8(%rsp), %rax     # Retrieve from stack
	xorq	%fs:40, %rax      # Compare to canary
	je	    .L6               # If same, OK
	call	__stack_chk_fail  # FAIL
.L6:        . . .

Input: 0123456

Pasted image 20251103201253.png|200

Return-Oriented Programming Attacks

Challenge (for hackers)
- Stack randomization makes it hard to predict buffer location
- Marking stack nonexecutable makes it hard to insert binary code
Alternative Strategy
- Use existing code
  - E.g., library code from stdlib
- String together fragments to achieve overall desired outcome
- Does not overcome stack canaries
Construct program from gadgets
- Sequence of instructions ending in ret
  - Encoded by single byte 0xc3
- Code positions fixed from run to run
- Code is executable

Notes:

Challenge for attackers
- Stack randomization (ASLR) — the starting address of the stack (and other regions) is randomized each run, so an attacker can’t reliably point the saved return address at attacker-supplied data on the stack.
- Non-executable stack (NX / DEP) — even if an attacker can place machine code on the stack, the CPU will refuse to execute it because the page containing the stack is marked non-executable.
- Those two together stop the classic “inject code into buffer and jump to it” attack. ROP is the attacker’s workaround.
Alternative strategy: reuse existing code
- Instead of injecting new code, reuse code already present in the program or libraries (for example, libc) — code that is already mapped and executable.
- The attacker finds short instruction sequences in that existing code that do useful tiny operations, then arranges to execute them in a particular order to perform a larger task.
Gadgets and their properties
- A gadget is a short sequence of existing instructions that ends in a control-transfer instruction (commonly ret).
  - Example: pop rdi; ret — useful for loading a register from the stack.
- ret is encoded as one byte (0xC3) on x86, which means many instruction sequences will naturally end at a ret instruction somewhere in existing code; attackers search for these.
  - When the function rets, it pops the first address and jumps to gadget #1; gadget #1 runs some instructions and ends with ret, which pops the next address and continues to gadget #2 — and so on.
  - Using sequences like pop rdi; ret, pop rsi; ret, pop rdx; ret, then a gadget that contains a call system or a syscall, the attacker can set up registers/stack and invoke privileged operations — all by chaining gadgets.
- Because the program/library code locations are fixed in memory (unless randomized), gadgets have stable offsets — but ASLR can randomize base addresses (see mitigations below).
- Gadgets execute in the program’s executable memory, so NX/DEP does not prevent them.

Why ROP defeats NX but not always other defenses

Bypasses NX/DEP: ROP runs existing executable code pages, so it doesn’t need to make the stack executable.
ASLR: If ASLR randomizes the address space (including libraries and binaries), gadget addresses are no longer predictable. ROP requires knowing gadget addresses. Attackers can:
- Use info leaks to learn addresses at runtime (then compute gadget addresses).
- Target binaries/libraries that are not randomized (e.g., non-PIE executables or fixed libraries).
Stack canaries: A typical stack canary sits between local buffers and the saved return address. If the overflow that overwrites the return address also corrupts the canary, the function detects this and aborts. That prevents a naive overwrite of the saved return address. To bypass canaries an attacker must either leak the canary and restore it or find an overflow that avoids overwriting the canary — both are nontrivial. So we can say that ROP does not overcome stack canaries (unless the attacker can also defeat/learn the canary).

Gadget Example #1

Pasted image 20251028134635.png|550

Use tail end of existing functions

Gadget Example #2

Pasted image 20251103203338.png|575

Repurpose byte codes

ROP Execution

Pasted image 20251103203614.png|500

Trigger with ret instruction
- Will start executing Gadget 1
Final ret in each gadget will start next one

Unions

"Like structs, except that all the elements occupy the same space"

Union Allocation

Allocate according to largest element
Can only use one field at a time

Pasted image 20251028135101.png|600

Useful when you want to have data that sometimes you want to interpret in different types or look differently depending on some condition.
Also you do not want to have extra storage for the things you are not going to store.

Using Union to Access Bit Patterns

Pasted image 20251028135350.png|550

Takes unsigned u and returns it as a float
Is this the same as just casting u as a float?
- No, this is not the floating representation, it is something else
- It is reinterpreting bits to be of different type with the same bit pattern.

Byte Ordering Revisited

Idea
- Short/long/quad words stored in memory as 2/4/8 consecutive bytes
- Which byte is most (least) significant?
- Can cause problems when exchanging binary data between machines
Big Endian
- Most significant byte has lowest address
- Sparc
Little Endian
- Least significant byte has lowest address
- Intel x86, ARM Android and IOS
Bi Endian
- Can be configured either way
- ARM

Byte Ordering Example

union {
      unsigned char c[8];
      unsigned short s[4];
      unsigned int i[2];
      unsigned long l[1];
    } dw;

Pasted image 20251028135625.png|475

Print function:

int j;
for (j = 0; j < 8; j++)
    dw.c[j] = 0xf0 + j;

printf("Characters 0-7 ==  [0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x,0x%x]\n",
    dw.c[0], dw.c[1], dw.c[2], dw.c[3],
    dw.c[4], dw.c[5], dw.c[6], dw.c[7]);

printf("Shorts 0-3 == [0x%x,0x%x,0x%x,0x%x]\n",
    dw.s[0], dw.s[1], dw.s[2], dw.s[3]);

printf("Ints 0-1 == [0x%x,0x%x]\n",
    dw.i[0], dw.i[1]);

printf("Long 0 == [0x%lx]\n",
    dw.l[0]);

Byte Ordering on IA32 (Little Endian)

Little Endian
Pasted image 20251103210713.png|425
Output:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints       0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long       0   == [0xf3f2f1f0]

Byte Ordering on Sun (Big Endian)

Big Endian
Pasted image 20251103210845.png|425
Output:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf0f1,0xf2f3,0xf4f5,0xf6f7]
Ints       0-1 == [0xf0f1f2f3,0xf4f5f6f7]
Long       0   == [0xf0f1f2f3]

Byte Ordering on x86-64 (Little Endian 64-bit)

Little Endian
Pasted image 20251103210944.png|425
Output:

Characters 0-7 == [0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7]
Shorts     0-3 == [0xf1f0,0xf3f2,0xf5f4,0xf7f6]
Ints       0-1 == [0xf3f2f1f0,0xf7f6f5f4]
Long       0   == [0xf7f6f5f4f3f2f1f0]

Summary of Compound Types in C

Arrays
- Contiguous allocation of memory
- Aligned to satisfy every element’s alignment requirement
- Pointer to first element
- No bounds checking
Structures
- Allocate bytes in order declared
- Pad in middle and at end to satisfy alignment
Unions
- Overlay declarations
- Way to circumvent type system