Machine Language - x86 Assembly

#assembly

Class: CSCE-312

Notes:

Assembly Instructions

The `mov` Instruction:

Purpose: copy data from source to destination
Common confusion: order of operands depends on the syntax style
Intel Syntax
- Format: mov destination, source
- First operand = where to put the data
- Second operand = where the data comes from
- Example:
  - mov eax, ebx ; eax = ebx
AT&T Syntax
- Format: mov source, destination
- First operand = source
- Second operand = destination
- Example:
  - mov %ebx, %eax # eax = ebx

How to tell assembly syntax apart?

Intel syntax (NASM, MASM, manuals):
- No % or $ prefixes
- Destination comes first
AT&T syntax (GNU tools, objdump):
- Registers start with %
- Immediates start with $
- Source comes first

Instruction suffix letters (operand size)

In AT&T/GAS (GNU Assembler) syntax, x86 instructions typically include a suffix letter that specifies the operand size:

b = byte (8-bit)
w = word (16-bit)
l = long (32-bit)
q = quad (64-bit)

So for example:

movb moves 8-bit values
movw moves 16-bit values
movl moves 32-bit values
movq moves 64-bit values

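Note: If the suffix is omitted, the assembler can often infer the size from a register operand (for example, mov %ebx, %eax is assembled as movl). The suffix is required when no register fixes the size, such as when storing an immediate to memory.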

Data Movement Instructions (AT&T)

mov src, dest → copy data
push src → push value on stack
pop dest → pop into register/memory
lea src, dest → load the effective address of src into dest (memory is not read)

Example:

mov %ebx, %eax      # eax = ebx
push %eax           # push eax onto stack
pop %ecx            # pop into ecx
lea 8(%ebp), %eax   # eax = ebp + 8 (an address; memory is not read)

Arithmetic Instructions (AT&T)

add src, dest → dest = dest + src
sub src, dest → dest = dest - src
inc dest → increment (+1)
dec dest → decrement (-1)
imul src, dest → signed multiply (dest = dest * src)
idiv src → signed divide of EDX:EAX by src (quotient in EAX, remainder in EDX)

Example:

add %ebx, %eax      # eax += ebx
sub $1, %ecx        # ecx -= 1
imul %ecx, %eax     # eax *= ecx

Logical & Bitwise (AT&T)

and src, dest → dest = dest & src
or src, dest → dest = dest | src
xor src, dest → dest = dest ^ src
not dest → flip all bits
shl $n, dest → shift left
shr $n, dest → shift right (logical)
sar $n, dest → shift right (arithmetic)

Example:

and %ebx, %eax      # eax &= ebx
xor %eax, %eax      # eax = 0
shl $2, %ecx        # ecx <<= 2 (ecx *= 4)

Note that xor-ing a register with itself is a common idiom for setting it to 0; it has a shorter encoding than mov $0, %eax.

Control Flow (AT&T)

jmp label → unconditional jump
cmp src, dest → compute dest - src, set flags, discard the result
test src, dest → compute dest & src, set flags, discard the result
je label → jump if equal (ZF=1)
jne label → jump if not equal (ZF=0)
jg label → jump if greater (signed)
jl label → jump if less (signed)
call label → call procedure
ret → return from procedure

Example:

cmp %ebx, %eax      # compare eax - ebx
je equal_case
jne not_equal_case

Special (AT&T)

nop → do nothing
hlt → halt CPU
int $n → software interrupt
syscall → Linux system call

Code in assembly from Scratch!

This is a Linux x86-64 assembly program that prints “Hello, World!” and exits.

Intel Syntax

hello.asm

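section .data
	text db "Hello, World!",10    ; 13 characters + newline (10) = 14 bytes

section .text
	global _start

_start:
	mov rax, 1                    ; syscall number: write
	mov rdi, 1                    ; file descriptor: stdout
	mov rsi, text                 ; pointer to string
	mov rdx, 14                   ; string length
	syscall

	mov rax, 60                   ; syscall number: exit
	mov rdi, 0                    ; exit code
	syscall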

AT&T Syntax

hello.s

	.data
text:
	.string "Hello, World!\n"
	
	.text
	.global _start

_start: 
	mov $1, %rax         # syscall number: write
	mov $1, %rdi         # file descriptor: stdout
	mov $text, %rsi      # pointer to string
	mov $14, %rdx        # string length
	syscall
	
	mov $60, %rax        # syscall number: exit
	mov $0, %rdi         # exit code
	syscall

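Key differences:
- Source operand comes first, destination second.
- Registers start with % (e.g., %rax).
- Immediates (constants) start with $.
- Data section uses .string instead of db. Note that .string appends a terminating zero byte, which the length 14 leaves out.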

How to install and run Assembly?

What is NASM, GAS, Linker?

NASM (Netwide Assembler):
- Popular assembler for Intel syntax.
- You write .asm files → NASM turns them into .o (object) files.
GAS (GNU Assembler):
- The GNU toolchain's assembler (the as command), standard on Linux.
- Uses AT&T syntax by default.
- Often invoked automatically by gcc.
Linker (ld):
- Combines object files into an executable.
- Resolves references like _start.
- Example: ld -o hello hello.o

How to Install and Run (Linux)

Install NASM + Binutils
- sudo apt update
- sudo apt install nasm binutils
- (binutils gives you ld linker and objdump).
Assemble and Link (Intel / NASM syntax)
- nasm -f elf64 hello.asm -o hello.o
- ld hello.o -o hello
- ./hello
Assemble and Link (AT&T / GAS syntax)
- Save code as hello.s:
- as -o hello.o hello.s
- ld hello.o -o hello
- ./hello

Syscall

What is a `syscall`?

Syscall (system call) = the way a program in user space asks the kernel to do something for it.
Examples: writing to the screen, reading from files, creating processes, allocating memory.
You can’t directly touch hardware from user programs; instead, you request services from the kernel using syscalls.

How `syscall` works (x86-64 Linux)

You load the syscall number into register %rax.
You load syscall arguments into specific registers:
- %rdi → 1st argument
- %rsi → 2nd argument
- %rdx → 3rd argument
- %r10 → 4th argument
- %r8 → 5th argument
- %r9 → 6th argument
Then you run the instruction syscall.
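The kernel executes the request and returns a value in %rax (a value from -4095 to -1 indicates an error). The syscall instruction itself overwrites %rcx and %r11.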

Common `syscall` numbers (x86-64 Linux)

1 → write(fd, buf, count)
0 → read(fd, buf, count)
60 → exit(status)
39 → getpid() (get process ID)
57 → fork() (create new process)

On Debian/Ubuntu, the full list lives in /usr/include/x86_64-linux-gnu/asm/unistd_64.h.

Let's take another look at the hello.s code!

Section of hello.s

	mov $1, %rax         # syscall 1 = write
	mov $1, %rdi         # fd = 1 (stdout)
	mov $text, %rsi      # pointer to string
	mov $14, %rdx        # length
	syscall              # => write(1, text, 14)
	
	mov $60, %rax        # syscall 60 = exit
	mov $0, %rdi         # exit code 0
	syscall              # => exit(0)

So literally:

Ask kernel to write 14 bytes from memory at text into file descriptor 1 (terminal).
Ask kernel to exit program with code 0.
In a standalone assembly program like this one (no C library), syscalls are the only bridge to OS features.
Higher-level library functions (printf, scanf, etc.) are ultimately wrappers around these syscalls.
Understanding syscalls helps explain what’s happening under the hood when you run a program.

Examples

Jump Example (Intel)

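section .text
	global _start

_start:
	mov eax, 3          ; eax = 3
	mov ebx, 2          ; ebx = 2
	cmp eax, ebx        ; compute eax - ebx, set flags
	jl lesser           ; jump if eax < ebx (signed)
	jmp end

lesser:
	mov ecx, 1          ; reached only when eax < ebx

end:
	mov rax, 60         ; syscall: exit
	mov rdi, 0          ; exit code 0
	syscall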

Step by step:

Put 3 in EAX and 2 in EBX.
cmp eax, ebx subtracts ebx from eax (without storing the result), updating flags:
- eax (3) - ebx (2) = 1 → the result is positive and there is no signed overflow, so SF = OF = 0. There is no single "less" flag; jl tests SF ≠ OF.
jl lesser means jump if less (signed comparison). Here, 3 < 2 is false, so the jump is skipped.
Execution goes to jmp end, which skips the code at the lesser label.
Program exits with code 0.
If you changed the first instruction to mov eax, 1 (keeping mov ebx, 2), then eax < ebx would be true, so the program would jump to lesser and set ecx = 1 before exiting.

Jump Example (AT&T - GNU Assembler)

	.text
	.global _start
	
_start:
	mov $3, %eax        # eax = 3
	mov $2, %ebx        # ebx = 2
	cmp %ebx, %eax      # compare eax - ebx
	jl lesser           # jump if less (eax < ebx)
	jmp end

lesser:
	mov $1, %ecx        # ecx = 1
	
end: 
	mov $60, %rax       # syscall: exit
	mov $0, %rdi        # exit code 0
	syscall

More Examples

This is a slightly larger program that asks the user for their name and prints a greeting.

Explanation

_start → entry point. Calls helper functions in sequence.
_printText1 → prints the prompt: “What is your name? ”.
_getName → reads up to 16 bytes from keyboard (stdin, fd=0) into buffer name.
_printText2 → prints “Hello, ”.
_printName → prints whatever the user typed.
Finally, program exits with syscall 60.

Code (AT&T syntax):

	.data
text1:
	.string "What is your name? "
text3:
	.string "Hello, "
	
	.bss
	.lcomm name, 16        # reserve 16 bytes
	
	.text
	.global _start

_start:
	call _printText1
	call _getName
	call _printText2
	call _printName
	
	mov $60, %rax          # syscall: exit
	mov $0, %rdi
	syscall

_getName:
	mov $0, %rax           # syscall: read
	mov $0, %rdi           # fd = stdin
	mov $name, %rsi        # buffer
	mov $16, %rdx          # size
	syscall
	ret

_printText1:
	mov $1, %rax           # syscall: write
	mov $1, %rdi           # fd = stdout
	mov $text1, %rsi
	mov $19, %rdx
	syscall
	ret

_printText2:
	mov $1, %rax
	mov $1, %rdi
	mov $text3, %rsi
	mov $7, %rdx
	syscall
	ret
	
_printName:
	mov $1, %rax
	mov $1, %rdi
	mov $name, %rsi
	mov $16, %rdx          # writes the whole 16-byte buffer (unused bytes are zero)
	syscall
	ret

Loop example in x86 assembly

Code (AT&T syntax)

	.data
num1:
	.long 3
num2:
	.long 6
num3:
	.long 8
	
	.text
	.global _start

_start:
	mov num1(%rip), %eax     # eax = num1 (3)
	mov num2(%rip), %ebx     # ebx = num2 (6)

_startloop:
	cmp %ebx, %eax           # compare eax - ebx
	jae _exit                # jump if eax >= ebx (unsigned)
	inc %eax                 # eax++
	jmp _startloop

_exit:
	mov num3(%rip), %eax     # eax = num3 (8)
	
	mov $60, %eax            # syscall: exit
	mov $0, %rdi
	syscall

Explanation:

Data section:
- num1 = 3, num2 = 6, num3 = 8.
Registers used:
- eax holds num1 (loop variable).
- ebx holds num2 (loop bound).
Loop mechanics:
- Compare eax with ebx.
- If eax >= ebx (unsigned comparison, because of jae), exit loop.
- Otherwise increment eax and repeat.
Exit:
- Loads num3 into eax (no effect, since eax is immediately overwritten with the syscall number).
- Calls syscall 60 → exit(0).

Lab Notes

The CMP Instruction and CPU Flags

The CMP instruction is the heart of decision-making in assembly.

Function: It compares two operands by performing a subtraction in the background (in this case, eax - ebx, or 3 - 2).
Crucial point: It does not store the result of the subtraction. Its sole purpose is to set status bits in the CPU's flags register (ZF, SF, CF, OF).

In our code, 3 - 2 = 1. The result is not zero and not negative, so the Zero Flag and Sign Flag are both cleared.

Conditional Jumps

Conditional jump instructions read the CPU flags to decide whether to change the flow of execution.
- JL stands for "Jump if Less". This is a signed comparison: the jump is taken when SF ≠ OF, which means the first operand was arithmetically less than the second. In our code, is eax (3) less than ebx (2)? No. Therefore the condition is false and the JL lesser instruction does not jump; the program simply continues to the next line.

Signed vs. Unsigned Jumps

This is a very important distinction in assembly. The note "jumps are signed by default" is a common point of confusion: there is no default, and the mnemonic you choose decides how the flags are interpreted.
Signed jumps (for numbers that can be positive or negative): JL, JLE, JG, JGE.
Unsigned jumps (for numbers that are always non-negative, like memory addresses): JB, JBE, JA, JAE.
The code we did in lab correctly uses JL because we are comparing signed integer values.

Unconditional Jumps (JMP) and Program Flow

The JMP instruction is an "unconditional jump": it always redirects the program flow.
Purpose: In this code, JMP end is essential. If the JL condition is false (which it is), we need to skip over the lesser code block. Without this JMP, execution would continue straight into the lesser block regardless of the comparison, which would be a logical error. The JMP prevents this "fall-through".

Example code (loop)

C code (loop example):

int bump(int num1, int num2) {
    for (int i = 0; num1 < num2; i++) {
        num1 = num1 + 1;
    }
    return num1;
}

Assembly version:

    .globl  bump
bump:
    # args: %edi = num1, %esi = num2
    xorl    %eax, %eax        # i = 0 (we’ll use %eax as the loop counter)
    movl    %edi, %edx        # edx = num1
    movl    %esi, %ecx        # ecx = num2

.Lloop:
    cmpl    %ecx, %edx        # compare num1 (edx) with num2 (ecx)
    jge     .Ldone            # if num1 >= num2, break
    addl    $1, %edx          # num1 = num1 + 1
    incl    %eax              # i++
    jmp     .Lloop

.Ldone:
    movl    %edx, %eax        # return num1
    ret

Some useful commands

Assemble and run commands (Intel version)
- nasm -f elf64 loop1.asm -o loop1.o
- ld loop1.o -o loop1
- ./loop1
GDB commands (LLDB uses different commands, such as register read)
- info r (short for info registers) shows the value of every register
- info r eax shows the value stored in the eax register
- p $eax prints the value stored in the eax register
- stepi steps to the next machine instruction
Additional information
- A conditional jump transfers control to another instruction. A compiler can sometimes replace the jump with a conditional move (cmov), in which case execution simply continues to the next instruction.
- “-fno-if-conversion” is a GCC flag that tells the compiler not to convert conditional jumps into conditional moves.
