Machine Language - x86 Assembly
Class: CSCE-312
Notes:
Assembly Instructions
The mov Instruction:
- Purpose: copy data from a source to a destination
- Common confusion: the order of operands depends on the syntax style
Intel Syntax
- Format: mov destination, source
- First operand = where to put the data
- Second operand = where the data comes from
- Example:
mov eax, ebx ; eax = ebx
AT&T Syntax
- Format: mov source, destination
- First operand = source
- Second operand = destination
- Example:
mov %ebx, %eax # eax = ebx
How to tell assembly syntax apart?
- Intel syntax (NASM, MASM, manuals):
  - No % or $ prefixes
  - Destination comes first
- AT&T syntax (GNU tools, objdump):
  - Registers start with %
  - Immediates start with $
  - Source comes first
Instruction suffix letters (operand size)
In AT&T/GAS (GNU Assembler) syntax, x86 instructions typically include a suffix letter that specifies the operand size:
- b = byte (8-bit)
- w = word (16-bit)
- l = long (32-bit)
- q = quad (64-bit)
So for example:
- movb moves 8-bit values
- movw moves 16-bit values
- movl moves 32-bit values
- movq moves 64-bit values
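Note: if the suffix is omitted, the assembler infers the size from a register operand (e.g., mov %ebx, %eax assembles as movl). When no operand is a register, as in storing an immediate to memory, the size is ambiguous and you should write the suffix.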
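The size suffix also decides how much of a register a mov overwrites. The C sketch below models a mov into a 64-bit register such as %rax for each suffix. The function names mov_b, mov_w, mov_l and mov_q are made up for illustration. The sketch follows the x86-64 rule that a 32-bit register write zero-extends into the upper half, while 8- and 16-bit writes leave the upper bits unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: `old` is the 64-bit register's previous value
   (e.g. %rax), `src` is the value being moved into it. */

/* movb into %al: replaces bits 0-7, keeps bits 8-63 */
static uint64_t mov_b(uint64_t old, uint64_t src) {
    return (old & ~(uint64_t)0xFF) | (src & 0xFF);
}

/* movw into %ax: replaces bits 0-15, keeps bits 16-63 */
static uint64_t mov_w(uint64_t old, uint64_t src) {
    return (old & ~(uint64_t)0xFFFF) | (src & 0xFFFF);
}

/* movl into %eax: replaces bits 0-31 and zeroes bits 32-63 */
static uint64_t mov_l(uint64_t old, uint64_t src) {
    (void)old; /* the old contents do not survive */
    return (uint32_t)src;
}

/* movq into %rax: replaces all 64 bits */
static uint64_t mov_q(uint64_t old, uint64_t src) {
    (void)old;
    return src;
}
```

This is why, for example, mov $60, %eax leaves exactly 60 in %rax.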
Data Movement Instructions (AT&T)
- mov src, dest → copy data
- push src → push value on stack
- pop dest → pop into register/memory
- lea src, dest → load effective address: dest = the address described by src (no memory is read)
Example:
mov %ebx, %eax # eax = ebx
push %eax # push eax onto stack
pop %ecx # pop into ecx
lea -8(%ebp), %eax # eax = address of a local var (ebp - 8)
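An AT&T memory operand disp(base, index, scale) names the address base + index*scale + disp. lea stores that address itself, while mov with the same operand loads the value stored there. Below is a small C model of the address arithmetic; effective_address is a hypothetical name, not a real API.

```c
#include <assert.h>
#include <stdint.h>

/* Address named by disp(base, index, scale); lea writes this value into
   the destination register without touching memory. Arithmetic wraps
   modulo 2^64, like the CPU's address calculation. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int64_t disp) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8); /* only legal scales */
    return base + index * scale + (uint64_t)disp;
}
```

For example, lea -8(%ebp), %eax corresponds to effective_address(ebp, 0, 1, -8). Compilers also use lea for cheap arithmetic, e.g. lea (%rdi,%rdi,2), %rax computes rdi * 3.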
Arithmetic Instructions (AT&T)
- add src, dest → dest = dest + src
- sub src, dest → dest = dest - src
- inc dest → increment (+1)
- dec dest → decrement (-1)
- imul src, dest → dest = dest * src (signed multiply)
- idiv src → signed divide of EDX:EAX by src (quotient in EAX, remainder in EDX)
Example:
add %ebx, %eax # eax += ebx
sub $1, %ecx # ecx -= 1
imul %ecx, %eax # eax *= ecx
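idiv is the least obvious instruction in this list, because it uses a fixed register pair. The C sketch below shows its result for a 32-bit divisor, assuming the dividend in EAX was first sign-extended into EDX:EAX (the cltd instruction does this). C99's / and % also truncate toward zero, so they produce the same quotient and remainder. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register contents after a 32-bit idiv */
struct idiv_result {
    int32_t eax; /* quotient, truncated toward zero */
    int32_t edx; /* remainder, same sign as the dividend */
};

/* Sketch of: cltd ; idiv divisor  (dividend starts in EAX).
   On real hardware, division by zero and INT32_MIN / -1 raise a
   divide error (#DE); this sketch asserts instead. */
static struct idiv_result idiv32(int32_t eax, int32_t divisor) {
    assert(divisor != 0);
    assert(!(eax == INT32_MIN && divisor == -1));
    struct idiv_result r = { eax / divisor, eax % divisor };
    return r;
}
```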
Logical & Bitwise (AT&T)
- and src, dest → dest = dest & src
- or src, dest → dest = dest | src
- xor src, dest → dest = dest ^ src
- not dest → flip all bits
- shl $n, dest → shift left
- shr $n, dest → shift right (logical)
- sar $n, dest → shift right (arithmetic)
Example:
and %ebx, %eax # eax &= ebx
xor %eax, %eax # eax = 0
shl $2, %ecx # ecx <<= 2 (ecx *= 4)
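- Note: xor %eax, %eax is a common way of setting a register to 0, since any value XORed with itself is 0 (it also encodes in fewer bytes than mov $0, %eax).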
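The difference between shr and sar only shows up for values with the top bit set. The C sketch below models the 32-bit shifts; the function names are invented. The shift count is masked to 5 bits, as the hardware does for 32-bit operands. sar is written without applying >> to a negative int, because C leaves that result implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* shl: zeros enter from the right (multiply by 2^n) */
static uint32_t shl32(uint32_t x, unsigned n) { return x << (n & 31); }

/* shr: logical right shift, zeros enter from the left */
static uint32_t shr32(uint32_t x, unsigned n) { return x >> (n & 31); }

/* sar: arithmetic right shift, copies of the sign bit enter from the left */
static int32_t sar32(int32_t x, unsigned n) {
    n &= 31;
    return x < 0 ? ~(~x >> n) : x >> n; /* ~x is non-negative when x < 0 */
}

/* any value XORed with itself is 0: the xor %eax, %eax idiom */
static uint32_t xor_self(uint32_t x) { return x ^ x; }
```

Note that sar rounds toward negative infinity (sar32(-7, 1) is -4), while idiv truncates toward zero (-7 / 2 is -3).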
Control Flow (AT&T)
- jmp label → unconditional jump
- cmp src, dest → compute dest - src, set flags, discard the result
- test src, dest → compute dest & src, set flags, discard the result
- je label → jump if equal (ZF=1)
- jne label → jump if not equal (ZF=0)
- jg label → jump if greater (signed)
- jl label → jump if less (signed)
- call label → call procedure
- ret → return from procedure
Example:
cmp %ebx, %eax # compare eax - ebx
je equal_case
jne not_equal_case
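cmp only sets flags; the jump that follows decides how to read them. The C sketch below models cmp src, dest on 32-bit values, with arguments in AT&T order. It also models the conditions that je, jne, jl and jg test, as the Intel manual defines them (je: ZF=1, jne: ZF=0, jl: SF!=OF, jg: ZF=0 and SF=OF). The names cmp32 and cond_* are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags {
    bool zf; /* zero flag: dest - src == 0 */
    bool sf; /* sign flag: top bit of the 32-bit result */
    bool cf; /* carry flag: unsigned borrow, i.e. dest < src unsigned */
    bool of; /* overflow flag: the signed subtraction overflowed */
};

/* cmp src, dest: compute dest - src, set flags, throw the difference away */
static struct flags cmp32(uint32_t src, uint32_t dest) {
    uint32_t result = dest - src; /* wraps modulo 2^32, like the CPU */
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 31) & 1;
    f.cf = dest < src;
    /* signed overflow: operands had different signs and the result's
       sign differs from dest's sign */
    f.of = (((dest ^ src) & (dest ^ result)) >> 31) & 1;
    return f;
}

static bool cond_je(struct flags f)  { return f.zf; }
static bool cond_jne(struct flags f) { return !f.zf; }
static bool cond_jl(struct flags f)  { return f.sf != f.of; }
static bool cond_jg(struct flags f)  { return !f.zf && f.sf == f.of; }
```

For cmp %ebx, %eax with eax = 3 and ebx = 2, call cmp32(2, 3): every flag is clear, so jne and jg are taken and je and jl are not.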
Special (AT&T)
- nop → do nothing
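- hlt → halt the CPU until the next interrupt (privileged: in a user program it faults, and Linux kills the process with SIGSEGV)
- int $n → software interrupt (e.g., int $0x80 is the legacy 32-bit Linux system call)
- syscall → enter the kernel to make a system call (x86-64 Linux)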
Code in assembly from Scratch!
This is a Linux x86-64 assembly program that prints “Hello, World!” and exits.
Intel Syntax
hello.asm
section .data
text db "Hello, World!",10
section .text
global _start
_start:
mov rax, 1
mov rdi, 1
mov rsi, text
mov rdx, 14
syscall
mov rax, 60
mov rdi, 0
syscall
AT&T Syntax
hello.s
.data
text:
.string "Hello, World!\n"
.text
.global _start
_start:
mov $1, %rax # syscall number: write
mov $1, %rdi # file descriptor: stdout
mov $text, %rsi # pointer to string
mov $14, %rdx # string length
syscall
mov $60, %rax # syscall number: exit
mov $0, %rdi # exit code
syscall
- Key differences:
  - Registers start with % (e.g., %rax).
  - Immediates (constants) start with $.
  - The data section uses .string instead of db. (.string also appends a zero byte, but only the first 14 bytes are written, so the output is the same.)
How to install and run Assembly?
What are NASM, GAS, and the linker?
- NASM (Netwide Assembler):
- Popular assembler for Intel syntax.
- You write .asm files → NASM turns them into .o (object) files.
- GAS (GNU Assembler):
- Default assembler on Linux.
- Uses AT&T syntax by default.
- Often invoked automatically by gcc.
- Linker (ld):
- Combines object files into an executable.
- Resolves symbol references between object files and sets the program's entry point (_start by default).
- Example:
ld -o hello hello.o
How to Install and Run (Linux)
- Install NASM + Binutils
  sudo apt update
  sudo apt install nasm binutils
  (binutils gives you the as assembler, the ld linker, and objdump.)
- Assemble and Link (Intel / NASM syntax)
  nasm -f elf64 hello.asm -o hello.o
  ld hello.o -o hello
  ./hello
- Assemble and Link (AT&T / GAS syntax)
  - Save the code as hello.s, then:
  as -o hello.o hello.s
  ld hello.o -o hello
  ./hello
Syscall
What is a syscall?
- Syscall (system call) = the way a program in user space asks the kernel to do something for it.
- Examples: writing to the screen, reading from files, creating processes, allocating memory.
- You can’t directly touch hardware from user programs — instead, you request services from the kernel using syscalls.
How syscalls work (x86-64 Linux)
- Load the syscall number into register %rax.
- Load the syscall arguments into specific registers:
  - %rdi → 1st argument
  - %rsi → 2nd argument
  - %rdx → 3rd argument
  - %r10 → 4th argument
  - %r8 → 5th argument
  - %r9 → 6th argument
- Execute the syscall instruction.
- The kernel performs the request and returns a value in %rax (a negative error code on failure). The syscall instruction also overwrites %rcx and %r11.
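The same register convention can be seen from C. glibc's generic syscall() function takes a syscall number plus up to six arguments and loads them into %rax, %rdi, %rsi, ... before executing syscall. It is Linux-only and declared in <unistd.h> when _GNU_SOURCE is defined. A sketch, assuming Linux and glibc; raw_write is a made-up name:

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_write: the syscall number */
#include <unistd.h>

/* write(fd, buf, count) issued directly as a system call.
   Returns the number of bytes written; on failure glibc's syscall()
   returns -1 and sets errno. */
static long raw_write(int fd, const char *buf, unsigned long count) {
    return syscall(SYS_write, fd, buf, count);
}
```

raw_write(1, "Hello, World!\n", 14) performs the same request as the first half of hello.s.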
Common syscall numbers (x86-64 Linux)
- 1 → write(fd, buf, count)
- 0 → read(fd, buf, count)
- 60 → exit(status)
- 39 → getpid() (get process ID)
- 57 → fork() (create new process)
Full list lives in /usr/include/x86_64-linux-gnu/asm/unistd_64.h.
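You do not have to hard-code these numbers in C: <sys/syscall.h> names them (SYS_read, SYS_write, SYS_exit, SYS_getpid, ...). A sketch, assuming Linux and glibc, that calls getpid with no arguments, so only the number goes into %rax. raw_getpid is a hypothetical name.

```c
#define _GNU_SOURCE          /* exposes syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>     /* SYS_getpid (39 on x86-64 Linux) */
#include <unistd.h>

/* getpid() as a raw system call: no arguments, the PID comes back in %rax */
static long raw_getpid(void) {
    return syscall(SYS_getpid);
}
```

The result matches what the libc wrapper getpid() returns.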
Let's take another look at the hello.s code!
Section of hello.s
mov $1, %rax # syscall 1 = write
mov $1, %rdi # fd = 1 (stdout)
mov $text, %rsi # pointer to string
mov $14, %rdx # length
syscall # => write(1, text, 14)
mov $60, %rax # syscall 60 = exit
mov $0, %rdi # exit code 0
syscall # => exit(0)
So literally:
- Ask the kernel to write 14 bytes from memory at text to file descriptor 1 (stdout, usually the terminal).
- Ask the kernel to exit the program with code 0.
- In assembly, syscalls are the only bridge to OS features.
- Higher-level library functions (printf, scanf, etc.) do formatting and buffering in user space and ultimately call these syscalls (write, read).
- Understanding syscalls helps explain what's happening under the hood when you run a program.
Examples
Jump Example (Intel)
section .text
global _start
_start:
mov eax, 3
mov ebx, 2
cmp eax, ebx
jl lesser
jmp end
lesser:
mov ecx, 1
end:
mov rax, 60
mov rdi, 0
syscall
Step by step:
- Put 3 in EAX and 2 in EBX.
- cmp eax, ebx subtracts ebx from eax (without storing the result) and updates the flags: eax (3) - ebx (2) = 1, a positive result with no overflow, so SF = OF = 0 and the "less" condition (SF != OF) is false.
- jl lesser means jump if less (signed comparison). Here, 3 < 2 is false, so the jump is not taken.
- Execution continues to jmp end, which skips the lesser block.
- The program exits with code 0.
- If you changed the first line to mov eax, 1 (keeping mov ebx, 2), then eax < ebx would be true, so it would jump to lesser and set ecx = 1 before exiting.
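The jump example has the same control flow as a plain if statement. A C rendering follows. jump_example is a hypothetical name, and ecx is assumed to start at 0, because the assembly never initializes it.

```c
#include <assert.h>

/* ecx becomes 1 only when eax < ebx as signed integers */
static int jump_example(int eax, int ebx) {
    int ecx = 0;          /* assumption: ecx starts at 0 */
    if (eax < ebx) {      /* cmp eax, ebx ; jl lesser */
        ecx = 1;          /* lesser: mov ecx, 1 */
    }                     /* otherwise jmp end skips the block */
    return ecx;           /* end: exit */
}
```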
Jump Example (AT&T - GNU Assembler)
.text
.global _start
_start:
mov $3, %eax # eax = 3
mov $2, %ebx # ebx = 2
cmp %ebx, %eax # compare eax - ebx
jl lesser # jump if less (eax < ebx)
jmp end
lesser:
mov $1, %ecx # ecx = 1
end:
mov $60, %rax # syscall: exit
mov $0, %rdi # exit code 0
syscall
More Examples
This is a slightly larger program that asks the user for their name and prints a greeting.
Explanation
- _start → entry point. Calls the helper functions in sequence.
- _printText1 → prints the prompt "What is your name? ".
- _getName → reads up to 16 bytes from the keyboard (stdin, fd = 0) into the buffer name.
- _printText2 → prints "Hello, ".
- _printName → prints whatever the user typed.
- Finally, the program exits with syscall 60.
Code (AT&T syntax):
.data
text1:
.string "What is your name? "
text3:
.string "Hello, "
.bss
.lcomm name, 16 # reserve 16 bytes for the name
.lcomm namelen, 8 # bytes actually read (set by _getName)
.text
.global _start
_start:
call _printText1
call _getName
call _printText2
call _printName
mov $60, %rax # syscall: exit
mov $0, %rdi
syscall
_getName:
mov $0, %rax # syscall: read
mov $0, %rdi # fd = stdin
mov $name, %rsi # buffer
mov $16, %rdx # size
syscall # rax = number of bytes read
mov %rax, namelen(%rip) # save it for _printName
ret
_printText1:
mov $1, %rax # syscall: write
mov $1, %rdi # fd = stdout
mov $text1, %rsi
mov $19, %rdx # length of "What is your name? "
syscall
ret
_printText2:
mov $1, %rax
mov $1, %rdi
mov $text3, %rsi
mov $7, %rdx # length of "Hello, "
syscall
ret
_printName:
mov $1, %rax
mov $1, %rdi
mov $name, %rsi
mov namelen(%rip), %rdx # write only the bytes that were read
syscall
ret
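For comparison, the same flow can be written with libc's read() and write() wrappers, which issue syscalls 0 and 1 underneath. This is a sketch: greet and its fd parameters are hypothetical additions that let it run on pipes instead of a terminal. The real program uses fd 0 and fd 1. It echoes exactly the n bytes that read() reports.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Prompt, read up to 16 bytes, then greet. Returns 0 on success, -1 on error. */
static int greet(int in_fd, int out_fd) {
    const char *prompt = "What is your name? ";
    const char *hello = "Hello, ";
    char name[16];

    if (write(out_fd, prompt, strlen(prompt)) < 0) return -1;   /* _printText1 */
    ssize_t n = read(in_fd, name, sizeof name);                 /* _getName */
    if (n < 0) return -1;
    if (write(out_fd, hello, strlen(hello)) < 0) return -1;     /* _printText2 */
    if (write(out_fd, name, (size_t)n) < 0) return -1;          /* _printName */
    return 0;
}
```

Calling greet(0, 1) reproduces the assembly program's behavior on a terminal.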
Loop example in x86 assembly
Code (AT&T syntax)
.data
num1:
.long 3
num2:
.long 6
num3:
.long 8
.text
.global _start
_start:
mov num1(%rip), %eax # eax = num1 (3)
mov num2(%rip), %ebx # ebx = num2 (6)
_startloop:
cmp %ebx, %eax # compare eax - ebx
jae _exit # jump if eax >= ebx (unsigned comparison)
inc %eax # eax++
jmp _startloop
_exit:
mov num3(%rip), %eax # eax = num3 (8), overwritten on the next line
mov $60, %rax # syscall: exit
mov $0, %rdi
syscall
Explanation:
- Data section:
- num1 = 3, num2 = 6, num3 = 8.
- Registers used:
- eax holds num1 (loop variable).
- ebx holds num2 (loop bound).
- Loop mechanics:
- Compare eax with ebx.
- If eax >= ebx (unsigned, because of jae), exit the loop.
- Otherwise increment eax and repeat.
- Exit:
- Loads num3 into eax (not really used: eax is overwritten right away with the syscall number, and the exit code comes from rdi).
- Calls syscall 60 → exit(0).
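A C version of the num1/num2 loop follows. Because jae is an unsigned test, the sketch compares uint32_t values. count_up is a hypothetical name for whatever ends up in %eax at _exit.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t count_up(uint32_t eax, uint32_t ebx) {
    while (eax < ebx) {   /* _startloop: cmp %ebx, %eax ; jae _exit */
        eax++;            /* inc %eax ; jmp _startloop */
    }
    return eax;           /* value of %eax on reaching _exit */
}
```

With num1 = 3 and num2 = 6 the loop runs three times and stops at 6. Because the test is unsigned, a starting value of 0xFFFFFFFF (which would be -1 if signed) exits immediately.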
Lab Notes
The CMP Instruction and CPU Flags
- The CMP instruction is the heart of decision-making in assembly.
- Function: It compares two operands by performing a subtraction in the background (in this case, eax - ebx, i.e., 3 - 2).
- Crucial point: It does not store the result of the subtraction. Instead, its sole purpose is to set the CPU's status flags, single bits in the FLAGS register.
In our code, 3 - 2 = 1. The result is not zero and not negative, so the Zero Flag and Sign Flag are cleared. There is no borrow and no signed overflow, so the Carry Flag and Overflow Flag are cleared too.
Conditional Jumps
- Conditional jump instructions read the CPU flags to decide whether to change the flow of execution.
- JL stands for "Jump if Less". This is a signed comparison: it jumps when SF != OF, which after cmp eax, ebx means eax was less than ebx as signed integers. In our code, is eax (3) less than ebx (2)? No. Therefore, the condition is false, and the jl lesser instruction does not jump. The program simply continues to the next line.
Signed vs. Unsigned Jumps
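- This is a very important distinction in assembly. The note "jumps are signed by default" is a common point of confusion: cmp sets the same flags either way, and the choice of jump instruction decides whether the operands are treated as signed or unsigned.
- Signed Jumps (for numbers that can be positive or negative): jl, jle, jg, jge (they test SF, OF, and ZF).
- Unsigned Jumps (for numbers that are always non-negative, like memory addresses): jb, jbe, ja, jae (they test CF and ZF).
- The code we did in lab correctly uses JL because the values are treated as signed integers.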
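After a single cmp %ebx, %eax, the same register contents can make jl and jb disagree. The C sketch below shows how the two jump families interpret the same bits. takes_jl and takes_jb are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* jl: eax < ebx as signed 32-bit integers (SF != OF) */
static bool takes_jl(int32_t eax, int32_t ebx) {
    return eax < ebx;
}

/* jb: eax < ebx as unsigned 32-bit integers (CF == 1) */
static bool takes_jb(int32_t eax, int32_t ebx) {
    return (uint32_t)eax < (uint32_t)ebx; /* -1 becomes 0xFFFFFFFF */
}
```

With eax = -1 and ebx = 1, jl jumps (-1 < 1) but jb does not, because 0xFFFFFFFF is the largest unsigned 32-bit value.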
Unconditional Jumps (JMP) and Program Flow
- The JMP instruction is an "unconditional jump" — it always redirects the program flow.
- Purpose: In this code, JMP end is essential. If the JL condition is false (which it is), we need to skip over the lesser code block. Without this JMP, execution would fall through into the lesser block regardless of the comparison, which would be a logical error. Skipping it is called preventing "fall-through".
Example code (loop)
C code (loop example):
int bump(int num1, int num2) {
for (int i = 0; num1 < num2; i++) {
num1 = num1 + 1;
}
return num1;
}
Assembly version:
.globl bump
bump:
# args: %edi = num1, %esi = num2
xorl %eax, %eax # i = 0 (we’ll use %eax as the loop counter)
movl %edi, %edx # edx = num1
movl %esi, %ecx # ecx = num2
.Lloop:
cmpl %ecx, %edx # compare num1 (edx) with num2 (ecx)
jge .Ldone # if num1 >= num2, break
addl $1, %edx # num1 = num1 + 1
incl %eax # i++
jmp .Lloop
.Ldone:
movl %edx, %eax # return num1
ret
Some useful commands
- Compile and run commands for code (Intel version)
  nasm -f elf64 loop1.asm -o loop1.o
  ld loop1.o -o loop1
  ./loop1
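- GDB commands (in LLDB, registers are read with register read instead of info r)
  - info r → shows the values of every register
  - info r eax → shows the value stored in the eax register
  - p $eax → prints the value stored in the eax register
  - stepi → executes the next machine instruction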
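- Additional information
  - A jump transfers control to another instruction. If the compiler converts a branch into a conditional move (cmov), the value is copied only when the condition holds, and execution always continues with the next instruction.
  - -fno-if-conversion is a GCC flag that tells the compiler not to convert conditional jumps into branch-free code such as conditional moves.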
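To experiment with if-conversion, you can compile a small branchy function with gcc -O2 -S, once as-is and once with -fno-if-conversion, and compare the generated .s files. Whether you see a jump or a cmov depends on the compiler version and on other optimization passes, so treat the result as an observation, not a guarantee. The hand-written branch-free version below shows the same idea as a conditional move: execution always goes straight through. Both function names are hypothetical.

```c
#include <assert.h>

/* Branchy form: may compile to cmp + jump, or to cmov after if-conversion */
static int max_branch(int a, int b) {
    if (a > b) return a;
    return b;
}

/* Branch-free form: build a mask and select with bitwise operations */
static int max_branchless(int a, int b) {
    int mask = -(a > b);           /* all ones if a > b, else 0 */
    return b ^ ((a ^ b) & mask);   /* a when mask is all ones, else b */
}
```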