Machine-Level Programming I - Basics
Class: CSCE-312
Notes:
Today: Machine Programming I: Basics
- History of Intel processors and architectures
- C, assembly, machine code
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
History of Intel processors and architectures
Intel x86 Processors
-
Dominate laptop/desktop/server market
-
Evolutionary design
- Backwards compatible all the way back to the 8086, introduced in 1978
- Added more features as time goes on
-
Complex instruction set computer (CISC)
- Many different instructions with many different formats
- But, only small subset encountered with Linux programs
- Many different instructions to do the same thing
- Or same semantics but do different things
- Hard to match performance of Reduced Instruction Set Computers (RISC)
- Fun fact: RISC-V is an emerging architecture being adopted by many startups
- Good idea for other reasons: it is much easier to compile for a simpler instruction set, than for a complex instruction set
- But, Intel has done just that!
- In terms of speed. Less so for low power.
- Intel matched performance of RISC technology
- They did beautiful work
- They run complex instructions very, very fast
- In terms of speed Intel is better; in terms of power consumption, they are very bad
- Intel offers backward compatibility
- They own their instruction set
Intel x86 Evolution: Milestones
Name Date Transistors MHz
-
8086 1978 29K 5-10
- First 16-bit Intel processor. Basis for IBM PC & DOS
- A 16-bit address reaches about 65 thousand bytes (64 KB)
- Addressing more than 64 KB was the goal
- Pointers were two-dimensional (a segment plus an offset), which made programming hard
- 1MB address space
- Just figuring out how to make microprocessors (evolved from video game systems)
- The 80286 was very popular: it allowed isolation between processes and let you use disk as backup memory, cool stuff.
-
386 1985 275K 16-33
- First 32-bit Intel processor, referred to as IA32
- 10 times as many transistors
- Gives us nice 4GB address space
- Introduced a lot of new instructions to the instruction set
- Added “flat addressing”, capable of running Unix
- Pointers became a single number that can point to any address
- It was really hard to program before
- It made it possible for Linus Torvalds to invent Linux
- Around 1991. He started writing a terminal emulator and ended up writing an operating system.
- Professor actually emailed Linus Torvalds about a bug, and he responded!
- Up to 3MB address space (professor experience)
- The 486 added a small on-chip cache and a deeper pipeline, which allowed it to be a little faster
-
Pentium 4E 2004 125M 2800-3800
- First 64-bit Intel x86 processor, referred to as x86-64
- They wanted a name that they can trademark.
- Pentium 4 was an ambitious project, with a very deep pipeline
- Pipeline: each stage does something related to executing instructions
- ...
- Decode
- Execute
- Memory
- Write back
- ...
- Gives you parallelism
-
Core 2 2006 291M 1060-3500
- First multi-core Intel processor
- Less ambitious pipeline
- Increasing the number of transistors exponentially
-
Core i7 2008 731M 1700-3900
- Four cores (our shark machines)
- Reaching almost 4 GHz
Intel x86 Processors, cont.
-
Machine Evolution
- 386 1985 0.3M
- Pentium 1993 3.1M
- Pentium/MMX 1997 4.5M
- PentiumPro 1995 6.5M
- Pentium III 1999 8.2M
- Pentium 4 2001 42M
- Core 2 Duo 2006 291M
- Core i7 2008 731M
-
Added Features
- Instructions to support multimedia operations
- Do vector operations on not just integers but also floats
- Instructions to enable more efficient conditional operations
- Transition from 32 bits to 64 bits
- So that we can have a more comfortable reasonable address space
- More cores
- Core i7 photo
- L1 cache: each core has its own L1 cache for data, another one for instructions, etc.
- L2 cache is a little slower but still very fast
- L3 cache: if you do not find something in the L2 cache, you still have the L3 cache to check
- It is shared, which allows the cores to communicate with each other
- Parallel processing occurs this way
2015 State of the Art
-
Core i7 Broadwell 2015
-
Desktop Model
- 4 cores
- Integrated graphics
- 3.3-3.8 GHz
- 65W
-
Server Model
- 8 cores
- Integrated I/O
- 2-2.6 GHz
- 45W
x86 Clones: Advanced Micro Devices (AMD)
- Historically
- AMD has followed just behind Intel
- A little bit slower, a lot cheaper
- Pushes Intel to innovate by being a very strong competitor
- Sometimes their products are better and cheaper, sometimes not reliable, but they get more and more reliable
- Itanium = super powerful computer back in those days
- Incompatible with x86
- AMD extended the old Intel x86 instruction set to handle 64 bits
- Then
- Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
- Built Opteron: tough competitor to Pentium 4
- Developed x86-64, their own extension to 64 bits
- You can run your 32-bit programs without changing anything, and switch to 64-bit programs whenever you are ready.
- This was designed by AMD and works on both Intel and AMD processors
- Recent Years
- Intel got its act together
- Leads the world in semiconductor technology
- AMD has fallen behind
- Relies on external semiconductor manufacturer
Intel's 64-Bit History
...
C, assembly, machine code
Definitions
-
Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand to write assembly/machine code.
- It is the interface: the instructions themselves, machine language encodings, addressing, which registers are available, how memory is accessed. All of that is in the ISA.
- It is not updated very often, because every implementation would have to support the change
- Examples: instruction set specification, registers.
-
Microarchitecture: Implementation of the architecture.
- This is the circuits, transistors, the algorithms to implement this architecture.
- Examples: cache sizes and core frequency.
- How big should the cache be, what to keep, and what to throw away.
- How fast we can clock this depends on the microarchitecture.
- How to make the processor more efficient
-
Code Forms:
- Machine Code: The byte-level programs that a processor executes
- Very hard to read and understand
- Assembly Code: A text representation of machine code
- Text level representation of machine language
- More or less english looking instructions
-
Example ISAs:
- Intel: x86, IA32, Itanium, x86-64 (AMD)
- ARM: Used in almost all mobile phones
- Others include RISC-V, PowerPC, IBM System/360, Motorola 68000 (early Apple computers), etc.
Assembly/Machine Code View
Programmer-Visible State
- PC: Program counter
- Address of next instruction
- Contains the address in memory of the next instruction that should be executed
- Usually: "get the next few bytes"
- Assembles an instruction
- Called “RIP” (x86-64)
- Register file
- Heavily used program data
- Old usage of the word file: just means "an array of things"
- Contains the registers %rax, %rdi, %rsp, etc.
- Condition codes
- Store status information about most recent arithmetic or logical operation
- Used for conditional branching
- Like registers but they are 1 bit, they just store conditions
- Enable control flow in our program
- Memory
- Byte addressable array
- Memory gives back the bytes stored at a requested address
- Code and user data
- Stack to support procedures
Turning C into Object Code
- Code in files p1.c p2.c
- Compile with command: gcc -Og p1.c p2.c -o p
- Use basic optimizations (-Og) [New to recent versions of GCC]
- Put resulting binary in file p
- Object code
- Machine language stored in a file to be used
- Take programs
- Compile them into .s files (assembly files)
- Assembler
- Command line tool that will assemble programs into object code.
- Converts .s files into .o files (object files)
- Not ready to be executed yet
- Link them together into the executable program p
- Takes all machine code files and links them together, fills in all the missing information
- Libraries are included here
- Will link the .o files
- Executable program
- We call p a binary
- Can be run straight from the command line
Compiling into Assembly
C Code (sum.c)
long plus(long x, long y);
void sumstore(long x, long y,
long *dest)
{
long t = plus(x, y);
*dest = t;
}
- Needlessly complicated but doing it to illustrate how things work
Obtain with command
gcc -O -S sum.c
- Compile the C code and stop at the assembly file instead of going on to machine code
- Think of a pipe in Linux: normally the compiler just passes this output to the next phase of the compilation process.
- Produces file sum.s
Generated x86-64 Assembly
sumstore:
pushq %rbx
movq %rdx, %rbx
call plus
movq %rax, (%rbx)
popq %rbx
ret
-
%rbx is a callee-saved register
- Whoever calls this function expects %rbx to hold the same value afterward that it held before the call
-
pushq %rbx: preserve the value of %rbx
-
movq %rdx, %rbx: remember the value of %rdx (dest) in %rbx
-
call plus: calls plus, which adds its two parameters and returns the result
- Will place that sum in the %rax register
-
movq %rax, (%rbx): take the value returned by plus and store it in the place in memory where %rbx is pointing
-
popq %rbx: restore the saved value of %rbx
-
ret just means return
Warning: Will get very different results on other machines due to different versions of gcc and different compiler settings.
Assembly Characteristics: Data Types
-
“Integer” data of 1, 2, 4, or 8 bytes
- 4 main sizes of "integer" data
- Data values
- b means byte (1 byte)
- w means word (2 bytes)
- l means long word (4 bytes)
- q means quad word (8 bytes)
- Addresses (untyped pointers)
- x86 helps you here with its addressing modes, while the RISC-V instruction set usually doesn't
-
Floating point data of 4, 8, or 10 bytes
-
Code: Byte sequences encoding series of instructions
- Some instructions are shorter, others are longer
- In RISC-V, the base instructions are all the same size
- x86 has variable-length instructions, so it does not waste much space
-
No aggregate types such as arrays or structures
- Just contiguously allocated bytes in memory
- There are no structs, classes, etc.
- You have to write everything yourself or with the help of the compiler.
- There is no way to name fields like you do in C/C++
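To make the "just contiguous bytes" point concrete, here is a minimal sketch in C. The `struct point` type and both function names are hypothetical, made up for this illustration; the offsets in the comments assume a typical x86-64 Linux ABI where `long` is 8 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, used only for illustration. */
struct point {
    long x;   /* at byte offset 0 */
    long y;   /* at byte offset 8 when long is 8 bytes (assumption) */
};

/* Read field y by name, the way C source code does. */
long read_y_by_name(const struct point *p) {
    return p->y;
}

/* Read the same field the way machine code sees it:
   a base address plus a constant byte offset, no field names. */
long read_y_by_offset(const struct point *p) {
    long value;
    memcpy(&value, (const char *)p + offsetof(struct point, y), sizeof value);
    return value;
}
```

Both functions return the same value; the compiler is what turns the field name into the constant offset.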
Assembly Characteristics: Operations
-
Perform arithmetic function on register or memory data
- With x86 you can access memory and do arithmetic in one single instruction
- Makes it more convenient and easier to program
-
Transfer data between memory and register
- Load data from memory into register
- Store register data into memory
- You can't say "take this memory and put it into that memory" in one step; this is not how it works
-
Transfer control
- Unconditional jumps to/from procedures
- Jump to another part of the program or function
- Conditional branches
- The words "jump" and "branch" in computer architecture mean exactly the same thing
- Synonyms that we use and switch back and forth
- We do a "jump" if some condition is true.
- Think of if statements.
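As a rough illustration of how an if statement maps onto conditional and unconditional jumps, here is a sketch in C using `goto`. The function names `absdiff` and `absdiff_goto` are made up for this example, and the second version only approximates the shape of what a compiler might emit; real output depends on the compiler and its settings.

```c
/* Ordinary C: absolute difference using if/else. */
long absdiff(long x, long y) {
    long result;
    if (x > y)
        result = x - y;
    else
        result = y - x;
    return result;
}

/* Same logic written with explicit jumps, closer to machine-level control flow. */
long absdiff_goto(long x, long y) {
    long result;
    if (x <= y) goto else_part;   /* conditional branch: taken only if the test is true */
    result = x - y;
    goto done;                    /* unconditional jump over the else part */
else_part:
    result = y - x;
done:
    return result;
}
```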
Object Code
Code for sumstore
0x0400595:
0x53
0x48
0x89
0xd3
0xe8
0xf2
0xff
0xff
0xff
0x48
0x89
0x03
0x5b
0xc3
-
Total of 14 bytes
-
Each instruction 1, 3, or 5 bytes
-
Starts at address 0x0400595
-
Assembler
- Translates .s into .o
- Binary encoding of each instruction
- Nearly-complete image of executable code
- Missing linkages between code in different files
- The result is mostly complete, but it is still missing some pieces; the linker fills in these gaps.
-
Linker
- Resolves references between files
- Combines with static run-time libraries
- E.g., code for malloc, printf
- Some libraries are dynamically linked
- Linking occurs when program begins execution
Machine Instruction Example
C Code
*dest = t;
- Store value t where designated by dest
Assembly
movq %rax, (%rbx)
- Move 8-byte value to memory
- Quad words in x86-64 parlance
- Operands:
- t: Register %rax
- dest: Register %rbx
- *dest: Memory M[%rbx]
- Why the parentheses on (%rbx)?
- They mean "indirection"
- Not rbx itself but what rbx points to
- The size of the thing pointed to is given by the q in the movq instruction
Object Code
0x40059e: 48 89 03
- 3-byte instruction
- Stored at address 0x40059e
- If we looked at a similar instruction, its encoding would differ in just a few bits, the ones that distinguish the two instructions.
- Some of these bits mean movq, some mean rax and rbx, and some mean move with indirection (the parentheses), etc.
- All of that is encoded in that 3-byte sequence
- In ARM or RISC-V it would typically take 4 bytes
- In x86, instructions can be from 1 byte to very long; common ones usually take fewer than 4 bytes (saves space)
- We can fit more of the program into the cache
Disassembling Object Code
Disassembled
0000000000400595 <sumstore>:
400595: 53 push %rbx
400596: 48 89 d3 mov %rdx,%rbx
400599: e8 f2 ff ff ff callq 400590 <plus>
40059e: 48 89 03 mov %rax,(%rbx)
4005a1: 5b pop %rbx
4005a2: c3 retq
- The disassembler drops the q suffix on mov, push, and pop: you can figure out from the context (the register names) whether there should be a q at the end of the instruction
- In retq somebody made the opposite decision.
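The `callq` line in the listing is a good place to check the byte encoding by hand: `e8` is followed by a 4-byte little-endian displacement that is relative to the address of the next instruction. The sketch below redoes that arithmetic in C. The array and the helper `call_target` are names invented for this illustration, and the helper handles only this one 5-byte call form.

```c
#include <stdint.h>

/* Bytes of sumstore, copied from the listing; they start at address 0x400595. */
static const uint8_t sumstore_code[14] = {
    0x53, 0x48, 0x89, 0xd3, 0xe8, 0xf2, 0xff,
    0xff, 0xff, 0x48, 0x89, 0x03, 0x5b, 0xc3
};

/* Decode the 5-byte call (opcode 0xe8 + 32-bit little-endian displacement)
   found at byte offset `off` within code loaded at address `base`.
   Target = address of the next instruction + signed displacement. */
uint64_t call_target(const uint8_t *code, uint64_t base, unsigned off) {
    uint32_t raw = (uint32_t)code[off + 1]
                 | (uint32_t)code[off + 2] << 8
                 | (uint32_t)code[off + 3] << 16
                 | (uint32_t)code[off + 4] << 24;
    /* Sign-extend the 32-bit displacement without relying on casts. */
    int64_t disp = (raw & 0x80000000u) ? (int64_t)raw - 4294967296LL
                                       : (int64_t)raw;
    uint64_t next = base + off + 5;   /* the call instruction is 5 bytes long */
    return next + (uint64_t)disp;
}
```

With the call at offset 4 (address 0x400599), the displacement 0xfffffff2 is -14, and 0x40059e - 14 gives 0x400590, the address the disassembler prints for `plus`.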
Disassembler
objdump -d sum
- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Reads the file, disassembles the executable portions of your code, and shows them to you in assembly language
- It will not have the names of your variables and such, but it will be something you can more or less read
- Can be run on either a.out (complete executable) or .o file
- Recommendation: the gdb/lldb debugger gives you an interactive environment where you can run your program and debug it using breakpoints and other tools.
- Lets you examine data
- This debugger also includes a disassembler.
- Instead of showing you C code, it will show you the assembly language it disassembles.
- This is where you get to see how your code was translated by the compiler.
Alternate Disassembly
Object
0x0400595:
0x53
0x48
0x89
0xd3
0xe8
0xf2
0xff
0xff
0xff
0x48
0x89
0x03
0x5b
0xc3
Disassembly
Dump of assembler code for function sumstore:
0x0000000000400595 <+0>: push %rbx
0x0000000000400596 <+1>: mov %rdx,%rbx
0x0000000000400599 <+4>: callq 0x400590 <plus>
0x000000000040059e <+9>: mov %rax,(%rbx)
0x00000000004005a1 <+12>: pop %rbx
0x00000000004005a2 <+13>: retq
- Within gdb Debugger
gdb sum
disassemble sumstore
- Disassemble procedure
- Will give you the disassembled code
x/14xb sumstore
- Examine the 14 bytes starting at sumstore
What Can be Disassembled?
% objdump -d WINWORD.EXE
WINWORD.EXE: file format pei-i386
No symbols in "WINWORD.EXE".
Disassembly of section .text:
30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91
- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source
Reverse engineering forbidden by Microsoft End User License Agreement
- Can you disassemble Microsoft Word?
- Yes you can if you have access to the executable
- But you are not supposed to if you agreed to the license agreement.
- May be needed when finding vulnerabilities
- Check: Reverse Engineering courses
- Disassembly process is used often here
Assembly Basics: Registers, operands, move
x86-64 Integer Registers
- Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
- Example: rax is the 64-bit version of the 32-bit eax register
- You can still use eax for 32-bit values, but if you want you can use rax for longer values
- ax - low-order 16 bits (2 bytes)
- ah - higher-order byte of ax
- al - lower-order byte of ax
- Names don't mean anything except for a couple
- rsp = stack pointer
- The stack is a structure in memory used for passing parameters and knowing where to return from procedures
- Parameters can be passed on the stack; local variables are also stored on the stack and removed when they are no longer used.
- Needs to be pointing to some memory that represents the top of the stack
- rbp = frame pointer
- Points to the beginning of the local variable storage for the current procedure (the procedure you are in now)
- Stores things like local variables or temporary variables
- Everything is always at a constant offset from the frame pointer
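The relationship between rax, eax, ax, al, and ah can be mimicked in C by truncating a 64-bit value. This is only a sketch of which bits each name refers to; the helper names are invented here and nothing in it touches real registers.

```c
#include <stdint.h>

/* If %rax held v, these are the bits each narrower name would see. */
uint32_t view_eax(uint64_t v) { return (uint32_t)v; }        /* low-order 4 bytes */
uint16_t view_ax(uint64_t v)  { return (uint16_t)v; }        /* low-order 2 bytes */
uint8_t  view_al(uint64_t v)  { return (uint8_t)v; }         /* low-order byte */
uint8_t  view_ah(uint64_t v)  { return (uint8_t)(v >> 8); }  /* high byte of ax */
```

For example, with the value 0x1122334455667788 the views are 0x55667788, 0x7788, 0x88, and 0x77 respectively.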
Some History: IA32 Registers
- These are the 32-bit versions of the registers
- Note: most instructions can use any register, seen later
- Where do names come from:
- ax: accumulator
- cx: counter
- dx: data
- bx: base (for an array)
- ...
Moving Data
-
Moving Data
movq <Source>, <Dest>
- Moves data
- Very general instruction
- The encoding is different for the different kinds of things that it can do
-
Operand Types (expressions after the name of the instruction)
- Immediate: Constant integer data
- Example: $0x400, $-533
- Like C constant, but prefixed with ‘$’
- Encoded with 1, 2, or 4 bytes
- If a constant doesn't fit in 4 bytes, it cannot be encoded as an ordinary immediate; it needs a special form (movabsq into a register) or an extra instruction.
- Register: One of 16 integer registers
- Example: %rax, %r13
- But %rsp reserved for special use
- You can use it as a source or destination, but it comes with side effects. You have to be aware of exactly what %rsp is doing
- Others have special uses for particular instructions
- Memory: 8 consecutive bytes of memory at address given by register
- Simplest example: (%rax)
- Instead of using the value of rax, use the memory it points to
- Means: "use this register as an address"
- Various other “address modes”
movq Operand Combinations
Cannot do memory-memory transfer with a single instruction
Source
- Immediate
- Register
- Memory
Destination
-
Register
-
Memory
-
Versatile useful instructions
-
Basically just moving data into memory and out of memory
-
We have a limited number of registers so we need a tool to interact with memory efficiently
Simple Memory Addressing Modes
-
Normal (R) Mem[Reg[R]]
- Register R specifies memory address
- Aha! Pointer dereferencing in C
movq (%rcx),%rax
- Move what %rcx points to into %rax
- A "loading" instruction
-
Displacement D(R) Mem[Reg[R]+D]
- Register R specifies start of memory region
- Constant displacement D specifies offset
- Means: "memory at register + constant"
- Example: + 8
- 8 bytes beyond rbp
- "Give me the second element of that array"
- You can also use it if rbp points to a struct and you want the element that is 8 bytes into the struct
- In general it is used for offsets in structs or objects
movq 8(%rbp),%rdx
-
rbp points to local variables
-
Laid out in order
-
This is saying: get the second local variable and put it into rdx
-
Can be read as: Address = 8 + %rbp; move the 8 bytes at that address to %rdx
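Displacement addressing is what C pointer arithmetic compiles down to. The sketch below models `movq D(R), dst` as a C function; `load_disp` is a name invented for this example, and the "8 bytes" figure assumes `long` is 8 bytes, as on x86-64 Linux.

```c
#include <string.h>

/* Models movq D(R), dst: read one long starting at byte address base + disp. */
long load_disp(const void *base, long disp) {
    long value;
    memcpy(&value, (const char *)base + disp, sizeof value);
    return value;
}
```

Given `long a[3] = {10, 20, 30};`, calling `load_disp(a, sizeof(long))` reads the same element as `a[1]`, which is the "second element of that array" case from the notes.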
Example of Simple Addressing Modes
C Code
void swap
(long *xp, long *yp)
{
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
Assembly
swap:
movq (%rdi), %rax
movq (%rsi), %rdx
movq %rdx, (%rdi)
movq %rax, (%rsi)
ret
- There is a one to one correspondence between these statements and the ones in the C code
Understanding swap()
void swap
(long *xp, long *yp)
{
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
Register Value
%rdi xp
%rsi yp
%rax t0
%rdx t1
swap:
movq (%rdi), %rax # t0 = *xp
movq (%rsi), %rdx # t1 = *yp
movq %rdx, (%rdi) # *xp = t1
movq %rax, (%rsi) # *yp = t0
ret
- This should be very simple
- It is the same thing we are doing in C, just in a different and particular language
- In this case there is a one-to-one correspondence in the number of statements
- In assembly you have to explicitly write ret to return
- What if we swapped the first two movq statements? (make the first happen second, and the second happen first)
- This would not change the result: the order was different, but at the end of the procedure you can't tell the difference
- This means we can do them in parallel, both of them at the same time
- The processor can figure this out and schedule these instructions to be processed at the same time.
...
Complete Memory Addressing Modes
-
Most General Form
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+D]
- D: Constant “displacement” 1, 2, or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for %rsp
- S: Scale: 1, 2, 4, or 8 (why these numbers?)
-
Special Cases
- Special cases where we omit something (the scale, etc.)
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples
| Register | Value |
|---|---|
| %rdx | 0xf000 |
| %rcx | 0x0100 |
- 4 bytes each
0x80(,%rdx,2)
- Omitting base register
...
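The general form D(Rb,Ri,S) is just the arithmetic D + Reg[Rb] + S*Reg[Ri], so it can be checked with a few lines of C. The function name `effective_address` is invented here, and apart from `0x80(,%rdx,2)` the expressions mentioned afterward are illustrative choices, not taken from the notes. An omitted register is passed as 0.

```c
#include <stdint.h>

/* Effective address of D(Rb,Ri,S): D + Rb + S*Ri.
   Pass 0 for an omitted base or index register, and 1 for an omitted scale. */
uint64_t effective_address(int64_t d, uint64_t rb, uint64_t ri, uint64_t s) {
    return rb + s * ri + (uint64_t)d;
}
```

With %rdx = 0xf000 and %rcx = 0x0100, `0x80(,%rdx,2)` is `effective_address(0x80, 0, 0xf000, 2)`, which works out to 2*0xf000 + 0x80 = 0x1e080. Likewise `(%rdx,%rcx,4)` would be 0xf000 + 4*0x100 = 0xf400.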
Arithmetic & logical operations
Address Computation Instruction
-
leaq <Src>, <Dst>
- Src is address mode expression
- Set Dst to address denoted by expression
- Rather than moving the data, it computes the address
- You can use it before you actually store something into an address
- Then puts this address into the destination register
-
Uses
- Computing addresses without a memory reference
- E.g., translation of p = &x[i];
- Computing arithmetic expressions of the form x + k * y
- k = 1, 2, 4, or 8
- You can add a constant displacement as well if you like
Example
long m12(long x)
{
return x*12;
}
- Note: return values always go into %rax
Converted to ASM by compiler:
leaq (%rdi,%rdi,2), %rax # t <- x+x*2
salq $2, %rax # return t<<2
- Example: combined with a left shift, leaq easily lets you multiply by 12
- leaq multiplies rdi by 3 and puts the result in rax; salq $2 then multiplies that by 4
- Was designed to compute addresses, not really arithmetic
- leaq leaves the condition codes whatever they were before
- This is useful if we want to remember them from a previous instruction
- A little bit more efficient: it gives an earlier instruction time to complete before you ask it what it did.
- This is called instruction scheduling
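The two-instruction sequence can be replayed in C to confirm that it really computes x*12. In this sketch `m12_lea_shift` is a made-up name; it uses unsigned arithmetic so that the left shift is well defined for negative inputs, and the final conversion back to `long` assumes the usual two's complement behavior of mainstream compilers.

```c
/* Reference version from the notes. */
long m12(long x) {
    return x * 12;
}

/* Step-by-step version mirroring the compiler's two instructions. */
long m12_lea_shift(long x) {
    unsigned long ux = (unsigned long)x;
    unsigned long t = ux + ux * 2;   /* leaq (%rdi,%rdi,2), %rax : t = x + x*2 */
    return (long)(t << 2);           /* salq $2, %rax            : t * 4 */
}
```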
Some Arithmetic Operations
Two Operand Instructions:
Format Computation
addq Src,Dest Dest = Dest + Src
subq Src,Dest Dest = Dest − Src
imulq Src,Dest Dest = Dest * Src
salq Src,Dest Dest = Dest << Src # Also called shlq
sarq Src,Dest Dest = Dest >> Src # Arithmetic (signed)
shrq Src,Dest Dest = Dest >> Src # Logical (unsigned)
xorq Src,Dest Dest = Dest ^ Src
andq Src,Dest Dest = Dest & Src
orq Src,Dest Dest = Dest | Src
- Watch out for argument order!
- Example: subq Src,Dest
- You are subtracting Src from Dest, not the other way around
- The Destination is always the thing that is getting modified.
- Some operations are not commutative
- e.g. subtraction, shifting, etc.
- No distinction between signed and unsigned int (why?)
- In two's complement, these operations produce the same bit pattern either way; only the right shift needs two versions (sarq and shrq).
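A quick way to convince yourself that one add or multiply instruction serves both signed and unsigned values is to compare the resulting bit patterns in C. The helper names below are invented for this sketch, and the inputs should be small enough that the signed versions do not overflow, since signed overflow is undefined in C.

```c
#include <stdint.h>

/* Add as signed numbers, then return the raw 64-bit pattern. */
uint64_t add_bits_signed(int64_t a, int64_t b) {
    return (uint64_t)(a + b);
}

/* Reinterpret the inputs as unsigned first, then add. */
uint64_t add_bits_unsigned(int64_t a, int64_t b) {
    return (uint64_t)a + (uint64_t)b;
}

/* Same comparison for multiplication (low 64 bits of the product). */
uint64_t mul_bits_signed(int64_t a, int64_t b) {
    return (uint64_t)(a * b);
}

uint64_t mul_bits_unsigned(int64_t a, int64_t b) {
    return (uint64_t)a * (uint64_t)b;
}
```

For -5 and 3, both addition helpers produce 0xfffffffffffffffe, which reads as -2 when signed and as 2^64 - 2 when unsigned.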
One Operand Instructions
incq Dest Dest = Dest + 1
decq Dest Dest = Dest − 1
negq Dest Dest = − Dest (takes two's complement)
notq Dest Dest = ~Dest (takes one's complement)
- See book for more instructions
- Kind of redundant instructions, but it is a common idiom in coding to add 1 to things.
- A nice but unnecessary thing to have
- Make programs simpler
Arithmetic Expression Example
C code:
long arith
(long x, long y, long z)
{
long t1 = x+y;
long t2 = z+t1;
long t3 = x+4;
long t4 = y * 48;
long t5 = t3 + t4;
long rval = t2 * t5;
return rval;
}
Assembly code:
arith:
leaq (%rdi,%rsi), %rax # t1
addq %rdx, %rax # t2
leaq (%rsi,%rsi,2), %rdx # (y * 3)
salq $4, %rdx # t4: (y * 3) * 16
leaq 4(%rdi,%rdx), %rcx # t5: t3 + t4
imulq %rcx, %rax # rval
ret
- Tip: you can think of leaq as having 3 operands.
- t2 = result of the previous sum + %rdx
- Which register holds t2?
- Answer: %rax
- The compiler has allocated rax to hold t1, and then again to hold t2. Why? Isn't that a bug?
- No, t1 is not going to be used again, so its register can be reused.
- leaq (%rsi,%rsi,2), %rdx
- Why is it multiplying y by 3?
- First it multiplies by 3, then salq $4, %rdx shifts that result left by 4 positions, which multiplies it by 16
- So we get %rdx = y * 3 * 16, which is the same as y * 48.
- Now, which register is allocated to t3?
- It is not allocated to any register at all; the compiler just knows x+4 is a value
- It does not need a register: the value is folded into the next computation.
- leaq 4(%rdi,%rdx), %rcx
- %rcx = x + 4 + t4
- imulq %rcx, %rax
- Take t2 * t5 and put the result into rax
- Multiplication is commutative, but we would like our result to go into %rax because we want to return that value.
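To check the walkthrough, the assembly can be replayed one instruction at a time in C, with one local variable standing in for each register. This is a sketch: `arith_steps` and its variable names are invented here, and the shift is written as a multiplication by 16 so that negative inputs stay well defined in C.

```c
/* Original function from the notes. */
long arith(long x, long y, long z) {
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* One C statement per assembly instruction; x, y, z arrive in %rdi, %rsi, %rdx. */
long arith_steps(long x, long y, long z) {
    long rax = x + y;         /* leaq (%rdi,%rsi), %rax    : t1 */
    rax = rax + z;            /* addq %rdx, %rax           : t2 */
    long rdx = y + y * 2;     /* leaq (%rsi,%rsi,2), %rdx  : y * 3 */
    rdx = rdx * 16;           /* salq $4, %rdx             : t4 = (y * 3) * 16 */
    long rcx = 4 + x + rdx;   /* leaq 4(%rdi,%rdx), %rcx   : t5 = x + 4 + t4 */
    rax = rax * rcx;          /* imulq %rcx, %rax          : rval */
    return rax;
}
```

For x = 1, y = 2, z = 3 both versions give t2 = 6 and t5 = 101, so the result is 606.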
Interesting Instructions
- leaq: address computation
- salq: shift
- imulq: multiplication
- But, only used once
Understanding Arithmetic Expression Example
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z |
| %rax | t1, t2, rval |
| %rdx | t4 |
| %rcx | t5 |
Machine Programming I: Summary
- History of Intel processors and architectures
- Evolutionary design leads to many quirks and artifacts
- C, assembly, machine code
- New forms of visible state: program counter, registers, ...
- Compiler must transform statements, expressions, procedures into low-level instruction sequences
- Assembly Basics: Registers, operands, move
- The x86-64 move instructions cover wide range of data movement forms
- Arithmetic
- C compiler will figure out different instruction combinations to carry out computation