Machine-Level Programming II - Control
Class: CSCE-312
Notes:
Today
- Control: Condition codes
- Conditional branches
- Loops
- Switch Statements
Control: Condition codes
Processor State (x86-64, Partial)
- Information about currently executing program
- Temporary data
(
%rax, … ) - Location of runtime stack
(
%rsp) - Location of current code control point
(
%rip, … ) - Status of recent tests ( CF, ZF, SF, OF )
- Temporary data
(
...
Condition Codes (Implicit Setting)
-
Single bit registers
- CF Carry Flag (for unsigned)
- SF Sign Flag (for signed)
- ZF Zero Flag
- If the result of an operation is 0, it is set to true. Very useful.
- You can compare things by using the Zero Flag.
- OF Overflow Flag (for signed)
- If two’s-complement (signed) overflow happens
-
Implicitly set (think of it as side effect) by arithmetic operations
Example:
addq <Src>,<Dest>↔t = a+b
CF set if carry out from most significant bit (unsigned overflow)
ZF set if t == 0
SF set if t < 0 (as signed)
OF set if two’s-complement (signed) overflow
(a>0 && b>0 && t<0) || (a<0 && b<0 && t>=0) -
Not set by
leaqinstruction
Condition Codes (Explicit Setting: cmp)
Explicit Setting by Compare Instruction
-
cmpq Src2, Src1
-
cmpq b,a like computing a-b without setting destination
- Basically compares to items of data
- Computes the second operand "minus" the first operand
-
CF set if carry out from most significant bit (used for unsigned comparisons)
-
ZF set if a == b
-
SF set if (a-b) < 0 (as signed)
- Tells us if result is negative
-
OF set if two’s-complement (signed) overflow
- (a>0 && b<0 && (a-b)<0) || (a<0 && b>0 && (a-b)>0)
- Not really useful for the compare instruction
Condition Codes (Explicit Setting: test)
Explicit Setting by Test instruction
-
testq Src2, Src1
- testq b,a like computing a&b without setting destination
- Takes the bitwise "and" of two operands
- testq b,a like computing a&b without setting destination
-
Sets condition codes based on value of Src1 & Src2
-
Useful to have one of the operands be a mask
-
ZF set when a&b == 0
-
SF set when a&b < 0
Reading Condition Codes
SetX Instructions
- Set low-order byte of destination to 0 or 1 based on combinations of condition codes
- Does not alter remaining 7 bytes
| SetX | Condition | Description |
|---|---|---|
sete |
ZF |
Equal / Zero |
setne |
~ZF |
Not Equal / Not Zero |
sets |
SF |
Negative |
setns |
~SF |
Nonnegative |
setg |
~(SF^OF)&~ZF |
Greater (Signed) |
setge |
~(SF^OF) |
Greater or Equal (Signed) |
setl |
(SF^OF) |
Less (Signed) |
setle |
(SF^OF)|ZF |
Less or Equal (Signed) |
seta |
~CF&~ZF |
Above (unsigned) |
setb |
CF |
Below (unsigned) |
x86-64 Integer Registers
/CSCE-312/Visual%20Aids/Pasted%20image%2020250925134713.png)
Reading Condition Codes (Cont.)
- SetX Instructions:
- Set single byte based on combination of condition codes
- One of addressable byte registers
- Does not alter remaining bytes
- Typically use
movzblto finish job- 32-bit instructions also set upper 32 bits to 0
int gt (long x, long y)
{
return x > y;
}
cmpq %rsi, %rdi # Compare x:y
setg %al # Set when >
movzbl %al, %eax # Zero rest of %rax
ret
movzblis converting a 1 byte register to an 8 byte register (64-bit register)
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rax | Return value |
Conditional branches
Jumping
jX Instructions
- Jump to different part of code depending on condition codes
| jX | Condition | Description |
|---|---|---|
jmp |
1 | Unconditional |
je |
ZF | Equal / Zero |
jne |
~ZF | Not Equal / Not Zero |
js |
SF | Negative |
jns |
~SF | Nonnegative |
jg |
~(SF^OF)&~ZF | Greater (Signed) |
jge |
~(SF^OF) | Greater or Equal (Signed) |
jl |
(SF^OF) | Less (Signed) |
jle |
(SF^OF)|ZF | Less or Equal (Signed) |
ja |
~CF&~ZF | Above (unsigned) |
jb |
CF | Below (unsigned) |
| Some explanation |
je- jumps if the Zero Flag is true
- Stands for "jump if equal"
- Used in comparing two registers
jne- The opposite of
jne - "Jump if not equal"
- The opposite of
js, jns- Check the sign flag
jg, jge- Greater and greater than equal to
jl, jle- Less and less than or equal to
- Above: comparing unsigned numbers
- Below: comparing unsigned numbers?
Conditional Branch Example (Old Style)
Generation
shark> gcc –Og -S –fno-if-conversion control.c
- Make sure that you turn off that optimization flag:
–fno-if-conversion - If you do not use that then you might see a
movstatement instead of thatjle(jump) - This is an optimization made by the compiler itself
C code:
long absdiff
(long x, long y)
{
long result;
if (x > y)
result = x-y;
else
result = y-x;
return result;
}
- Returns the absolute value of the difference between
xandy.
Assembly code:
absdiff:
cmpq %rsi, %rdi # x:y
jle .L4
movq %rdi, %rax
subq %rsi, %rax
ret
.L4: # x <= y
movq %rsi, %rax
subq %rdi, %rax
ret
jle .L4- Saying: if x <= y, then jump over to the .L4 stuff
- else you continue reading the next instructions, which will simulate the case when x >= y.
- This is a use of conditional jumping (an interpretation of an
ifstatement)
cmpqis based on values passed as parameters, when we do a jump we don't know whether we are supposed to do it or not...- Hurts the parallelism that we can expect from a program, so if we can avoid them it will be better, and that is what we can do by using conditional loops.
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rax | Return value |
Check pointing:
- Have a program read back in the data from a file that saved the state of the computer (checkpoint the state of the computer)
- It is very costly if we checkpoint and then restore from checkpoint
- Wasting time and energy, we want to avoid doing that, this is where branch predictors make sense
- Are able to predict which branch will be chosen with high accuracy
Expressing with Goto Code
- C allows
gotostatement- Takes you directly from one part of a C function to another part
- This is considered harmful: "Goto considered harmful"
- If you use it, your code quickly becomes unmanageable
- We replace it with structured programming
- You may find it more useful once every few years
- Usually there is a better way to do things than using
gotoin C - But in assembly language all you can use is "
goto"
- Jump to position designated by label
Example:
long absdiff
(long x, long y)
{
long result;
if (x > y)
result = x-y;
else
result = y-x;
return result;
}
We can rewrite the above function as follows:
long absdiff_j
(long x, long y)
{
long result;
int ntest = x <= y;
if (ntest) goto Else;
result = x-y;
goto Done;
Else:
result = y-x;
Done:
return result;
}
-
Note the
gotostatement in action -
Much harder to follow
-
This kind of C is closer to assembly but not yet assembly language
-
This is uglier and harder to follow but it may help us understand assembly code and where things came from in assembly.
-
Why does Else have a capital E?
elseis a keyword so we cannot use it here, we need to differentiate the branch somehow.
General Conditional Expression Translation (Using Branches)
C Code
val = Test ? Then_Expr : Else_Expr;
- Be careful when you use this operator in code you want someone else to read, it can be confusing.
val = x>y ? x-y : y-x;
Goto Version
ntest = !Test;
if (ntest) goto Else;
val = Then_Expr;
goto Done;
Else:
val = Else_Expr;
Done:
. . .
- Create separate code regions for then & else expressions
- Execute appropriate one
Using Conditional Moves
- Conditional Move Instructions
- Instruction supports:
- if (Test) Dest <- Src
- Supported in post-1995 x86 processors
- GCC tries to use them
- But, only when known to be safe
- Instruction supports:
- Why?
- Branches are very disruptive to instruction flow through pipelines
- Conditional moves do not require control transfer
C Code
val = Test
? Then_Expr
: Else_Expr;
Goto Version
result = Then_Expr;
eval = Else_Expr;
nt = !Test;
if (nt) result = eval;
return result;
Conditional Move Example
long absdiff
(long x, long y)
{
long result;
if (x > y)
result = x-y;
else
result = y-x;
return result;
}
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rax | Return value |
absdiff:
movq %rdi, %rax # x
subq %rsi, %rax # result = x-y
movq %rsi, %rdx
subq %rdi, %rdx # eval = y-x
cmpq %rsi, %rdi # x:y
cmovle %rdx, %rax # if <=, result = eval
ret
- That conditional move avoided a branch.
- Here we are computing both (x-y and y-x), and every time we do this we throw away one of them, but why does it still take less time than branching?
- We got rid of a branch, we eliminated that factor that a program didn't know where to go depending on waiting for something to be compared for example
- Here all instructions are executed simultaneously (in parallel)
- This is why we say branches can cause a lot of performance problems
Bad Cases for Conditional Move
Expensive Computations
val = Test(x) ? Hard1(x) : Hard2(x);
- Both values get computed
- We do not necessarily want them to both be computed
- Only makes sense when computations are very simple
Risky Computations
val = p ? *p : 0;
- Both values get computed
- May have undesirable effects
Computations with side effects
c
- Both values get computed
- Must be side-effect free
Loops
"Do-While" Loop Example
Three main kind of loops in C++
while- do-while
- This is actually easier for the compiler to process
for- It is the most general, the meaning of loop falls good in it.
- Because of that generality, it is the most complicated kind of loop for the compiler to reason about
C Code
long pcount_do
(unsigned long x) {
long result = 0;
do {
result += x & 0x1;
x >>= 1;
} while (x);
return result;
}
- Counting the number of 1 bits.
- We just need to keep counting bits until we run out of bits
- To translate to assembly we first need to get rid of any structured programming structure like the do-while loop.
Goto Version
long pcount_goto
(unsigned long x) {
long result = 0;
loop:
result += x & 0x1;
x >>= 1;
if(x) goto loop;
return result;
}
- Count number of 1’s in argument x (“popcount”)
- Use conditional branch to either continue looping or to exit loop
- Note:
if(x) goto loop;- This looks familiar
- Looks like I could do this with a conditional branch that jumps if the zero flag is false.
"Do-While" Loop Compilation
Goto Version
long pcount_goto
(unsigned long x) {
long result = 0;
loop:
result += x & 0x1;
x >>= 1;
if(x) goto loop;
return result;
}
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rax | result |
movl $0, %eax # result = 0
.L2: # loop:
movq %rdi, %rdx
andl $1, %edx # t = x & 0x1
addq %rdx, %rax # result += t
shrq %rdi # x >>= 1
jne .L2 # if (x) goto loop
rep; ret
%eaxis the 32-bit version of%rax- When you change
eaxthe compiler also changesrax - It sets it equal to 0 in this example.
- Why when I could just an XOR with
eaxitself?- Yes this is a very common idiom in compilers
- It is actually a little bit more efficient
- In this example is not done for understanding purposes.
- When you change
- Take
xand put it intordx edxis the 32-bit version ofrdx- AND the
rdxregister with 1 and put the result back intoedx
- AND the
shrqis using a logical shift treating it as an unsignedjne: jump if not equal to zero- Jump back up to .L2
General "Do-While" Translation
C Code
do
Body
while (Test);
Goto Verstion:
loop:
Body
if (Test)
goto loop
Body:
{
Statement1;
Statement2;
…
Statementn;
}
- Arbitrary C statements
- Usually a compund statement -> usually doing something complicated
General "While" Translation
-
A little more complicated than do-while
- With do-while we are executing the body at least once
- But with while we need to make something special for the case we don't even get to the loop body once
-
“Jump-to-middle” translation
-
Used with
-Og
While verstion
while (Test)
Body
Goto Version
goto test;
loop:
Body
test:
if (Test)
goto loop;
done:
While Loop Example #1
C Code
long pcount_while
(unsigned long x) {
long result = 0;
while (x) {
result += x & 0x1;
x >>= 1;
}
return result;
}
Jump to Middle
long pcount_goto_jtm
(unsigned long x) {
long result = 0;
goto test;
loop:
result += x & 0x1;
x >>= 1;
test:
if(x) goto loop;
return result;
}
- If x starts out at 0, we never get into this loop
- Compare to do-while version of function
- Initial goto starts loop at test
General "While" Translation #2
- Another way is to pretend that a while loop is actually a do-while loop and do something else about it
While version
while (Test)
Body
Do-While Version
if (!Test)
goto done;
do
Body
while(Test);
done:
Goto Version
if (!Test)
goto done;
loop:
Body
if (Test)
goto loop;
done:
-
“Do-while” conversion
-
Used with
–O1 -
Might be a little better because it avoids that initial jump, it makes it a "maybe" jump (avoids initial jump)
- The compiler can look at expressions and can sometimes evaluate them at compile time
While Loop Example #2
- Same example as [[#While Loop Example 1]]
Do-While Version
long pcount_goto_dw
(unsigned long x) {
long result = 0;
if (!x) goto done;
loop:
result += x & 0x1;
x >>= 1;
if(x) goto loop;
done:
return result;
}
- Compare to do-while version of function
- Initial conditional guards entrance to loop
"For" Loop Form
- Returns the same value given the same input, but the way it does it is going to be different from a while loop
General Form
for (Init; Test; Update )
Body
#define WSIZE 8*sizeof(int)
long pcount_for
(unsigned long x)
{
size_t i;
long result = 0;
for (i = 0; i < WSIZE; i++)
{
unsigned bit =
(x >> i) & 0x1;
result += bit;
}
return result;
}
Init
i = 0
- initial expression
Test
i < WSIZE
- test expression
Update
i++
- Add "1" to
i
Body
{
unsigned bit =
(x >> i) & 0x1;
result += bit;
}
- How can we make this in assembly?
- Make a convert it to a While loop and then convert it to a do-while loop.
"For" Loop -> While Loop
For version
for (Init; Test; Update )
Body
While Version
Init;
while (Test ) {
Body
Update;
}
For-While Conversion
long pcount_for_while
(unsigned long x)
{
size_t i;
long result = 0;
i = 0;
while (i < WSIZE)
{
unsigned bit =
(x >> i) & 0x1;
result += bit;
i++;
}
return result;
}
"For" Loop Do-While Conversion
C Code
long pcount_for
(unsigned long x)
{
size_t i;
long result = 0;
for (i = 0; i < WSIZE; i++)
{
unsigned bit =
(x >> i) & 0x1;
result += bit;
}
return result;
}
Goto Version
/CSCE-312/Visual%20Aids/Pasted%20image%2020250930134557.png)
- Initial test can be optimized away
Switch Statements
Switch Statement Example
C code:
long switch_eg
(long x, long y, long z)
{
long w = 1;
switch(x) {
case 1:
w = y*z;
break;
case 2:
w = y/z;
/* Fall Through */
case 3:
w += z;
break;
case 5:
case 6:
w -= z;
break;
default:
w = 2;
}
return w;
}
-
Multiple case labels
- Here: 5 & 6
-
Fall through cases
- Here: 2
-
Missing cases
- Here: 4
-
How can we compile this into assembly language?
- How would you implement
break- goto end
- if no break just don't goto
- This is a linear search for the case.
- What do we know about linear searches?
- They are bad (we can do better)
- Do binary search!
- What about hash tables?
- we have something similar -> Jump tables
- What do we know about linear searches?
- How would you implement
Jump Table Structure
/CSCE-312/Visual%20Aids/Pasted%20image%2020250930135052.png)
-
Instead of searching one by one, we just get the value, pull up the pointer that corresponds to that value and jump to that pointer.
-
Hashtables are O(1)
- A way we can search for the right case in O(1) time -> Jump tables
-
Switch statements are a natural form of control
- Can be solved with a jump table much faster than with a sequence of
ifs - Switch lets us use this O(1) implementation of the control flow.
- Can be solved with a jump table much faster than with a sequence of
-
In assembly the compiler fills in the code for each case
- Code block 0, code block 1, code block 2, etc...
Switch Statement Exmaple
long switch_eg(long x, long y, long z)
{
long w = 1;
switch(x) {
. . .
}
return w;
}
/CSCE-312/Visual%20Aids/Pasted%20image%2020250930135708.png)
Setup:
switch_eg:
movq %rdx, %rcx
cmpq $6, %rdi # x:6
ja .L8 # Use default
jmp *.L4(,%rdi,8) # goto *JTab[x]
- What
jarange of values takes default? - Last line is an Indirect Jump
- Branch predictors hate this jump, it is very hard to predict, so it does have a latency factor, but it is much better than doing asymptotically sequence of
ifstatements or even binary search!
- Branch predictors hate this jump, it is very hard to predict, so it does have a latency factor, but it is much better than doing asymptotically sequence of
- The
.prefix directs the assembler to do something.- Note when x == 5 and x == 6, we are supposed to do the same thing, so that is why we can see two
.L7there
- Note when x == 5 and x == 6, we are supposed to do the same thing, so that is why we can see two
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z |
| %rax | Return value |
- Note that
wnot initialized here
Assembly Setup Explanation
- Table Structure
- Each target requires 8 bytes
- Base address at
.L4
- Jumping
-
Direct:
jmp .L8 -
jump target is denoted by label
.L8 -
Indirect:
jmp *.L4(,%rdi,8)- Using indirection to do a jump
- This is saying get a pointer out of memory and then jump to that pointer
-
Start of jump table:
.L4 -
Must scale by factor of 8 (addresses are 8 bytes)
-
Fetch target from effective Address
.L4 + x*8- Only for 0 <=
x<= 6
- Only for 0 <=
-
Jump Table
/CSCE-312/Visual%20Aids/Pasted%20image%2020251002125055.png)
- "Make an array of the addresses of these different blocks of code"
- Note: The
.L8label is the default case in this example.
Code Blocks (x == 1)
C code:
switch(x) {
case 1: // .L3
w = y*z;
break;
. . .
}
Assembly code:
.L3:
movq %rsi, %rax # y
imulq %rdx, %rax # y*z
ret
- Defining label
.L3(code block)- Move
yintorax - Multiply
ytimesz - Return
- Move
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z |
| %rax | Return value |
Handling Fall-Through
/CSCE-312/Visual%20Aids/Pasted%20image%2020251002125319.png)
Code Blocks (x == 2, x == 3)
C code:
long w = 1;
. . .
switch(x) {
. . .
case 2:
w = y/z;
/* Fall Through */
case 3:
w += z;
break;
. . .
}
Assembly code:
.L5: # Case 2
movq %rsi, %rax
cqto
idivq %rcx # y/z
jmp .L6 # goto merge
.L9: # Case 3
movl $1, %eax # w = 1
.L6: # merge:
addq %rcx, %rax # w += z
ret
- Case 2 is handled by
.L5- What does
cqtodoes?- Something about setting up a division
- What does
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z |
| %rax | Return value |
Code Blocks (x == 5, x == 6, default)
C code:
switch(x) {
. . .
case 5: // .L7
case 6: // .L7
w -= z;
break;
default: // .L8
w = 2;
}
Assembly code:
.L7: # Case 5,6
movl $1, %eax # w = 1
subq %rdx, %rax # w -= z
ret
.L8: # Default:
movl $2, %eax # 2
ret
| Register | Use(s) |
|---|---|
| %rdi | Argument x |
| %rsi | Argument y |
| %rdx | Argument z |
| %rax | Return value |
Remember:
- Jumps are still very costly, and hard to predict.
- Indirect branches are much harder to predict
- Predicting between many different cases
- If you predict wrong you need to go back, so it will be more costly
- In some cases actually doing a linear or binary search would be a better solution than trying prediction.
Summarizing
-
C Control
- if-then-else
- do-while
- It is a turing complete structure, anything after it is just niceness and making languages more fancy.
- while, for
- switch
-
Assembler Control
- Conditional jump
- Conditional move
- Indirect jump (via jump tables)
- Compiler generates code sequence to implement more complex control
-
Standard Techniques
- Loops converted to do-while or jump-to-middle form
- Large switch statements use jump tables
- Sparse switch statements may use decision trees (if-elseif-elseif-else)