Machine-Level Programming IV - Data
Class: CSCE-312
Notes:
Today
- Arrays
- One-dimensional
- Multi-dimensional (nested)
- Multi-level
- Structures
- Allocation
- Access
- Alignment
- Floating Point
Array
Array Allocation
- Basic Principle
T A[L];
- Array of data type T and length L
- Contiguously allocated region of L * sizeof(T) bytes in memory
/CSCE-312/Visual%20Aids/Pasted%20image%2020251007135204.png)
- Remember that these multi-byte quantities are stored little endian (least significant byte first)
- Addresses run from x + 0 through x + 11 (address x + 12 is not part of the array)
- Remember that a pointer is 8 bytes in x86-64
- An array of any pointer type has 8-byte elements
Array Access
- Basic Principle
T A[L];
- Array of data type T and length L
- Identifier A can be used as a pointer to array element 0: type T*
/CSCE-312/Visual%20Aids/Pasted%20image%2020251007135546.png)
Reference Type Value
val[4] int 3
val int * x
val+1 int * x + 4
&val[2] int * x + 8
val[5] int ??
*(val+1) int 5
val + i int * x + 4i
- Note: val+1 is the same as &val[1]
- It adds 1 to the pointer (scaled by the element size), not to the value it points to
- Remember that '*' at the start of an expression, outside a declaration, is the dereference operator.
- val[5] is undefined behavior: anything could happen, and C has no runtime check that would tell us when it does
- There is probably other memory allocated just past the end of the array, so an out-of-bounds write could corrupt other data and even disrupt the control flow of the program
- This may show up as a segmentation fault, but only if the access lands on memory the process is not allowed to touch; often it silently reads or overwrites a neighboring value.
- Other languages like Java or Python catch these errors at runtime because they do array bounds checking, at the cost of some performance.
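The pointer rules in the table above can be checked with a small C sketch. The helper names byte_offset and read_elem are made up for illustration; they are not part of the lecture code.

```c
#include <stddef.h>

/* Byte distance from the start of the array to the address val + i.
   Pointer arithmetic scales i by sizeof(int), so this is 4*i on x86-64. */
size_t byte_offset(const int *val, size_t i)
{
    return (size_t)((const char *)(val + i) - (const char *)val);
}

/* Reads element i through explicit pointer arithmetic; same as val[i]. */
int read_elem(const int *val, size_t i)
{
    return *(val + i);
}
```

With int val[5] = {1, 5, 2, 1, 3}, byte_offset(val, 1) is sizeof(int) and read_elem(val, 1) is 5, matching the rows val+1 and *(val+1) of the table. Both helpers are only valid for indexes inside the array.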
Array Example
/CSCE-312/Visual%20Aids/Pasted%20image%2020251007135801.png)
- Declaration “zip_dig cmu” equivalent to “int cmu[5]”
- Example arrays were allocated in successive 20 byte blocks
- Not guaranteed to happen in general
Notes:
- typedef is written as if what follows were the declaration of a variable, but the declared name becomes a type
- zip_dig is a type (an array of 5 int's)
- For cmu, what would be the byte at address 16?
- It is 1: because the machine is little endian, address 16 holds the least significant byte of the first element
- The elements of an array are guaranteed to be laid out consecutively in memory, though the array itself could be at any location in memory.
- It is better if data is aligned on some power of 2 boundary
- Done for optimization
Array Accessing Example
/CSCE-312/Visual%20Aids/Pasted%20image%2020251007140014.png)
- A function that accepts a zip_dig and a digit position and returns the digit of zip_dig in that position.
x86-64
# %rdi = z
# %rsi = digit
movl (%rdi,%rsi,4), %eax # z[digit]
- Register %rdi contains starting address of array
- Register %rsi contains array index
- Desired digit at %rdi + 4*%rsi
- Note the similarity with the form: x + 4i
- Use memory reference (%rdi,%rsi,4)
- Integer return value is in %eax
Array Loop Example
C code:
void zincr(zip_dig z) {
size_t i;
for (i = 0; i < ZLEN; i++)
z[i]++;
}
- Why would you want to do this? No reason; it is just an example.
- Increments element i of the array z on each pass.
Assembly code:
# %rdi = z
movl $0, %eax # i = 0
jmp .L3 # goto middle
.L4: # loop:
addl $1, (%rdi,%rax,4) # z[i]++
addq $1, %rax # i++
.L3: # middle
cmpq $4, %rax # i:4 (because array size is 5)
jbe .L4 # if <=, goto loop
rep; ret
- Why are we using %rax as a counter here?
- The function returns void, so %rax holds no return value, and it is a caller-saved register, so we can use it freely. (movl $0, %eax also clears the upper 32 bits of %rax.)
- Note (%rdi,%rax,4) refers to the memory location of element z[i]
- addl $1, (%rdi,%rax,4) reads this array element, adds 1 to it, and writes the value back to the array element.
- cmpq $4, %rax compares %rax with 4. Why is it comparing with 4 instead of 5?
- For integers, i < 5 is equivalent to i <= 4
- ZLEN is 5, but the compiler is free to rewrite the test in any equivalent form. Sometimes it uses <= rather than <; this has no negative effect.
- Note we are using jbe (unsigned "below or equal"), because i has the unsigned type size_t
Multidimensional (Nested) Arrays
- Declaration: T A[R][C];
- 2D array of data type T
- R rows, C columns
- Type T element requires K bytes
- Array Size
- R * C * K bytes
- Arrangement
- Row-Major Ordering
- Stores the rows one after another, each row from left to right
A[0][0]    ...  A[0][C-1]
   .               .
   .               .
A[R-1][0]  ...  A[R-1][C-1]
- In the computer everything is linear, one address after the other, but we draw it two dimensionally because that gives us a better idea of what we are looking at: it looks like a matrix.
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016125103.png)
- Note row-major ordering, why is this important?
- Because some languages (Fortran, for example) use column-major ordering. Neither choice is inherently better; it is just a different convention.
- Accessing elements in column-major order when the data is stored row-major is much slower, so it is important to know the order your language uses.
- The processor's cache makes memory accesses faster
- An arithmetic operation takes a couple of cycles
- A mov that has to go all the way to main memory could take hundreds of cycles
- The cache remembers recently accessed pieces of memory, so that we can get them quickly instead of waiting hundreds of cycles
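As a rough illustration of the two traversal orders, here is a minimal C sketch. The names ROWS, COLS, sum_row_order and sum_col_order are assumptions made for this example. Both functions compute the same sum; only the order in which memory is touched differs.

```c
#include <stddef.h>

#define ROWS 4   /* hypothetical size, chosen only for illustration */
#define COLS 5

/* Visits elements in the order they sit in memory: A[0][0], A[0][1], ... */
long sum_row_order(int A[ROWS][COLS])
{
    long sum = 0;
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            sum += A[i][j];
    return sum;
}

/* Visits one column at a time, so consecutive accesses are
   COLS * sizeof(int) bytes apart instead of sizeof(int). */
long sum_col_order(int A[ROWS][COLS])
{
    long sum = 0;
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            sum += A[i][j];
    return sum;
}
```

On an array this small the difference is not measurable. The cache effect only shows up once the array is much larger than the cache.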
Nested Array Example
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016125257.png)
- “zip_dig pgh[4]” equivalent to “int pgh[4][5]”
- Variable pgh: array of 4 elements, allocated contiguously
- Each element is an array of 5 int’s, allocated contiguously
- Summary: 4 rows, 5 columns
- “Row-Major” ordering of all elements in memory
Notes:
- In C and C++ a two dimensional array is an array of arrays
- They are kind of limited
- This is a primitive array type, you always need to specify sizes (number of rows and columns)
- The dimensions are part of the array type.
- In this example, each row has 5 elements.
Nested Array Row Access
- Row Vectors
- A[i] is array of C elements
- Each element of type T requires K bytes
- Starting address A + i * (C * K)
- K is the size of the object
- Int -> 4
- Double/pointer -> 8
- Struct -> ?? (need to compute size)
- i (index) will tell you which inner array to access
- In assembly we can compute this formula with two leaq instructions.
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016125512.png)
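The row formula A + i * (C * K) can be written out by hand in C. This is only a sketch: the constant PCOUNT and the helpers row_offset and row_start are names assumed for the example.

```c
#include <stddef.h>

#define PCOUNT 4   /* rows, as in "zip_dig pgh[4]" (name assumed) */
#define ZLEN   5   /* C in the formula: ints per row */

/* Byte offset of row i from the start of the array: i * (C * K). */
size_t row_offset(size_t i)
{
    return i * (ZLEN * sizeof(int));
}

/* Start of row i, computed by hand from the base address.
   The result is the same pointer that the expression A[i] yields. */
int *row_start(int A[PCOUNT][ZLEN], size_t i)
{
    return (int *)((char *)A + row_offset(i));
}
```

With 4-byte ints, row_offset(i) is 20*i, the same quantity as the 20 * index offset in the starting address formula.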
Nested Array Row Access Code
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016133401.png)
C code:
int *get_pgh_zip(int index)
{
return pgh[index];
}
Assembly code for A + i * (C * K):
# %rdi = index
leaq (%rdi,%rdi,4),%rax # 5 * index
leaq pgh(,%rax,4),%rax # pgh + (20 * index)
- Row Vector
- pgh[index] is array of 5 int’s
- Starting address pgh+20*index
- Machine Code
- Computes and returns address
- Compute as pgh + 4*(index+4*index)
Notes:
- We multiply %rdi by 5 because each row has 5 elements; index times 5 tells us how far into the array, counted in int's, that row starts.
- We get the offset into the array first in terms of int's, then in terms of bytes
- We have computed a pointer to the row vector in this two dimensional array.
Nested Array Element Access
- Array Elements
- A[i][j] is element of type T, which requires K bytes
- Address A + i * (C * K) + j * K
- = A + (i * C + j) * K
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016134002.png)
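The element formula A + (i * C + j) * K can be exercised directly. The sketch below uses hypothetical helper names (elem_offset, read_elem_by_offset) and copies the bytes with memcpy so that the hand-computed address is used exactly as the formula states.

```c
#include <stddef.h>
#include <string.h>

#define ZLEN 5   /* C in the formula: columns per row */

/* Byte offset of A[i][j] from the start of the array: (i * C + j) * K. */
size_t elem_offset(size_t i, size_t j)
{
    return (i * ZLEN + j) * sizeof(int);
}

/* Reads A[i][j] by hand: copy K bytes from base address + offset. */
int read_elem_by_offset(int A[][ZLEN], size_t i, size_t j)
{
    int value;
    memcpy(&value, (const char *)A + elem_offset(i, j), sizeof value);
    return value;
}
```

For any in-range i and j, read_elem_by_offset(A, i, j) returns the same value as A[i][j]. Writing A[i][j] asks the compiler for exactly this address computation.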
Nested Array Element Access Code
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016133401.png)
C code:
int get_pgh_digit
(int index, int dig)
{
return pgh[index][dig];
}
Assembly code for A + i * (C * K) + j * K:
leaq (%rdi,%rdi,4), %rax # 5*index
addq %rax, %rsi # 5*index+dig (addq, since both operands are 64-bit registers)
movl pgh(,%rsi,4), %eax # M[pgh + 4*(5*index+dig)]
- Array Elements
- pgh[index][dig] is int
- Address: pgh + 20*index + 4*dig
- = pgh + 4*(5*index + dig)
- Tells us the byte in memory where this value lives
Notes:
- A two dimensional array is exactly a matrix
- Correct:
Multi-Level Array Example
- Syntactically the access may look the same as for a nested array, but the layout in memory is different and more flexible
zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };
#define UCOUNT 3
int *univ[UCOUNT] = {mit, cmu, ucb};
- Variable univ denotes array of 3 elements
- Each element is a pointer
- 8 bytes
- Each pointer points to array of int’s
- Each one points to a different one-dimensional array
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016134434.png)
Element Access in Multi-Level Array
C code:
int get_univ_digit
(size_t index, size_t digit)
{
return univ[index][digit];
}
Assembly code:
salq $2, %rsi # 4*digit
addq univ(,%rdi,8), %rsi # p = univ[index] + 4*digit
movl (%rsi), %eax # return *p
ret
- We take the digit index and multiply it by 4 to express it in bytes
- We add it to the pointer univ[index] that we load from memory; the index is scaled by 8 because the array elements are 8-byte pointers.
- Return the int stored at the resulting address (the value, not the address itself)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016134623.png)
- Computation
- Element access Mem[Mem[univ+8*index]+4*digit]
- Must do two memory reads
- First get pointer to row array
- Then access element within array (get the digit)
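The two memory reads can be spelled out in C. This sketch reuses the declarations from the example above; the function name get_univ_digit_steps and the ZLEN typedef are written out here as assumptions so that the block stands on its own.

```c
#include <stddef.h>

#define ZLEN 5
#define UCOUNT 3

typedef int zip_dig[ZLEN];

zip_dig cmu = { 1, 5, 2, 1, 3 };
zip_dig mit = { 0, 2, 1, 3, 9 };
zip_dig ucb = { 9, 4, 7, 2, 0 };

int *univ[UCOUNT] = { mit, cmu, ucb };

/* Same access as univ[index][digit], with the two reads made explicit. */
int get_univ_digit_steps(size_t index, size_t digit)
{
    int *row = univ[index];   /* read 1: Mem[univ + 8*index] */
    return row[digit];        /* read 2: Mem[row + 4*digit]  */
}
```

For a nested array such as pgh the first read does not exist: the row address is computed arithmetically from the base address, so only one memory read is needed.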
NXN Matrix Code
- Fixed dimensions
- Know value of N at compile time
- With primitive types we need to know how many rows and columns, otherwise we cannot do the arithmetic
- Variable dimensions, explicit indexing
- Traditional way to implement dynamic arrays
- Go to the indexes manually using the following formula:
((i)*(n)+(j))
- Variable dimensions, implicit indexing
- Now supported by gcc (variable-length array parameters, a C99 feature)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251016135046.png)
- We can make a two-dimensional array out of one-dimensional arrays using pointers
- Note: with explicit indexing we are simulating a two-dimensional array using a one-dimensional array
- gcc supports passing two-dimensional arrays with variable dimensions as arguments, as long as the dimension is an earlier parameter of the function.
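A minimal sketch of the "variable dimensions, explicit indexing" approach, assuming the matrix lives in one heap block. The macro body is the formula from the notes; the macro name IDX and the functions matrix_new, matrix_get and matrix_set are hypothetical names for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Explicit indexing: position of element (i, j) of an n x n matrix
   inside a flat one-dimensional array. */
#define IDX(n, i, j) ((i) * (n) + (j))

/* Allocates an n x n matrix of ints as one zero-initialized block.
   Returns NULL if the allocation fails; the caller frees the block. */
int *matrix_new(size_t n)
{
    return calloc(n * n, sizeof(int));
}

int matrix_get(const int *a, size_t n, size_t i, size_t j)
{
    return a[IDX(n, i, j)];
}

void matrix_set(int *a, size_t n, size_t i, size_t j, int v)
{
    a[IDX(n, i, j)] = v;
}
```

Because n is only known at run time, the multiplication i * n cannot be folded into shifts the way it can for a fixed dimension such as 16.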
16 X 16 Matrix Access
- Array Elements
- Address A + i * (C * K) + j * K
- C = 16, K = 4
C code:
/* Get element a[i][j] */
int fix_ele(fix_matrix a, size_t i, size_t j) {
return a[i][j];
}
Assembly code:
# a in %rdi, i in %rsi, j in %rdx
salq $6, %rsi # 64*i -> (C * K)
addq %rsi, %rdi # a + 64*i -> (A + i * (C * K))
movl (%rdi,%rdx,4), %eax # M[a + 64*i + 4*j] -> (A + i * (C * K) + j * K)
ret
n X n Matrix Access
- Array Elements
- Address A + i * (C * K) + j * K
- C = n, K = 4
- Must perform integer multiplication
C code:
/* Get element a[i][j] */
int var_ele(size_t n, int a[n][n], size_t i, size_t j) {
return a[i][j];
}
Assembly code:
# n in %rdi, a in %rsi, i in %rdx, j in %rcx
imulq %rdx, %rdi # n*i
leaq (%rsi,%rdi,4), %rax # a + 4*n*i
movl (%rax,%rcx,4), %eax # M[a + 4*n*i + 4*j]
ret
Structures
Structure Representation
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021125012.png)
- Structure represented as block of memory
- Big enough to hold all of the fields
- Fields ordered according to declaration
- Even if another ordering could yield a more compact representation
- Compiler determines overall size + positions of fields
- Machine-level program has no understanding of the structures in the source code
Note:
- In C++, struct is just a class where everything is public by default
- In C, this creates a struct type with the tag rec
- Still need to write struct rec to use the type
- C++ still recognizes this syntax but does not require it
- Remember that on x86-64 Linux size_t is the same as unsigned long
- a is an array of 4 int's (4 bytes each), so it occupies positions 0 through 15
- i starts at position 16 and is 8 bytes because it is an unsigned long
- next is a pointer so it takes 8 bytes as well.
- Some fields are accessed more frequently than others, so it can be an optimization to group the frequently accessed fields apart from the infrequently accessed ones.
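The offsets described in these notes can be checked with offsetof. The sketch assumes the declaration is int a[4], then size_t i, then a next pointer, as the notes describe; the small accessor functions are hypothetical names for this example.

```c
#include <stddef.h>

/* Field layout assumed from the notes: int a[4]; size_t i; struct rec *next. */
struct rec {
    int a[4];
    size_t i;
    struct rec *next;
};

/* offsetof reports the same compile-time offsets that the machine code uses. */
size_t rec_offset_a(void)    { return offsetof(struct rec, a); }
size_t rec_offset_i(void)    { return offsetof(struct rec, i); }
size_t rec_offset_next(void) { return offsetof(struct rec, next); }
size_t rec_size(void)        { return sizeof(struct rec); }
```

On x86-64 Linux these evaluate to 0, 16, 24 and 32 respectively.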
Generating Pointer to Structure Member
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021125914.png)
- Generating Pointer to Array Element
- Offset of each structure member determined at compile time
- Compute as r + 4*idx
C code:
int *get_ap
(struct rec *r, size_t idx)
{
return &r->a[idx];
}
Assembly code:
# r in %rdi, idx in %rsi
leaq (%rdi,%rsi,4), %rax
ret
- We do not need to add any offset because a is the first field of the struct (offset 0)
- %rdi is the pointer to the struct
- %rsi is the index
- leaq computes %rdi + %rsi*4 and puts the value into %rax
Following Linked List
C code:
struct rec {
int a[4];
int i;
struct rec *next;
};
void set_val
(struct rec *r, int val)
{
while (r) {
int i = r->i;
r->a[i] = val;
r = r->next;
}
}
- Walks down the linked list; in each node it takes i as an index into the array a and sets that element to the value passed as a parameter
- Note that we replace the current pointer with the next pointer in order to move on to the next element of the linked list.
Assembly code:
.L11: # loop:
movslq 16(%rdi), %rax # i = M[r+16]
movl %esi, (%rdi,%rax,4) # M[r+4*i] = val
movq 24(%rdi), %rdi # r = M[r+24]
testq %rdi, %rdi # Test r
jne .L11 # if !=0 goto loop
- movslq 16(%rdi), %rax: we take the 4-byte int i and sign extend it to 8 bytes.
- movl %esi, (%rdi,%rax,4) says: take val and put it into a[i]
- Note r->next is the value stored at r + 24, since this is where the next pointer is found in the struct
- testq: if the pointer is not null then we loop; otherwise we fall through.
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021130224.png)
%rdi = pointer to the beginning of the list
But this assembly code is not quite right!
- In a way that could cause a failure/bug
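- Why is this C while loop not the same as this assembly code?
- The listing shown does not test r before the first iteration!
- What if r is initially null? Then the first load, 16(%rdi), would dereference a null pointer and cause a segmentation fault.
- A complete translation needs a test of r before entering the loop.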
- Study this example! - it shows everything we need to know about accessing structs and arrays.
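One way to see the missing test is to rewrite the C loop in the shape of the assembly: a bottom-tested loop with a single guard in front. The function name set_val_guarded is made up for this sketch; the struct is the one from the example above.

```c
#include <stddef.h>

struct rec {
    int a[4];
    int i;
    struct rec *next;
};

/* Same behavior as the while loop, restructured as one test up front
   followed by a bottom-tested loop. */
void set_val_guarded(struct rec *r, int val)
{
    if (r == NULL)          /* the initial test the listing does not show */
        return;
    do {
        int i = r->i;
        r->a[i] = val;
        r = r->next;
    } while (r != NULL);    /* corresponds to testq / jne .L11 */
}
```

Without the initial test, calling the function with a null pointer would execute the loop body once and fault on the first load.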
Structures & Alignment
- Unaligned Data
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021131154.png)
- Layout if we knew nothing about alignment
- 1 byte, 4 bytes, 4 bytes, and 8 bytes
- Aligned Data
- Primitive data type requires K bytes
- Address must be multiple of K
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021131226.png)
- Note we needed to pad 3 bytes in order for the int array to start at an address that is a multiple of 4
- The double should go at an address that is a multiple of 8.
- The struct ends up costing us 24 bytes because of alignment rules and those pad bytes.
- Nothing is stored in those bytes. They look wasted, but they pay for themselves by allowing your program to perform better
- Example: on some machines, accessing an int at an address that is not a multiple of 4 causes an exception; on x86-64 it works but can be slower
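The padding can be observed from C with offsetof and sizeof. The field types below are an assumption chosen to match the sizes listed above (1 byte, 4 bytes, 4 bytes, 8 bytes); the accessor names are hypothetical.

```c
#include <stddef.h>

/* Assumed declaration: a char, an array of two ints, and a double. */
struct S1 {
    char c;
    int i[2];
    double v;
};

size_t s1_offset_c(void) { return offsetof(struct S1, c); }
size_t s1_offset_i(void) { return offsetof(struct S1, i); }
size_t s1_offset_v(void) { return offsetof(struct S1, v); }
size_t s1_size(void)     { return sizeof(struct S1); }
```

On x86-64 Linux the offsets are 0, 4 and 16 and the size is 24: 3 pad bytes after c and 4 more before v, compared with 17 bytes if the fields were packed.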
Alignment Principles
- Aligned Data
- Primitive data type requires K bytes
- Address must be multiple of K
- Required on some machines; advised on x86-64
- Motivation for Aligning Data
- Memory accessed by (aligned) chunks of 4 or 8 bytes (system dependent)
- Inefficient to load or store datum that spans quad word boundaries
- Virtual memory trickier when datum spans 2 pages
- When a piece of data spans two virtual pages
- With proper alignment this never happens
- Compiler
- Inserts gaps in structure to ensure correct alignment of fields
Specific Cases of Alignment (x86-64)
- 1 byte: char, …
- no restrictions on address (always aligned)
- 2 bytes: short, …
- lowest 1 bit of address must be 0 (binary)
- In hexadecimal, the last digit must be even (0, 2, 4, 6, 8, A, C, E).
- 4 bytes: int, float, …
- lowest 2 bits of address must be 00 (binary)
- In hexadecimal, the last digit must be 0, 4, 8, or C.
- 8 bytes: double, long, char *, …
- lowest 3 bits of address must be 000 (binary)
- In hexadecimal, the last digit must be 0 or 8.
- 16 bytes: long double (GCC on Linux)
- lowest 4 bits of address must be 0000 (binary)
- In hexadecimal, the last digit must be 0.
Example question:
- Question type: "Here is a hexadecimal address ...; is it properly aligned for its type?"
- Example question: Consider the hexadecimal address 0x12345678 and a requirement for 4-byte (int) alignment.
- Alignment Requirement: 4-byte alignment means the address must be a multiple of 4.
- Hexadecimal Address: 0x12345678
- Modulo Check:
- The last hexadecimal digit is 8.
- Since 16 is a multiple of 4, only the last hexadecimal digit matters. 8 is divisible by 4 (8 mod 4 = 0), so the address 0x12345678 is properly aligned for a 4-byte type.
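The same check can be done mechanically. This is a small sketch with a hypothetical helper name, is_aligned, that tests the low bits of the address; it assumes the alignment k is a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of k. k must be a power of two (1, 2, 4, 8, 16),
   so the test reduces to checking that the low log2(k) bits are all zero. */
bool is_aligned(uintptr_t addr, uintptr_t k)
{
    return (addr & (k - 1)) == 0;
}
```

For the address in the example, is_aligned(0x12345678, 4) and is_aligned(0x12345678, 8) are both true, while is_aligned(0x12345678, 16) is false because the last hexadecimal digit is not 0.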
Satisfying Alignment with Structures
"We have to satisfy alignment within structures and between structures"
- Within structure:
- Must satisfy each element’s alignment requirement
- Overall structure placement
- Each structure has alignment requirement K
- K = Largest alignment requirement of any element within the structure
- Initial address & structure length must be multiples of K
- Example:
- K = 8, due to the double element (largest alignment requirement in the struct)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021131226.png)
Meeting Overall Alignment Requirement
- For largest alignment requirement K
- Overall structure size must be a multiple of K
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021132557.png)
- Now we have 7 pad bytes at the end, which ensure that the next struct in an array also starts at a correctly aligned address
Arrays of Structures
- Overall structure length multiple of K
- Satisfy alignment requirement for every element
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021132714.png)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021132724.png)
Accessing Array Elements
- Compute array offset 12*idx
- sizeof(S3), including alignment spacers
- Element j is at offset 8 within structure
- Assembler gives offset a+8
- Resolved during linking
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021132820.png)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021132831.png)
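- %rdi is the index
- The leaq instruction is just multiplying %rdi by 3
- movzwl: move word to long with zero extension (the scale factor 4 in the address completes the multiplication by 12)
- The result is moved into the long register that we return.
- You could also use movw here and it would still work
- We are returning in %eax and only its low 16 bits hold j, so it does not matter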
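The offset arithmetic for an array of structs can be sketched in C. The field types of S3, the array length, and the helper j_byte_offset are assumptions chosen to match the numbers in the notes (structure size 12, j at offset 8).

```c
#include <stddef.h>

/* Assumed declaration: two shorts around a float, giving size 12
   and placing j at offset 8. */
struct S3 {
    short i;
    float v;
    short j;
};

struct S3 a[10];   /* array length chosen only for illustration */

/* Byte offset of a[idx].j from the start of the array: 12*idx + 8. */
size_t j_byte_offset(size_t idx)
{
    return idx * sizeof(struct S3) + offsetof(struct S3, j);
}

short get_j(int idx)
{
    return a[idx].j;
}
```

The two pad bytes at the end of S3 are what make every element of the array start at a multiple of 4.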
Saving Space
- Put large data types first
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021133351.png)
- Effect (K=4)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021133406.png)
- We went from wasting 6 bytes to wasting 2 bytes!
- Why would you use the left layout when the right one is more compact?
- Maybe readability
- Maybe to put the most frequently accessed field first, though that is probably less important
- Other languages like Java may move fields and objects around to make the layout more efficient, but C is nice in the sense that it gives you control over this.
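The effect of field order can be measured with sizeof. The two declarations below are an assumption consistent with K = 4 and the 6-byte versus 2-byte waste mentioned above; the helper names are hypothetical.

```c
#include <stddef.h>

/* Small fields split around the large one. */
struct S4 {
    char c;
    int i;
    char d;
};

/* Large field first, small fields packed together after it. */
struct S5 {
    int i;
    char c;
    char d;
};

/* Padding = total size minus the bytes the fields actually need. */
size_t s4_padding(void)
{
    return sizeof(struct S4) - (sizeof(int) + 2 * sizeof(char));
}

size_t s5_padding(void)
{
    return sizeof(struct S5) - (sizeof(int) + 2 * sizeof(char));
}
```

On x86-64, S4 takes 12 bytes (6 of padding) and S5 takes 8 bytes (2 of padding), even though both hold the same three fields.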
Floating Point
Floating point refers to how computers represent and process real numbers (numbers with decimals) — like 3.14159 or -0.0001 — rather than integers.
Because floating-point math is more complex (it involves exponents, rounding, precision, etc.), CPUs historically used special hardware and instructions to handle it efficiently.
Background
- History
- x87 FP (Legacy Floating Point Unit)
- The x87 refers to the original Floating Point Unit (FPU) used in early Intel architectures (8087, 80387, etc.).
- It used a stack-based model for computation (ST(0), ST(1), ...).
- It was cumbersome and awkward for compilers and assembly programmers to use, hence “very ugly.”
- Still supported for backward compatibility, but not used in modern code.
- SSE FP (Streaming SIMD Extensions)
- Introduced around Pentium III.
- Uses registers XMM0–XMM15 (128-bit).
- Instead of a stack, it uses flat registers, which are much easier to program with.
- It supports SIMD: Single Instruction, Multiple Data — doing the same operation on multiple numbers at once.
- It was a game-changer for multimedia, graphics, and scientific applications.
- “Special case use of vector instructions” — because SSE was originally designed for graphics/multimedia (vector-like data).
- AVX FP (Advanced Vector Extensions)
- Introduced after SSE; the newest of the three floating-point instruction sets covered here.
- Expands registers to 256 bits (YMM0–YMM15), and later 512 bits (ZMM) with AVX-512.
- Basically an evolution of SSE, using the same idea but wider registers and better instruction formats.
- Commonly used today for scientific computing, ML, data analysis, video encoding, etc.
Fun facts:
- A GPU is an accelerator
- GPUs have many small cores optimized for parallel floating-point operations — they are specialized “floating point accelerators.”
- We used to make FPUs (Floating point accelerators)
- Before CPUs had integrated FPUs, floating-point math was handled by a separate chip (e.g., Intel 8087 coprocessor).
- There was this need for vector instructions
- Instructions that perform the same operation in parallel on data at adjacent addresses; this is faster for things like multimedia and graphics.
- When you want to process large arrays (like pixels, audio samples, or matrices), doing one number at a time is too slow.
- Someone had the idea to use this vector hardware for floating-point instructions as well
- AVX stands for "Advanced Vector Extensions"; a vector here is an array of floats
- The name literally means “we extended the CPU to handle vectors of floats.”
- Take a vector of 8 floats and another vector of 8 floats, and combine them element by element with a single instruction.
Programming with SSE3
XMM Registers
- Specific registers for working with floating point values and parallel execution
- 16 total, each 16 bytes (xmm0, ..., xmm15)
- 16 single-byte integers
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021133954.png)
- 8 16-bit integers
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134002.png)
- 4 32-bit integers
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134016.png)
- 4 single-precision floats (float)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134023.png)
- 2 double-precision floats (double)
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134029.png)
- 1 single-precision float
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134037.png)
- 1 double-precision float
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134042.png)
Scalar & SIMD Operations
- Scalar Operations: Single Precision
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134803.png)
- The first s in addss means "scalar"; the second s means "single precision"
- Adds one single-precision float in %xmm0 to another single-precision float in %xmm1 and stores the result in %xmm1
- SIMD Operations: Single Precision
- SIMD = Single Instruction, Multiple Data
- Can operate on multiple items of data with one single instruction
- For example you can have a single instruction that adds two vectors all at once
- This is fast because it avoids fetching a separate instruction for each element
- Really good thing for performance
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021134951.png)
- The next level is MIMD = Multiple Instructions, Multiple Data, which is what multi-threading on multiple cores provides
- addps means "add packed single precision"
- The four additions could be carried out sequentially or in parallel by the hardware, the latter being faster.
- Scalar Operations: Double Precision
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021135045.png)
- Take the low-order 8 bytes of the register and treat them as a double.
Note: the actual representation of floating point numbers is much more complex than what we saw with int's; we may look at it later.
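To make the scalar versus SIMD distinction concrete, here is a plain C model of what the two single-precision additions compute on a 4-float register. This is not how the hardware is programmed; the function names model_addss and model_addps are invented for this sketch and each array stands in for one XMM register.

```c
#include <stddef.h>

/* Model of the scalar add: only lane 0 of dst changes. */
void model_addss(const float src[4], float dst[4])
{
    dst[0] += src[0];
}

/* Model of the packed add: all four lanes are added in one "instruction". */
void model_addps(const float src[4], float dst[4])
{
    for (size_t k = 0; k < 4; k++)
        dst[k] += src[k];
}
```

The loop in model_addps is exactly the work that a single packed instruction replaces, which is where the savings in instruction fetches come from.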
FP Basics
- Arguments passed in %xmm0, %xmm1, ...
- Result returned in %xmm0.
- All XMM registers caller-saved
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021135417.png)
- %xmm0 is like %rax but for floats.
- For double it is exactly the same, but the instruction suffix is d for "double precision" (for example, addsd instead of addss).
FP Memory Referencing
- Integer (and pointer) arguments passed in regular registers
- FP values passed in XMM registers
- Different mov instructions to move between XMM registers, and between memory and XMM registers
/CSCE-312/Visual%20Aids/Pasted%20image%2020251021135720.png)
- movapd copies %xmm0 (v) into %xmm1; this copy will eventually hold x + v
- movsd moves a "scalar double": it copies what %rdi points to into %xmm0
- addsd takes %xmm0 and adds it to %xmm1, storing the result in %xmm1.
- The final movsd stores the result of the addition at the address pointed to by %rdi
- Basically much the same as it is done with integer data.
- Remember you should have structures aligned.
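A C function consistent with the four instructions described above might look like the following. This is a reconstruction under assumptions: the name dincr, the parameter names, and the fact that the old value is returned are inferred from the register usage in the notes, not taken from them.

```c
/* Hypothetical reconstruction: p arrives in %rdi (pointer argument),
   v arrives in %xmm0 (first floating-point argument). */
double dincr(double *p, double v)
{
    double x = *p;   /* movsd: load what p points to into %xmm0 */
    *p = x + v;      /* addsd into the copy of v, then movsd back to memory */
    return x;        /* x is already in %xmm0, the FP return register */
}
```

The pointer travels in an integer register while both doubles travel in XMM registers, which is the mixing of register files that this section describes.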
Other Aspects of FP Code
- Lots of instructions
- Different operations, different formats, ...
- There are many other very specific floating point instructions
- Floating-point comparisons
- Instructions ucomiss and ucomisd
- Set condition codes CF, ZF, and PF
- Using constant values
- Set XMM0 register to 0 with instruction xorpd %xmm0, %xmm0
- This is the only constant you can get easily; other constants have to be loaded from memory.
- Floating-point representation
- IEEE has set up standards for representing floating-point numbers in computing
- Consider the mantissa (significand), exponent, etc.
Summary
- Arrays
- Elements packed into contiguous region of memory
- Use index arithmetic to locate individual elements
- Structures
- Elements packed into single region of memory
- Access using offsets determined by compiler
- May require internal and external padding to ensure alignment
- Combinations
- Can nest structure and array code arbitrarily
- Can have an array of structs, and those structs can contain arrays, and so on.
- Floating Point
- Data held and operated on in XMM registers
- We use special instructions for floating point data
...
There are some exercises on the last few slides of Machine-Level Programming IV - Data