Cache Memories

Class: CSCE-312


Notes:

Today

Cache memory organization and operation

Example Memory Hierarchy

Pasted image 20251111135154.png|600

Cache Memories

Pasted image 20251114043018.png|550

General Cache Organization (S, E, B)

...
![[Cache Memories.pdf]]

Cache Size: C = S * E * B data bytes

Structure of the Cache

Sets (S = 2ˢ)

Lines per Set (E = 2ᵉ)

Block Size (B = 2ᵇ bytes per block)

A Cache Line Contains:

A single cache line has the following fields:
1. Valid bit (v)

2. Tag

3. Data block (B bytes)

Cache Read

Notes:

Address of word:

MISS
HIT

Address Breakdown

Every memory address is divided into three parts:

  1. Tag (t bits)
    • Identifies which memory block is stored in a cache line.
    • Compared against all lines in the selected set.
  2. Set index (s bits)
    • Selects one of the S = 2ˢ sets in the cache.
    • Determines exactly which bucket of lines to look inside.
  3. Block offset (b bits)
    • Locates the exact byte within the B = 2ᵇ-byte block stored in the cache line.
    • Low-order bits of the address.

Address of word:

|   tag   |  set index  |  block offset |
|  t bits |    s bits   |     b bits    |

Cache Lookup Process (Read Operation)

  1. Step 1 — Locate the Set
    • Use the set index bits to pick one set:
    • set = address[set index bits]
      • Only this set could possibly contain the desired block.
  2. Step 2 — Search the Lines Within the Set
    • Each set contains E lines (E = 2ᵉ ways).
    • For every line in the set, check:
      1. Valid bit = 1?
      2. Tag matches the tag bits of the address?
    • If both are true → we found the block.
  3. Step 3 — Determine Hit or Miss
    • Hit
      • At least one line in the set has matching tag and valid bit = 1.
      • The cache returns the data immediately.
    • Miss
      • No matching tag found in any line.
      • Cache must fetch the block from the next memory hierarchy level (L2, L3, DRAM).
  4. Step 4 — Locate the Exact Byte Using Block Offset
    • If the tag matched (hit):
      • Use the block offset bits to index into the B-byte data block.
      • Example:
        • If block size B = 32 bytes, offset selects byte 0–31.

Check slides for "Example: Direct Mapped Cache (E = 1)"

Check slides for "Direct-Mapped Cache Simulation"

Check slides for "E-way Set Associative Cache (Here: E = 2)"

Check slides for "2-Way Set Associative Cache Simulation"

What about writes?

Intel Core i7 Cache Hierarchy

Pasted image 20251114044631.png|425

Cache Performance Metrics

Notes:

Let's think about those numbers

Notes:

Writing Cache Friendly Code

Performance impact of caches

The Memory Mountain

Memory Mountain Test Function

mountain/mountain.c

long data[MAXELEMS]; /* Global array to traverse */

/* test - Iterate over first "elems" elements of
* array “data” with stride of "stride", using 
* using 4x4 loop unrolling. 
*/ 
int test(int elems, int stride) {
	long i, sx2=stride*2, sx3=stride*3, sx4=stride*4;
	long acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
	long length = elems, limit = length - sx4;
	
	/* Combine 4 elements at a time */
	for (i = 0; i < limit; i += sx4) {
		acc0 = acc0 + data[i];
		acc1 = acc1 + data[i+stride];
		acc2 = acc2 + data[i+sx2];
		acc3 = acc3 + data[i+sx3];
	}
	
	/* Finish any remaining elements */
	for (; i < length; i++) {
		acc0 = acc0 + data[i];
	}
	return ((acc0 + acc1) + (acc2 + acc3));
}

Pasted image 20251114050215.png|500

Rearranging loops to improve spatial locality

Matrix Multiplication Example

matmult/mm.c

/* ijk */
for (i=0; i<n; i++) {
	for (j=0; j<n; j++) {
		sum = 0.0;            // variable `sum` held in register
		for (k=0; k<n; k++) 
			sum += a[i][k] * b[k][j];
		c[i][j] = sum;
	}
}

Notes:

Miss Rate Analysis for Matrix Multiply

Pasted image 20251114050846.png|400

Layout of C Arrays in Memory (review)

Check slides for "Matrix Multiplication (...)"

Core i7 Matrix Multiply Performance

Pasted image 20251114051608.png|500

Using blocking to improve temporal locality

Example: Matrix Multiplication

c = (double *) calloc(sizeof(double), n*n);

/* Multiply n x n matrices a and b */
void mmm(double *a, double *b, double *c, int n) {
	int i, j, k;
	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			for (k = 0; k < n; k++)
				c[i*n + j] += a[i*n + k] * b[k*n + j];
}

Cache Miss Analysis

Remember: Example about prime numbers.

Blocking Summary

...

Cache Summary