Virtual Memory - Concepts
Class: CSCE-312
Notes:
Today
- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation
Address spaces
A System Using Physical Addressing
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251118135140.png)
- Used in “simple” systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames
- Think of "prefetchers"
- Hardware prefetchers have actually been used in real microprocessors!
- It is interesting to see how prefetchers are designed to anticipate the future memory behavior of a program
A System Using Virtual Addressing
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251118135628.png)
- Used in all modern servers, laptops, and smart phones
- One of the great ideas in computer science
- Makes programs easier to reason about and takes care of memory allocation problems like fragmentation
- Each program gets a private address space that only it has access to
- You do not have to worry about what else is going on in the system; you just worry about what your program does.
- In the old days, all programs had access to all of memory, and one program could invade another
- We can also use it to do paging
	- Page memory out to disk when DRAM is full, and pull it back into DRAM once memory frees up, etc...
- There are other advantages of virtual memory.
Address Spaces
- Linear address space: ordered set of contiguous non-negative integer addresses
- Virtual address space: set of N = 2^n virtual addresses
	- The set of integers that can be virtual addresses
- Physical address space: set of M = 2^m physical addresses
	- Another set of integers that can be physical addresses, where data is actually kept in the DRAM
Notes:
- Provides the illusion that we have a lot of memory that in reality (in hardware) we may not have.
Why Virtual Memory (VM)?
- Uses main memory efficiently
	- Use DRAM as a cache for parts of a virtual address space
- Simplifies memory management
	- Each process gets the same uniform linear address space
- Isolates address spaces
	- One process can't interfere with another's memory
	- User program cannot access privileged kernel information and code
Notes:
- Virtual Memory and Virtual Machines are two totally different topics:
- Related but different
- DRAM caches the parts of the Virtual Memory that are active
- The other parts are kept either on the disk or only "in imagination" (not yet backed by anything)
- You can have many blocks of virtual memory all point to the same physical block of 0s
	- It is okay to redundantly represent these blocks with one block of 0s since all of them will be 0s
	- When you write something to one of these blocks, that is when it gets allocated in a different place.
- Each process has its own translation from imaginary (virtual) addresses to real addresses; the translations of different processes never intersect because they are exclusive to each process.
- If we didn't have this, that would be terrible for security!
VM as a tool for caching
VM as a tool for caching
- Conceptually, virtual memory is an array of N contiguous bytes stored on disk.
- The contents of the array on disk are cached in physical memory (DRAM cache)
- These cache blocks are called pages (size is P = 2^p bytes)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120131311.png)
- Think of virtual memory as a much larger array of bytes that lives on the disk.
- The actual memory on your system motherboard caches parts of it; we call these cached blocks pages
- It is the same concept as a cache block, except for virtual memory
- Some pages are cached in physical memory at any given time (some live in memory and some do not)
- Usually p = 12, so page size = 4 KB
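As a quick sanity check, here is a minimal C sketch (assuming a POSIX system) that asks the OS for its page size; on most Linux/x86-64 machines this prints 4096:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // sysconf(_SC_PAGESIZE) returns the VM page size P = 2^p in bytes
    long page_size = sysconf(_SC_PAGESIZE);
    printf("Page size: %ld bytes\n", page_size);  // typically 4096 (p = 12)
    return 0;
}
```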
DRAM Cache Organization
- DRAM cache organization is driven by the enormous miss penalty
	- DRAM is about 10x slower than SRAM
	- Disk is about 10,000x slower than DRAM
- Consequences
	- Large page (block) size: typically 4 KB, sometimes 4 MB
	- Fully associative
		- Any VP can be placed in any PP
		- Requires a "large" mapping function – different from cache memories
	- Highly sophisticated, expensive replacement algorithms
		- Too complicated and open-ended to be implemented in hardware
		- Replacement is done so infrequently that it is done in software
		- The replacement code in the Linux kernel is written in C
	- Write-back rather than write-through
		- Updates stay in DRAM; we do not write a page to the disk until that page is evicted
Enabling Data Structure: Page Table
- A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
- Per-process kernel data structure in DRAM
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132006.png)
- How do we do the mapping from virtual to physical?
- We have a sophisticated structure called the page table.
	- Think of it as an array of entries that maps virtual pages to physical pages
	- There is one page table entry for every page in virtual memory
	- Indexed by virtual page number
	- Tells us whether the page is in memory or on disk, and whether the page has been modified
	- A page table entry also gives us some other useful information (see the sketch below)
- Notice that some pages aren't mapped anywhere in DRAM, so they live on the disk.
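To make the idea concrete, here is a hedged sketch of what one PTE might look like as a C bitfield; the field names and widths are illustrative, not from any real OS or ISA:

```c
#include <stdint.h>

// Hypothetical page table entry layout (field names and widths are
// illustrative only, not from any real OS or ISA).
typedef struct {
    uint64_t ppn   : 40;  // physical page number (meaningful only if valid)
    uint64_t valid : 1;   // 1 = page is cached in DRAM, 0 = on disk or unallocated
    uint64_t dirty : 1;   // 1 = page was modified since it was paged in
    uint64_t read  : 1;   // permission bits checked by the MMU on each access
    uint64_t write : 1;
    uint64_t exec  : 1;
} pte_t;
```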
Page Hit
- Page hit: reference to VM word that is in physical memory (DRAM cache hit)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132818.png)
Page Fault
- Page fault: reference to VM word that is not in physical memory (DRAM cache miss)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132857.png)
- A PTE can indicate "this is a pointer to the disk, not to memory"
- A page fault is a kind of event that we call an "exception"
	- It interrupts your program (making it slower) and causes your OS to find out what happened
	- The OS needs to find a physical page to accommodate the fault, point the page table entry at the new page, and restart the program at this point.
Handling Page Fault
- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)
- Offending instruction is restarted: page hit!
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133202.png)
- A segmentation fault is handled by the same mechanism as a page fault, except that the memory you are trying to access is not accessible to the virtual memory system at all (a page fault just needs to read from disk instead of from memory)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133335.png)
- **Key point**: Waiting until the miss to copy the page to DRAM is known as demand paging
	- Makes it so that we never have to actually access the disk unless it is really necessary
	- If we allocate memory and start using it, and it is never evicted, we never have to write it to the disk. This is very efficient. (A sketch follows below.)
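As referenced above, a minimal sketch of demand paging from user space, assuming Linux/POSIX `mmap`: the kernel creates the mapping immediately but defers allocating physical pages until first touch:

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1 << 20;  // 1 MB of virtual address space
    // Anonymous mapping: the kernel sets up the virtual pages but
    // allocates no physical pages yet (demand-zero memory).
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    buf[0] = 'x';  // first touch triggers a minor page fault; the kernel
                   // maps a zero-filled physical page here on demand
    printf("%c\n", buf[0]);
    munmap(buf, len);
    return 0;
}
```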
Allocating Pages
- Allocating a new page (VP 5) of virtual memory.
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133709.png)
Locality to the Rescue Again!
- Virtual memory seems terribly inefficient, but it works because of locality.
- At any point in time, programs tend to access a set of active virtual pages called the working set
- Programs with better temporal locality will have smaller working sets
- If (working set size < main memory size)
- Good performance for one process after compulsory misses
- If ( SUM(working set sizes) > main memory size )
- Thrashing: Performance meltdown where pages are swapped (copied) in and out continuously
Notes:
- Just like with the cache, but a little bit at a time.
- The working set is the natural concept of the set of addresses that you are currently working with
	- It is a somewhat fuzzy concept
- If the working set is much smaller than the memory size, then we will have good performance
- Thrashing: data is going back and forth between the disk and the DRAM, which is very slow.
VM as a tool for memory management
Virtual Address Space
- Key idea: each process has its own virtual address space
- It can view memory as a simple linear array
- Mapping function scatters addresses through physical memory
- Well-chosen mappings can improve locality
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120134205.png)
- The process never has to see what is in the page table (which is probably very complex)
- From the process's perspective, it doesn't need to know; it is just living in a virtual world of bytes that are available all the time; think of the Matrix movie.
- Though the process can tell when thrashing is occurring, because it has a notion of time.
- What if processes want to share memory with each other?
	- There is a controlled way to do this through something called "shared memory" (a sketch follows after the notes below)
- Simplifying memory allocation
	- Each virtual page can be mapped to any physical page
	- A virtual page can be stored in different physical pages at different times
- Sharing code and data among processes
	- Map virtual pages to the same physical page (here: PP 6)
Notes:
- Note how powerful this is!
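As referenced above, a minimal sketch of controlled sharing via POSIX shared memory (`shm_open` + `mmap`); the object name `/demo_shm` is illustrative, and two processes mapping this object get virtual pages that translate to the same physical pages:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Create (or open) a named shared memory object; on older Linux,
    // link with -lrt. The name "/demo_shm" is illustrative.
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, 4096);  // size it to one page
    // MAP_SHARED: writes here are visible to every process that maps
    // the same object, because they share the physical page.
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(p, "hello from shared memory");
    printf("%s\n", p);
    munmap(p, 4096);
    shm_unlink("/demo_shm");
    return 0;
}
```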
Simplifying Linking and Loading
- Linking
	- Each program has a similar virtual address space
	- Code, data, and heap always start at the same addresses.
- Loading
	- `execve` allocates virtual pages for the `.text` and `.data` sections & creates PTEs marked as invalid
	- The `.text` and `.data` sections are copied, page by page, on demand by the virtual memory system
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120134442.png)
VM as a tool for memory protection
Memory Protection
- Extend PTEs with permission bits
- MMU checks these bits on each access
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120135045.png)
- We can exploit the virtual-to-physical translation to implement other features
	- Data that is shared between different programs
	- A buffer that we don't want overwritten by a malicious hacker: mark it as read-only (see the sketch below)
	- The permission bits define what operations can be done on those pages.
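A minimal sketch of the read-only-buffer idea using POSIX `mprotect`, which flips the permission bits the MMU checks on each access:

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    // Map one writable page and fill it.
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    buf[0] = 'A';
    // Drop the write permission on this page; any later store to buf
    // now raises SIGSEGV because the MMU checks the PTE's bits.
    if (mprotect(buf, 4096, PROT_READ) != 0) { perror("mprotect"); return 1; }
    printf("read still works: %c\n", buf[0]);
    // buf[0] = 'B';  // uncommenting this line would crash the program
    return 0;
}
```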
Address Translation
- Virtual Address Space
	- V = {0, 1, …, N–1}
- Physical Address Space
	- P = {0, 1, …, M–1}
- Address Translation
	- MAP: V → P ∪ {∅}
	- For virtual address a:
		- MAP(a) = a′ if data at virtual address a is at physical address a′ in P
		- MAP(a) = ∅ if data at virtual address a is not in physical memory
			- Either invalid or stored on disk
Summary of Address Translation Symbols
- Basic Parameters
	- N = 2^n: Number of addresses in virtual address space
	- M = 2^m: Number of addresses in physical address space
	- P = 2^p: Page size (bytes)
- Components of the virtual address (VA)
	- TLBI: TLB index
	- TLBT: TLB tag
	- VPO: Virtual page offset
	- VPN: Virtual page number
- Components of the physical address (PA)
	- PPO: Physical page offset (same as VPO)
	- PPN: Physical page number
Address Translation With a Page Table
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120135509.png)
- Add the VPN index to the page table base address to locate the PTE
- Read memory (1): fetch the PTE
- Read memory (2): fetch the data word at the resulting physical address
- The low bits of the virtual address (the last 4 bits in this slide's example) are not translated because they are the page offset
- The upper bits are the index into the page table
- The virtual page offset becomes the physical page offset (it is copied over)
- Translate the virtual page number to a physical page number by using it as an index into the page table, as in the sketch below
- Every process has its own page table; the page table base register says where the current process's page table begins in physical memory.
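A toy single-level translation in C, following the steps above; the PTE encoding (low bit = valid, remaining bits = PPN) is an assumption for illustration:

```c
#include <stdint.h>

#define P_BITS 12                      // page size = 2^12 = 4 KB
#define PAGE_MASK ((1u << P_BITS) - 1)

// Toy translation with a single-level page table `pt` indexed by VPN.
// PTE layout is hypothetical: low bit = valid, remaining bits = PPN.
uint64_t translate(const uint64_t *pt, uint64_t va) {
    uint64_t vpn = va >> P_BITS;    // upper bits index the page table
    uint64_t vpo = va & PAGE_MASK;  // low bits are copied, not translated
    uint64_t pte = pt[vpn];
    if (!(pte & 1))                 // valid bit clear: page fault
        return (uint64_t)-1;        // a real MMU raises an exception here
    uint64_t ppn = pte >> 1;
    return (ppn << P_BITS) | vpo;   // PPO = VPO
}
```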
Address Translation: Page Hit
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202131728.png)
- Processor sends virtual address to MMU
- MMU sends the PTE address (PTEA) to cache/memory
- Cache/memory returns the PTE to the MMU
- MMU sends physical address to cache/memory
- Cache/memory sends data word to processor
Address Translation: Page Fault
Usually handled in software by the OS
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202131814.png)
- Processor sends virtual address to MMU
- MMU sends the PTE address (PTEA) to cache/memory
- Cache/memory returns the PTE to the MMU
- Valid bit is zero, so MMU triggers page fault exception
- Handler identifies victim (and, if dirty, pages it out to disk)
- Handler pages in new page and updates PTE in memory
- Handler returns to original process, restarting faulting instruction
Integrating VM and Cache
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132215.png)
VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address
Speeding up Translation with a TLB
- Page table entries (PTEs) are cached in L1 like any other memory word
	- PTEs may be evicted by other data references
	- PTE hit still requires a small L1 delay
- Solution: Translation Lookaside Buffer (TLB)
	- Small set-associative hardware cache in MMU
	- Maps virtual page numbers to physical page numbers
	- Contains complete page table entries for a small number of pages
Notes:
- If we miss in the TLB, we read the page table to find the translation
- The next time we access the same page, we will find the translation really quickly
- There is a lot of research that goes into managing TLBs
- The TLB is getting a little bigger over time, but it is just like a regular cache, except the contents are not blocks of data; they are page table entries.
Accessing the TLB
- MMU uses the VPN portion of the virtual address to access the TLB:
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132449.png)
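A small sketch of how the MMU might split the VPN shown above into TLB index and tag bits; the 16-set TLB size (T_BITS = 4) is an assumption for illustration:

```c
#include <stdint.h>

#define P_BITS 12  // page offset bits (4 KB pages)
#define T_BITS 4   // log2(number of TLB sets); 16 sets is an assumption

// Split a virtual address into the fields the MMU uses (illustrative).
uint64_t vpn(uint64_t va)  { return va >> P_BITS; }                    // VPN
uint64_t tlbi(uint64_t va) { return vpn(va) & ((1u << T_BITS) - 1); }  // set index: low VPN bits
uint64_t tlbt(uint64_t va) { return vpn(va) >> T_BITS; }               // tag: remaining VPN bits
```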
TLB Hit
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132510.png)
- A TLB hit eliminates a memory access
- Speeds up retrieval of virtual memory
TLB Miss
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132558.png)
- **A TLB miss incurs an additional memory access (the PTE)**
- Fortunately, TLB misses are rare. Why? Locality: programs keep touching the same small set of pages, so the small TLB covers most accesses.
Multi-Level Page Tables
- Suppose:
	- 4 KB (2^12) page size, 48-bit address space, 8-byte PTE
- Problem:
	- Would need a 512 GB page table!
		- 2^48 × 2^-12 × 2^3 = 2^39 bytes
- Common solution: multi-level page table
- Example: 2-level page table
	- Level 1 table: each PTE points to a page table (always memory resident)
	- Level 2 table: each PTE points to a page (paged in and out like any other data)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132920.png)
- This makes the page table smaller and more manageable; a sketch of a two-level walk follows below
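A hedged sketch of a 2-level walk in the same toy style as the earlier single-level translate; the 9-bit level-2 index and the PTE encoding are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define P_BITS  12  // 4 KB pages
#define L2_BITS 9   // 512 entries per level-2 table (illustrative)

// Toy 2-level walk. Entry encoding is hypothetical: a NULL level-1
// entry means no level-2 table exists; low bit of a level-2 PTE = valid.
uint64_t translate2(uint64_t **l1, uint64_t va) {
    uint64_t vpn1 = va >> (P_BITS + L2_BITS);               // level-1 index
    uint64_t vpn2 = (va >> P_BITS) & ((1u << L2_BITS) - 1); // level-2 index
    uint64_t vpo  = va & ((1u << P_BITS) - 1);

    uint64_t *l2 = l1[vpn1];     // level-1 entry points to a level-2 table
    if (l2 == NULL)              // whole 2 MB region unallocated:
        return (uint64_t)-1;     // no level-2 table needs to exist at all
    uint64_t pte = l2[vpn2];
    if (!(pte & 1))              // valid bit clear: page fault
        return (uint64_t)-1;
    return ((pte >> 1) << P_BITS) | vpo;
}
```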
A Two-Level Page Table Hierarchy
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202133045.png)
Translating with a k-level Page Table
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202133223.png)
- Now, instead of taking 2 memory accesses, a translation takes k+1 accesses
- This doesn't look very efficient; this is where TLBs become even more valuable, because they cache page table entries in SRAM and bypass all of these slow accesses
- There are also MMU caches, which are similar to TLBs but are more distributed across the chip
Remember: the kinds of misses in caches (the 3 C's, plus coherence)
- Compulsory misses
	- The first access to a block can never be a hit
- Capacity misses
	- The cache is too small to hold everything
- Conflict misses
	- Two different accesses have the same set index bits
	- A direct-mapped cache has a lot of conflicts because many blocks map to the same index
- Coherence misses
	- The block was in our cache but got invalidated because a different processor wrote to it in its own cache.
Summary
- Programmer's view of virtual memory
	- Each process has its own private linear address space
	- Cannot be corrupted by other processes
- System view of virtual memory
	- Uses memory efficiently by caching virtual memory pages
		- Efficient only because of locality
	- Simplifies memory management and programming
	- Simplifies protection by providing a convenient interpositioning point to check permissions