Virtual Memory - Concepts
Class: CSCE-312
Notes:
Today
- Address spaces
- VM as a tool for caching
- VM as a tool for memory management
- VM as a tool for memory protection
- Address translation
Address spaces
A System Using Physical Addressing
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251118135140.png)
- Used in “simple” systems like embedded microcontrollers in devices like cars, elevators, and digital picture frames
- Think of "prefetchers"
- Hardware prefetchers have actually been used in real microprocessors!
- It is interesting to see how prefetchers are designed to anticipate the future memory behavior of a program
A System Using Virtual Addressing
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251118135628.png)
- Used in all modern servers, laptops, and smart phones
- One of the great ideas in computer science
- Makes programs easier to reason about and takes care of memory allocation problems like fragmentation
- Each program gets a private address space that only it has access to
- You do not have to worry about what else is going on in the system; you just worry about what your program does.
- In the old days, all programs had access to all of memory, and one program could invade another
- We can also use it to do paging
	- Page memory out to disk when DRAM is full, and pull it back into DRAM once memory frees up, etc...
- There are other advantages of virtual memory.
Address Spaces
- Linear address space: ordered set of contiguous non-negative integer addresses
- Virtual address space: set of N = 2^n virtual addresses
	- The set of integers that can be virtual addresses
- Physical address space: set of M = 2^m physical addresses
	- Another set of integers that can be physical addresses, where data is actually kept in the DRAM
Notes:
- Provides the illusion that we have a lot of memory that in reality (in hardware) we may not have.
Why Virtual Memory (VM)?
- Uses main memory efficiently
	- Use DRAM as a cache for parts of a virtual address space
- Simplifies memory management
	- Each process gets the same uniform linear address space
- Isolates address spaces
	- One process can't interfere with another's memory
	- User program cannot access privileged kernel information and code
Notes:
- Virtual Memory and Virtual Machines are two totally different topics:
- Related but different
- DRAM caches the parts of the Virtual Memory that are active
- The other parts are kept either on the disk or only "in imagination" (not yet backed by anything)
- You can have many blocks of virtual memory all point to the same physical block of 0s
	- It is okay to redundantly represent these blocks with one block of 0s since all of them will be 0s
	- When you write something to one of these blocks, that is when it gets allocated in a different place.
- Each process has its own translation from imaginary (virtual) addresses to real addresses; the translations of different processes never intersect because they are exclusive to each process.
- If we didn't have this, that would be terrible for security!
VM as a tool for caching
VM as a tool for caching
- Conceptually, virtual memory is an array of N contiguous bytes stored on disk.
- The contents of the array on disk are cached in physical memory (DRAM cache)
- These cache blocks are called pages (size is P = 2^p bytes)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120131311.png)
- Think of virtual memory as a much larger array of bytes that lives on the disk.
- The actual memory on your system motherboard caches parts of it; we call these cached blocks pages
- It is the same concept as a cache block, except for virtual memory
- Some pages are cached in physical memory at any given time (some live in memory and some do not)
- Usually p = 12, so page size = 4 KB
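As a quick sanity check, here is a minimal C sketch (assuming a POSIX system) that asks the OS for its page size; on most Linux/x86-64 machines this prints 4096:

```c
#include <stdio.h>
#include <unistd.h>

int main(void) {
    // sysconf(_SC_PAGESIZE) returns the VM page size P = 2^p in bytes
    long page_size = sysconf(_SC_PAGESIZE);
    printf("Page size: %ld bytes\n", page_size);  // typically 4096 (p = 12)
    return 0;
}
```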
DRAM Cache Organization
- DRAM cache organization is driven by the enormous miss penalty
	- DRAM is about 10x slower than SRAM
	- Disk is about 10,000x slower than DRAM
- Consequences
	- Large page (block) size: typically 4 KB, sometimes 4 MB
	- Fully associative
		- Any VP can be placed in any PP
		- Requires a "large" mapping function – different from cache memories
	- Highly sophisticated, expensive replacement algorithms
		- Too complicated and open-ended to be implemented in hardware
		- Replacement is done so infrequently that it is done in software
		- The replacement code in the Linux kernel is written in C
	- Write-back rather than write-through
		- Updates stay in DRAM; we do not write a page to the disk until that page is evicted
Enabling Data Structure: Page Table
- A page table is an array of page table entries (PTEs) that maps virtual pages to physical pages.
- Per-process kernel data structure in DRAM
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132006.png)
- How do we do the mapping from virtual to physical?
- We have a sophisticated structure called the page table.
	- Think of it as an array of entries that maps virtual pages to physical pages
	- There is one page table entry for every page in virtual memory
	- Indexed by virtual page number
	- Tells us whether the page is in memory or on disk, and whether the page has been modified
	- A page table entry also gives us some other useful information (see the sketch below)
- Notice that some pages aren't mapped anywhere in DRAM, so they live on the disk.
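To make the idea concrete, here is a hedged sketch of what one PTE might look like as a C bitfield; the field names and widths are illustrative, not from any real OS or ISA:

```c
#include <stdint.h>

// Hypothetical page table entry layout (field names and widths are
// illustrative only, not from any real OS or ISA).
typedef struct {
    uint64_t ppn   : 40;  // physical page number (meaningful only if valid)
    uint64_t valid : 1;   // 1 = page is cached in DRAM, 0 = on disk or unallocated
    uint64_t dirty : 1;   // 1 = page was modified since it was paged in
    uint64_t read  : 1;   // permission bits checked by the MMU on each access
    uint64_t write : 1;
    uint64_t exec  : 1;
} pte_t;
```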
Page Hit
- Page hit: reference to VM word that is in physical memory (DRAM cache hit)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132818.png)
Page Fault
- Page fault: reference to VM word that is not in physical memory (DRAM cache miss)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120132857.png)
- A PTE can indicate "this is a pointer to the disk, not to memory"
- A page fault is a kind of event that we call an "exception"
	- It interrupts your program (making it slower) and causes your OS to find out what happened
	- The OS needs to find a physical page to accommodate the fault, point the page table entry at the new page, and restart the program at this point.
Handling Page Fault
- Page miss causes page fault (an exception)
- Page fault handler selects a victim to be evicted (here VP 4)
- Offending instruction is restarted: page hit!
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133202.png)
- A segmentation fault is handled by the same mechanism as a page fault, except that the memory you are trying to access is not accessible to the virtual memory system at all (a page fault just needs to read from disk instead of from memory)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133335.png)
- **Key point**: Waiting until the miss to copy the page to DRAM is known as demand paging
	- Makes it so that we never have to actually access the disk unless it is really necessary
	- If we allocate memory and start using it, and it is never evicted, we never have to write it to the disk. This is very efficient. (A sketch follows below.)
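As referenced above, a minimal sketch of demand paging from user space, assuming Linux/POSIX `mmap`: the kernel creates the mapping immediately but defers allocating physical pages until first touch:

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 1 << 20;  // 1 MB of virtual address space
    // Anonymous mapping: the kernel sets up the virtual pages but
    // allocates no physical pages yet (demand-zero memory).
    char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    buf[0] = 'x';  // first touch triggers a minor page fault; the kernel
                   // maps a zero-filled physical page here on demand
    printf("%c\n", buf[0]);
    munmap(buf, len);
    return 0;
}
```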
Allocating Pages
- Allocating a new page (VP 5) of virtual memory.
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120133709.png)
Locality to the Rescue Again!
- Virtual memory seems terribly inefficient, but it works because of locality.
- At any point in time, programs tend to access a set of active virtual pages called the working set
- Programs with better temporal locality will have smaller working sets
- If (working set size < main memory size)
- Good performance for one process after compulsory misses
- If ( SUM(working set sizes) > main memory size )
- Thrashing: Performance meltdown where pages are swapped (copied) in and out continuously
Notes:
- Just like with the cache, but a little bit at a time.
- The working set is the natural concept of the set of addresses that you are currently working with
	- It is a somewhat fuzzy concept
- If the working set is much smaller than the memory size, then we will have good performance
- Thrashing: data is going back and forth between the disk and the DRAM, which is very slow.
VM as a tool for memory management
Virtual Address Space
- Key idea: each process has its own virtual address space
- It can view memory as a simple linear array
- Mapping function scatters addresses through physical memory
- Well-chosen mappings can improve locality
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120134205.png)
- The process never has to see what is in the page table (which is probably very complex)
- From the process's perspective, it doesn't need to know; it is just living in a virtual world of bytes that are available all the time; think of the Matrix movie.
- Though the process can tell when thrashing is occurring, because it has a notion of time.
- What if processes want to share memory with each other?
	- There is a controlled way to do this through something called "shared memory" (a sketch follows after the notes below)
- Simplifying memory allocation
	- Each virtual page can be mapped to any physical page
	- A virtual page can be stored in different physical pages at different times
- Sharing code and data among processes
	- Map virtual pages to the same physical page (here: PP 6)
Notes:
- Note how powerful this is!
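As referenced above, a minimal sketch of controlled sharing via POSIX shared memory (`shm_open` + `mmap`); the object name `/demo_shm` is illustrative, and two processes mapping this object get virtual pages that translate to the same physical pages:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void) {
    // Create (or open) a named shared memory object; on older Linux,
    // link with -lrt. The name "/demo_shm" is illustrative.
    int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }
    ftruncate(fd, 4096);  // size it to one page
    // MAP_SHARED: writes here are visible to every process that maps
    // the same object, because they share the physical page.
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    strcpy(p, "hello from shared memory");
    printf("%s\n", p);
    munmap(p, 4096);
    shm_unlink("/demo_shm");
    return 0;
}
```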
Simplifying Linking and Loading
- Linking
	- Each program has a similar virtual address space
	- Code, data, and heap always start at the same addresses.
- Loading
	- `execve` allocates virtual pages for the `.text` and `.data` sections & creates PTEs marked as invalid
	- The `.text` and `.data` sections are copied, page by page, on demand by the virtual memory system
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120134442.png)
VM as a tool for memory protection
Memory Protection
- Extend PTEs with permission bits
- MMU checks these bits on each access
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120135045.png)
- We can exploit the virtual-to-physical translation to implement other features
	- Data that is shared between different programs
	- A buffer that we don't want overwritten by a malicious hacker: mark it as read-only (see the sketch below)
	- The permission bits define what operations can be done on those pages.
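A minimal sketch of the read-only-buffer idea using POSIX `mprotect`, which flips the permission bits the MMU checks on each access:

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    // Map one writable page and fill it.
    char *buf = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }
    buf[0] = 'A';
    // Drop the write permission on this page; any later store to buf
    // now raises SIGSEGV because the MMU checks the PTE's bits.
    if (mprotect(buf, 4096, PROT_READ) != 0) { perror("mprotect"); return 1; }
    printf("read still works: %c\n", buf[0]);
    // buf[0] = 'B';  // uncommenting this line would crash the program
    return 0;
}
```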
Address Translation
- Virtual Address Space
	- V = {0, 1, …, N–1}
- Physical Address Space
	- P = {0, 1, …, M–1}
- Address Translation
	- MAP: V → P ∪ {∅}
	- For virtual address a:
		- MAP(a) = a′ if data at virtual address a is at physical address a′ in P
		- MAP(a) = ∅ if data at virtual address a is not in physical memory
			- Either invalid or stored on disk
Summary of Address Translation Symbols
- Basic Parameters
	- N = 2^n: Number of addresses in virtual address space
	- M = 2^m: Number of addresses in physical address space
	- P = 2^p: Page size (bytes)
- Components of the virtual address (VA)
	- TLBI: TLB index
	- TLBT: TLB tag
	- VPO: Virtual page offset
	- VPN: Virtual page number
- Components of the physical address (PA)
	- PPO: Physical page offset (same as VPO)
	- PPN: Physical page number
Address Translation With a Page Table
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251120135509.png)
- Add the VPN index to the page table base address to locate the PTE
- Read memory (1): fetch the PTE
- Read memory (2): fetch the data word at the resulting physical address
- The low bits of the virtual address (the last 4 bits in this slide's example) are not translated because they are the page offset
- The upper bits are the index into the page table
- The virtual page offset becomes the physical page offset (it is copied over)
- Translate the virtual page number to a physical page number by using it as an index into the page table, as in the sketch below
- Every process has its own page table; the page table base register says where the current process's page table begins in physical memory.
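A toy single-level translation in C, following the steps above; the PTE encoding (low bit = valid, remaining bits = PPN) is an assumption for illustration:

```c
#include <stdint.h>

#define P_BITS 12                      // page size = 2^12 = 4 KB
#define PAGE_MASK ((1u << P_BITS) - 1)

// Toy translation with a single-level page table `pt` indexed by VPN.
// PTE layout is hypothetical: low bit = valid, remaining bits = PPN.
uint64_t translate(const uint64_t *pt, uint64_t va) {
    uint64_t vpn = va >> P_BITS;    // upper bits index the page table
    uint64_t vpo = va & PAGE_MASK;  // low bits are copied, not translated
    uint64_t pte = pt[vpn];
    if (!(pte & 1))                 // valid bit clear: page fault
        return (uint64_t)-1;        // a real MMU raises an exception here
    uint64_t ppn = pte >> 1;
    return (ppn << P_BITS) | vpo;   // PPO = VPO
}
```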
Address Translation: Page Hit
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202131728.png)
- Processor sends virtual address to MMU
- MMU sends the PTE address (PTEA) to cache/memory
- Cache/memory returns the PTE to the MMU
- MMU sends physical address to cache/memory
- Cache/memory sends data word to processor
Address Translation: Page Fault
Usually handled in software by the OS
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202131814.png)
- Processor sends virtual address to MMU
- MMU sends the PTE address (PTEA) to cache/memory
- Cache/memory returns the PTE to the MMU
- Valid bit is zero, so MMU triggers page fault exception
- Handler identifies victim (and, if dirty, pages it out to disk)
- Handler pages in new page and updates PTE in memory
- Handler returns to original process, restarting faulting instruction
Integrating VM and Cache
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132215.png)
VA: virtual address, PA: physical address, PTE: page table entry, PTEA: PTE address
Speeding up Translation with a TLB
- Page table entries (PTEs) are cached in L1 like any other memory word
	- PTEs may be evicted by other data references
	- PTE hit still requires a small L1 delay
- Solution: Translation Lookaside Buffer (TLB)
	- Small set-associative hardware cache in MMU
	- Maps virtual page numbers to physical page numbers
	- Contains complete page table entries for a small number of pages
Notes:
- If we miss in the TLB, we read the page table to find the translation
- The next time we access the same page, we will find the translation really quickly
- There is a lot of research that goes into managing TLBs
- The TLB is getting a little bigger over time, but it is just like a regular cache, except the contents are not blocks of data; they are page table entries.
Accessing the TLB
- MMU uses the VPN portion of the virtual address to access the TLB:
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132449.png)
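A small sketch of how the MMU might split the VPN shown above into TLB index and tag bits; the 16-set TLB size (T_BITS = 4) is an assumption for illustration:

```c
#include <stdint.h>

#define P_BITS 12  // page offset bits (4 KB pages)
#define T_BITS 4   // log2(number of TLB sets); 16 sets is an assumption

// Split a virtual address into the fields the MMU uses (illustrative).
uint64_t vpn(uint64_t va)  { return va >> P_BITS; }                    // VPN
uint64_t tlbi(uint64_t va) { return vpn(va) & ((1u << T_BITS) - 1); }  // set index: low VPN bits
uint64_t tlbt(uint64_t va) { return vpn(va) >> T_BITS; }               // tag: remaining VPN bits
```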
TLB Hit
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132510.png)
- A TLB hit eliminates a memory access
- Speeds up retrieval of virtual memory
TLB Miss
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132558.png)
- **A TLB miss incurs an additional memory access (the PTE)**
- Fortunately, TLB misses are rare. Why? Locality: programs keep touching the same small set of pages, so the small TLB covers most accesses.
Multi-Level Page Tables
- Suppose:
	- 4 KB (2^12) page size, 48-bit address space, 8-byte PTE
- Problem:
	- Would need a 512 GB page table!
		- 2^48 × 2^-12 × 2^3 = 2^39 bytes
- Common solution: multi-level page table
- Example: 2-level page table
	- Level 1 table: each PTE points to a page table (always memory resident)
	- Level 2 table: each PTE points to a page (paged in and out like any other data)
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202132920.png)
- This makes the page table smaller and more manageable; a sketch of a two-level walk follows below
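A hedged sketch of a 2-level walk in the same toy style as the earlier single-level translate; the 9-bit level-2 index and the PTE encoding are assumptions for illustration:

```c
#include <stddef.h>
#include <stdint.h>

#define P_BITS  12  // 4 KB pages
#define L2_BITS 9   // 512 entries per level-2 table (illustrative)

// Toy 2-level walk. Entry encoding is hypothetical: a NULL level-1
// entry means no level-2 table exists; low bit of a level-2 PTE = valid.
uint64_t translate2(uint64_t **l1, uint64_t va) {
    uint64_t vpn1 = va >> (P_BITS + L2_BITS);               // level-1 index
    uint64_t vpn2 = (va >> P_BITS) & ((1u << L2_BITS) - 1); // level-2 index
    uint64_t vpo  = va & ((1u << P_BITS) - 1);

    uint64_t *l2 = l1[vpn1];     // level-1 entry points to a level-2 table
    if (l2 == NULL)              // whole 2 MB region unallocated:
        return (uint64_t)-1;     // no level-2 table needs to exist at all
    uint64_t pte = l2[vpn2];
    if (!(pte & 1))              // valid bit clear: page fault
        return (uint64_t)-1;
    return ((pte >> 1) << P_BITS) | vpo;
}
```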
A Two-Level Page Table Hierarchy
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202133045.png)
Translating with a k-level Page Table
![](/CSCE-312/Visual%20Aids/Pasted%20image%2020251202133223.png)
- Now, instead of taking 2 memory accesses, a translation takes k+1 accesses
- This doesn't look very efficient; this is where TLBs become even more valuable, because they cache page table entries in SRAM and bypass all of these slow accesses
- There are also MMU caches, which are similar to TLBs but are more distributed across the chip
Remember: the kinds of misses in caches (the 3 C's, plus coherence)
- Compulsory misses
	- The first access to a block can never be a hit
- Capacity misses
	- The cache is too small to hold everything
- Conflict misses
	- Two different accesses have the same set index bits
	- A direct-mapped cache has a lot of conflicts because many blocks map to the same index
- Coherence misses
	- The block was in our cache but got invalidated because a different processor wrote to it in its own cache.
Summary
- Programmer's view of virtual memory
	- Each process has its own private linear address space
	- Cannot be corrupted by other processes
- System view of virtual memory
	- Uses memory efficiently by caching virtual memory pages
		- Efficient only because of locality
	- Simplifies memory management and programming
	- Simplifies protection by providing a convenient interpositioning point to check permissions