Main Points

• Caching in computer systems
  • How can we improve performance of memory accesses?

• Demand-paged virtual memory
  • Can we provide the illusion of near infinite memory in limited physical memory?

• Cache replacement policies
  • On a cache miss, how do we choose which entry to replace?
Definitions

• Cache
  • Copy of data that is faster to access than the original

• Temporal locality
  • Programs tend to reference same locations multiple times
  • E.g., instructions in a loop

• Spatial locality
  • Programs tend to reference nearby locations
  • E.g., data in a loop
Cache Concept (Read)

Cache

Fetch Address

Address In Cache?

Yes

Store Value in Cache

No

Fetch Address
Cache Concept (Write)

- **Write through**
  - Changes sent immediately to next level of storage
- **Write back**
  - Changes stored in cache until cache block is replaced
Memory Hierarchy

- Intel Core i9 has 512KB 1\textsuperscript{st} level, 2MB 2\textsuperscript{nd} level, and 16MB 3\textsuperscript{rd} level cache

<table>
<thead>
<tr>
<th>Cache</th>
<th>Hit Cost</th>
<th>Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>1st level cache/first level TLB</td>
<td>1 ns</td>
<td>64 KB</td>
</tr>
<tr>
<td>2nd level cache/second level TLB</td>
<td>4 ns</td>
<td>256 KB</td>
</tr>
<tr>
<td>3rd level cache</td>
<td>12 ns</td>
<td>2 MB</td>
</tr>
<tr>
<td>Memory (DRAM)</td>
<td>100 ns</td>
<td>10 GB</td>
</tr>
<tr>
<td>Data center memory (DRAM)</td>
<td>100 (\mu)s</td>
<td>100 TB</td>
</tr>
<tr>
<td>Local non-volatile memory</td>
<td>100 (\mu)s</td>
<td>100 GB</td>
</tr>
<tr>
<td>Local disk</td>
<td>10 ms</td>
<td>1 TB</td>
</tr>
<tr>
<td>Data center disk</td>
<td>10 ms</td>
<td>100 PB</td>
</tr>
<tr>
<td>Remote data center disk</td>
<td>200 ms</td>
<td>1 XB</td>
</tr>
</tbody>
</table>
Main Points

• Caching in computer systems
  • How can we improve performance of memory accesses?

• Demand-paged virtual memory
  • Can we provide the illusion of near infinite memory in limited physical memory?

• Cache replacement policies
  • On a cache miss, how do we choose which entry to replace?
• Page table has an invalid entry for the referenced page and the data for the page is stored on disk.
After page fault, page table has a valid entry for referenced page with the page frame containing the data that had been stored on disk.

Old contents of the page frame are stored on disk and the page table entry that previously pointed to the page frame is set to invalid.
Demand Paging

- TLB miss
- Page table walk
- Page fault (page invalid in page table)
- Trap to kernel
- Convert virtual address to file + offset
- Allocate page frame
  - Evict page if needed
- Initiate disk block read into page frame
- Disk interrupt when DMA complete
- Mark page as valid
- Resume process at faulting instruction
- TLB miss
- Page table walk to fetch translation
- Execute instruction
Allocating a Page Frame

• Select old page to evict
• Find all page table entries that refer to old page
  • If page frame is shared
• Set each page table entry to invalid
• Remove any TLB entries
  • Copies of now invalid page table entry
• Write changes on page back to disk, if necessary
Has the Page been Accessed?

• Every page table entry has some bookkeeping
  • Has page been modified?
    • Dirty bit: Set by hardware on store instruction
      • In both TLB and page table entry
  • Has page been recently used?
    • Use bit: Set by hardware in page table entry on every TLB miss

• Bookkeeping bits can be reset by the OS kernel
  • When changes to page are flushed to disk
  • To track whether page is recently used
Main Points

• Caching in computer systems
  • How can we improve performance of memory accesses?
• Demand-paged virtual memory
  • Can we provide the illusion of near infinite memory in limited physical memory?
• Cache replacement policies
  • On a cache miss, how do we choose which entry to replace?
Cache Replacement Policy

- Assume new entry is more likely to be used in the near future

- Policy goal: reduce cache misses to
  - Improve expected case performance
  - Reduce likelihood of very poor performance
MIN, LRU, LFU

• MIN
  • Replace the cache entry that will not be used for the longest time into the future
  • Optimality proof based on exchange: if evict an entry used sooner, that will trigger an earlier cache miss

• Least Recently Used (LRU)
  • Replace the cache entry that has not been used for the longest time in the past
  • Approximation of MIN

• Least Frequently Used (LFU)
  • Replace the cache entry used the least often (in the recent past)
LRU vs MIN for Sequential Scan

<table>
<thead>
<tr>
<th>Reference</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
<th>A</th>
<th>B</th>
<th>C</th>
<th>D</th>
<th>E</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>A</td>
<td></td>
<td></td>
<td>E</td>
<td></td>
<td>D</td>
<td></td>
<td>E</td>
<td>C</td>
<td></td>
<td>A</td>
<td>B</td>
<td>C</td>
<td>D</td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>B</td>
<td></td>
<td>C</td>
<td></td>
<td>A</td>
<td>E</td>
<td>D</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>C</td>
<td></td>
<td></td>
<td>B</td>
<td>A</td>
<td>E</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>D</td>
<td></td>
<td></td>
<td>C</td>
<td></td>
<td></td>
<td>B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Reference</th>
<th>A</th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>A</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>B</td>
<td>+</td>
<td>+</td>
<td>+</td>
<td>C</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>C</td>
<td></td>
<td>D</td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>D</td>
<td>E</td>
<td>+</td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

- Scanning through memory, where the scan is slightly larger than the cache size
LRU vs FIFO vs MIN

<table>
<thead>
<tr>
<th>Reference</th>
<th>A</th>
<th>B</th>
<th>A</th>
<th>C</th>
<th>B</th>
<th>D</th>
<th>A</th>
<th>D</th>
<th>E</th>
<th>D</th>
<th>A</th>
<th>E</th>
<th>B</th>
<th>A</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>A</td>
<td>+</td>
<td>+</td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>B</td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>C</td>
<td></td>
<td>E</td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>D</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>+</td>
<td>+</td>
<td></td>
<td>C</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Reference</th>
<th>A</th>
<th>B</th>
<th>A</th>
<th>C</th>
<th>B</th>
<th>D</th>
<th>A</th>
<th>D</th>
<th>E</th>
<th>D</th>
<th>A</th>
<th>E</th>
<th>B</th>
<th>A</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>A</td>
<td>+</td>
<td>+</td>
<td>E</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>B</td>
<td>+</td>
<td></td>
<td>A</td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>C</td>
<td></td>
<td>+</td>
<td>B</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>D</td>
<td></td>
<td>+</td>
<td>C</td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Reference</th>
<th>A</th>
<th>B</th>
<th>A</th>
<th>C</th>
<th>B</th>
<th>D</th>
<th>A</th>
<th>D</th>
<th>E</th>
<th>D</th>
<th>A</th>
<th>E</th>
<th>B</th>
<th>A</th>
<th>C</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>A</td>
<td>+</td>
<td>+</td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>B</td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td>+</td>
<td>C</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td>C</td>
<td>E</td>
<td></td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td>D</td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>+</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

- Reference pattern exhibits temporal locality
Belady’s Anomaly

- Larger cache suffers ten cache misses, while the smaller cache has one fewer
Clock Algorithm: Estimating LRU

- Periodically, sweep through all pages
- If page is unused, reclaim
- If page is used, mark as unused
Nth Chance: Not Recently Used

• Instead of one bit per page, keep an integer
  • notInUseSince: number of sweeps since last use
• Periodically sweep through all page frames

```c
if (page is used) {
    notInUseSince = 0;
} else if (notInUseSince < N) {
    notInUseSince++;
} else {
    reclaim page;
}
```
Implementation Note

- Clock and Nth Chance can run synchronously
  - In page fault handler, run algorithm to find next page to evict
  - Might require writing changes back to disk first
- Or asynchronously
  - Create a thread to maintain a pool of recently unused, clean pages
  - Find recently unused dirty pages, write mods back to disk
  - Find recently unused clean pages, mark as invalid and move to pool
  - On page fault, check if requested page is in pool!
  - If not, evict a page from the pool
Acknowledgment

• This lecture is a slightly modified version of the one prepared by Tom Anderson