











# THE UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

# COMP 790: OS Implementation

# Distinction

- Compiler/CPU can figure out when instructions can be safely reordered within a given thread
- Hard to figure out when the order is meaningful to coordinate with other threads
- If you want optimizations (and you do), programmer MUST give hardware and compiler some hints
  - Hard to design hints that average programmer can successfully give the hardware
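A small sketch of the distinction above, under stated assumptions: the two-instruction writer and reader programs, the variable names `data` and `flag`, and the helpers `interleavings` and `run` are all hypothetical, and memory is modeled as a single shared dictionary. Swapping the writer's two independent stores leaves the writer's own final state unchanged, yet it changes what a second thread can observe.

```python
from itertools import combinations


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]


def run(writer, reader):
    """Return (final memory, set of (flag, data) pairs the reader can see)."""
    seen = set()
    final = None
    for schedule in interleavings(writer, reader):
        memory = {"data": 0, "flag": 0}
        regs = {}
        for kind, addr, arg in schedule:
            if kind == "w":
                memory[addr] = arg        # store a value
            else:
                regs[arg] = memory[addr]  # load into a named register
        seen.add((regs["flag"], regs["data"]))
        final = dict(memory)
    return final, seen


# The reader checks the flag first, then reads the data.
READER = [("r", "flag", "flag"), ("r", "data", "data")]

# Hypothetical writer, in program order and with its two stores swapped.
IN_ORDER = [("w", "data", 42), ("w", "flag", 1)]
REORDERED = [("w", "flag", 1), ("w", "data", 42)]

final_a, seen_a = run(IN_ORDER, READER)
final_b, seen_b = run(REORDERED, READER)

print(final_a == final_b)             # the writer alone cannot tell the difference
print((1, 0) in seen_a, (1, 0) in seen_b)
```

The pair `(1, 0)` means the reader saw the flag set but stale data. It only appears once the stores are swapped, which is the kind of ordering a compiler or CPU cannot know is meaningful without a hint.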


# Definitions

- Cache coherence: The protocol by which writes to one cache invalidate or update other caches
- Memory consistency model: How are updates to memory published from one CPU to another
  - Reordering between CPU and cache/memory?
  - Are cache updates/invalidations delivered atomically?
    - Coherence protocol detail that impacts consistency
- The distinction between the two is often muddled


# Intuition

- On a bus-based multi-processor system (e.g., current x86 CPUs), a write to the cache immediately invalidates other caches
  - Making the write visible to other CPUs
- But, the update could spend some time in a write buffer or register on the CPU
- If a later write goes to the cache first, it can become visible to another CPU before the earlier write
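A toy model of this intuition, meant as a sketch and not as a description of any real CPU: a write either goes straight to a shared cache, where every CPU sees it, or waits in a private write buffer that other CPUs cannot see. The class `ToyCPU`, its `write` and `drain` methods, the `buffered` parameter, and the addresses `data` and `flag` are all hypothetical.

```python
# Shared "cache" visible to every CPU. All names are illustrative assumptions.
shared_cache = {"data": 0, "flag": 0}


class ToyCPU:
    def __init__(self):
        # Pending (address, value) pairs, invisible to other CPUs.
        self.write_buffer = []

    def write(self, addr, value, buffered):
        if buffered:
            # The update sits on this CPU for a while.
            self.write_buffer.append((addr, value))
        else:
            # The update reaches the cache and is visible immediately.
            shared_cache[addr] = value

    def drain(self):
        # Eventually the buffered writes reach the cache, oldest first.
        for addr, value in self.write_buffer:
            shared_cache[addr] = value
        self.write_buffer.clear()


def observe():
    """What another CPU would read right now, as (data, flag)."""
    return (shared_cache["data"], shared_cache["flag"])


cpu0 = ToyCPU()
cpu0.write("data", 42, buffered=True)   # earlier write, delayed in the buffer
cpu0.write("flag", 1, buffered=False)   # later write, goes to the cache first

before_drain = observe()  # flag is set, but data still looks stale
cpu0.drain()
after_drain = observe()   # both writes are now visible

print(before_drain, after_drain)
```

Between the second write and the drain, an observer sees the writes in the opposite of program order, even though the cache itself behaves coherently.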

# Sequential is too slow

- CPUs want to pipeline instructions
- Hide high latency instructions
- Sequential consistency prevents these optimizations
- These optimizations are harmless in the common case

# Sequential Consistency

- Simplest possible model
- Every program instruction is executed in order
- No buffered memory writes
- Only one CPU writes to memory at a time
  - Given a write to address x, all cached values of x are invalidated before any CPU can write anything else
- Simple to reason about
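One way to see why this model is simple to reason about is to enumerate every execution it allows. The sketch below assumes a two-thread test program (commonly called the store-buffering litmus test) in which each thread writes one variable and then reads the other. The program, the register names `r0` and `r1`, and the helpers `interleavings` and `run` are illustrative assumptions. Under sequential consistency an execution is just an interleaving of the two programs against one shared memory.

```python
from itertools import combinations

# Each thread is a list of operations executed in program order.
# ("w", addr, value) writes memory; ("r", addr, reg) reads memory into reg.
THREAD0 = [("w", "x", 1), ("r", "y", "r0")]
THREAD1 = [("w", "y", 1), ("r", "x", "r1")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]


def run(schedule):
    """Execute one global order against a single shared memory."""
    memory = {"x": 0, "y": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "w":
            memory[addr] = arg        # no buffering: visible at once
        else:
            regs[arg] = memory[addr]
    return (regs["r0"], regs["r1"])


sc_outcomes = {run(s) for s in interleavings(THREAD0, THREAD1)}
print(sorted(sc_outcomes))
```

In this model the result `(0, 0)` never appears, because whichever read runs last must follow both writes.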


# Relaxed consistency

- If the common case is that reordering is safe, make the programmer tell the CPU when reordering is unsafe
  - Details of the model specify what can be reordered
  - Many different proposed models
- Barrier (or fence): common consistency abstraction
  - Every memory access before this barrier must be visible to other CPUs before any memory access after the barrier
  - Confusing to use in practice
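The barrier rule above can be illustrated with a toy machine, sketched here under loud assumptions: each CPU has a FIFO store buffer, a read checks the CPU's own buffer before memory, and a hypothetical `("fence",)` instruction may only execute once that CPU's buffer is empty. This loosely resembles a TSO-style machine, but it is not a model of any specific processor. The function `outcomes` and the two-CPU test programs are illustrative.

```python
def outcomes(programs):
    """Explore every execution of a toy machine with FIFO store buffers.

    Ops: ("w", addr, value), ("r", addr, reg), ("fence",).
    Returns the set of (r0, r1) values over all executions.
    """
    results = set()

    def step(pcs, buffers, memory, regs):
        progressed = False
        for cpu, prog in enumerate(programs):
            buf = buffers[cpu]
            if buf:
                # Choice: drain this CPU's oldest buffered write to memory.
                progressed = True
                addr, value = buf[0]
                drained = buffers[:cpu] + (buf[1:],) + buffers[cpu + 1:]
                step(pcs, drained, {**memory, addr: value}, regs)
            if pcs[cpu] == len(prog):
                continue
            # Choice: execute this CPU's next instruction.
            op = prog[pcs[cpu]]
            advanced = pcs[:cpu] + (pcs[cpu] + 1,) + pcs[cpu + 1:]
            if op[0] == "w":
                progressed = True
                grown = buffers[:cpu] + (buf + ((op[1], op[2]),),) + buffers[cpu + 1:]
                step(advanced, grown, memory, regs)
            elif op[0] == "r":
                progressed = True
                # Newest matching write in our own buffer wins, else memory.
                pending = [v for a, v in buf if a == op[1]]
                value = pending[-1] if pending else memory[op[1]]
                step(advanced, buffers, memory, {**regs, op[2]: value})
            elif op[0] == "fence" and not buf:
                # The fence only completes once earlier writes are visible.
                progressed = True
                step(advanced, buffers, memory, regs)
        if not progressed:
            results.add((regs["r0"], regs["r1"]))

    n = len(programs)
    step((0,) * n, ((),) * n, {"x": 0, "y": 0}, {})
    return results


NO_FENCE = [
    [("w", "x", 1), ("r", "y", "r0")],
    [("w", "y", 1), ("r", "x", "r1")],
]
WITH_FENCE = [
    [("w", "x", 1), ("fence",), ("r", "y", "r0")],
    [("w", "y", 1), ("fence",), ("r", "x", "r1")],
]

relaxed = outcomes(NO_FENCE)
fenced = outcomes(WITH_FENCE)
print(sorted(relaxed))
print(sorted(fenced))
```

Without the fence, both CPUs can read 0 because each write may still be sitting in its own buffer. With the fence between the write and the read, that outcome disappears in this toy model.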

















# Summary

- Identifying where to put memory barriers is hard
  - Takes a lot of practice and careful thought
  - Looks easy until you try it alone
- But, CPUs would be super-slow on sequential consistency
- Understand: Why relaxed consistency? What is TSO? Roughly when do developers need barriers?
- Advice: Take grad architecture (if offered); read this paper yearly
