











# Intuition

- On a bus-based multi-processor system (nearly all current x86 CPUs), a write to the cache immediately invalidates other caches
  - Making the write visible to other CPUs
- But the update could spend some time in a write buffer or register on the CPU
- If a later write goes to the cache first, the two writes become visible to another CPU out of program order
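
The reordering described above can be sketched with a toy model. This is not how real hardware is built: `ToyCPU`, `write`, and `drain` are hypothetical names invented for this illustration, and the shared dictionary stands in for the coherent caches.

```python
class ToyCPU:
    """Hypothetical CPU whose writes wait in a private buffer before reaching the cache."""

    def __init__(self, memory):
        self.memory = memory   # shared dict standing in for the coherent caches
        self.buffer = []       # pending (address, value) writes, in program order

    def write(self, addr, value):
        # The write is issued, but not yet visible to other CPUs.
        self.buffer.append((addr, value))

    def drain(self, index):
        # One buffered write reaches the cache and becomes visible to everyone.
        addr, value = self.buffer.pop(index)
        self.memory[addr] = value


memory = {"x": 0, "y": 0}
cpu0 = ToyCPU(memory)

cpu0.write("x", 1)        # first in program order
cpu0.write("y", 1)        # second in program order

cpu0.drain(1)             # the later write (y) reaches the cache first
observed = dict(memory)   # what another CPU would read at this moment
cpu0.drain(0)             # the earlier write (x) finally becomes visible

print(observed)           # {'x': 0, 'y': 1}
print(memory)             # {'x': 1, 'y': 1}
```

Another CPU reading at the moment of `observed` sees `y == 1` while `x` is still `0`, even though the program wrote `x` first.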

### Sequential Consistency

- Simplest possible model
- Every program instruction is executed in order
  - No buffered memory writes
- Only one CPU writes to memory at a time
  - Given a write to address x, all cached values of x are invalidated before any CPU can write anything else
- Simple to reason about
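
One way to see why this model is simple to reason about is to enumerate every legal execution. The sketch below is only an illustration: the function `sc_outcomes` and the tuple encoding of instructions are assumptions made here, not part of any real tool. It tries every interleaving of two small programs, with each store visible to everyone the instant it executes.

```python
from itertools import combinations


def sc_outcomes(prog0, prog1):
    """Enumerate every sequentially consistent interleaving of two programs.

    Each instruction is ("store", addr, value) or ("load", addr, register).
    Returns the set of final register states, each as a sorted tuple of pairs.
    """
    n0, n1 = len(prog0), len(prog1)
    results = set()
    # Choose which slots of the merged instruction stream belong to CPU 0.
    for slots in combinations(range(n0 + n1), n0):
        memory = {"x": 0, "y": 0}
        regs = {}
        i0 = i1 = 0
        for step in range(n0 + n1):
            if step in slots:
                kind, addr, arg = prog0[i0]
                i0 += 1
            else:
                kind, addr, arg = prog1[i1]
                i1 += 1
            if kind == "store":
                memory[addr] = arg        # instantly visible to every CPU
            else:
                regs[arg] = memory[addr]
        results.add(tuple(sorted(regs.items())))
    return results


cpu0 = [("store", "x", 1), ("load", "y", "r0")]
cpu1 = [("store", "y", 1), ("load", "x", "r1")]
outcomes = sc_outcomes(cpu0, cpu1)

print((("r0", 0), ("r1", 0)) in outcomes)   # False
```

Under these assumptions, at least one CPU always sees the other's store, so the outcome `r0 == 0 and r1 == 0` never appears.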

#### Sequential is too slow

- CPUs want to pipeline instructions
  - Hide high-latency instructions
- Sequential consistency prevents these optimizations
- And these optimizations are harmless in the common case

### Relaxed consistency

- If the common case is that reordering is safe, make the programmer tell the CPU when reordering is unsafe
  - Details of the model specify what can be reordered
  - Many different proposed models
- Barrier (or fence): common consistency abstraction
  - Every memory access before the barrier must become visible to other CPUs before any memory access after the barrier
  - Confusing to use in practice
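
The barrier rule can be illustrated with a toy relaxed model. Everything here is an assumption made for the sketch: `visible_states` is a hypothetical helper, writes between fences are allowed to become visible in any order, and a `("fence",)` entry enforces the rule stated above.

```python
from itertools import permutations


def visible_states(program):
    """Return every memory snapshot another CPU could observe.

    `program` is a program-order list of ("write", addr, value) and ("fence",).
    Writes between fences may become visible in any order, but nothing after
    a fence becomes visible until everything before it has.
    """
    # Split the program into groups of writes separated by fences.
    groups = [[]]
    for op in program:
        if op[0] == "fence":
            groups.append([])
        else:
            groups[-1].append(op)

    snapshots = set()
    frontier = [{"data": 0, "flag": 0}]   # memory states reachable so far
    for group in groups:
        next_frontier = []
        for start in frontier:
            for order in permutations(group):
                memory = dict(start)
                snapshots.add(tuple(sorted(memory.items())))
                for _, addr, value in order:
                    memory[addr] = value
                    snapshots.add(tuple(sorted(memory.items())))
                next_frontier.append(memory)
        frontier = next_frontier
    return snapshots


def flag_without_data(snapshots):
    # True if some observer could see the flag set before the data is written.
    return any(dict(s)["flag"] == 1 and dict(s)["data"] != 42 for s in snapshots)


unfenced = [("write", "data", 42), ("write", "flag", 1)]
fenced = [("write", "data", 42), ("fence",), ("write", "flag", 1)]

print(flag_without_data(visible_states(unfenced)))   # True
print(flag_without_data(visible_states(fenced)))     # False
```

Without the fence, a reader that sees `flag == 1` cannot trust `data`. With the fence, it can. Forgetting a fence like this one is the kind of mistake that makes barriers confusing in practice.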

## Total Store Order (TSO)

- Model adopted in nearly all x86 CPUs
- All stores leave the CPU in program order
- CPU may load "ahead" of an unrelated store
  - Ex: `x = 1; y = z;`
  - CPU may load z from memory before x is stored
  - CPU may not reorder a load and a store of the same variable
- Atomic instructions are treated like a barrier
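
These rules can be explored with a small exhaustive simulator. It is a simplified sketch, not a faithful x86 model: `tso_outcomes` is a hypothetical function, each CPU gets a FIFO store buffer, and a `("fence",)` instruction stands in for an atomic instruction by waiting until the buffer is empty.

```python
def tso_outcomes(programs):
    """Explore every execution of a toy TSO machine; return final register states.

    Instructions: ("store", addr, value), ("load", addr, register), ("fence",).
    """
    results, seen = set(), set()

    def explore(pcs, buffers, memory, regs):
        state = (pcs, buffers, memory, regs)
        if state in seen:
            return
        seen.add(state)
        finished = all(pc == len(p) for pc, p in zip(pcs, programs))
        if finished and not any(buffers):
            results.add(regs)
            return
        for cpu, prog in enumerate(programs):
            # Option 1: drain the oldest buffered store (stores leave in order).
            if buffers[cpu]:
                addr, value = buffers[cpu][0]
                drained = buffers[:cpu] + (buffers[cpu][1:],) + buffers[cpu + 1:]
                new_memory = tuple(sorted({**dict(memory), addr: value}.items()))
                explore(pcs, drained, new_memory, regs)
            # Option 2: run this CPU's next instruction.
            if pcs[cpu] < len(prog):
                op = prog[pcs[cpu]]
                new_pcs = pcs[:cpu] + (pcs[cpu] + 1,) + pcs[cpu + 1:]
                if op[0] == "store":
                    grown = buffers[cpu] + ((op[1], op[2]),)
                    new_buffers = buffers[:cpu] + (grown,) + buffers[cpu + 1:]
                    explore(new_pcs, new_buffers, memory, regs)
                elif op[0] == "load":
                    # A load sees this CPU's own newest buffered store first.
                    pending = [v for a, v in buffers[cpu] if a == op[1]]
                    value = pending[-1] if pending else dict(memory)[op[1]]
                    new_regs = tuple(sorted({**dict(regs), op[2]: value}.items()))
                    explore(new_pcs, buffers, memory, new_regs)
                elif op[0] == "fence" and not buffers[cpu]:
                    # The fence completes only once the store buffer is empty.
                    explore(new_pcs, buffers, memory, regs)

    n = len(programs)
    explore((0,) * n, ((),) * n, (("x", 0), ("y", 0)), ())
    return results


plain = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]
fenced = [
    [("store", "x", 1), ("fence",), ("load", "y", "r0")],
    [("store", "y", 1), ("fence",), ("load", "x", "r1")],
]
both_zero = (("r0", 0), ("r1", 0))

print(both_zero in tso_outcomes(plain))    # True
print(both_zero in tso_outcomes(fenced))   # False
```

In the `plain` programs, each load can run while that CPU's own store is still buffered, so both registers can end up `0`. This is the load-ahead case from the `x = 1; y = z;` example. Adding the fence rules that outcome out in this model.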










