









#### Stony Brook University

#### CSE 506: Operating Systems

## Distinction

- The compiler and CPU can figure out when instructions can be safely reordered within a given thread
- It is hard for them to figure out when the order is meaningful for coordinating with other threads
- If you want optimizations (and you do), the programmer MUST give the hardware and compiler some hints
- It is hard to design hints that the average programmer can give correctly

## Definitions

- Cache coherence: the protocol by which a write to one cache invalidates or updates the copies in other caches
- Memory consistency model: the rules for how updates to memory are published from one CPU to another
  - Can accesses be reordered between the CPU and the cache/memory?
  - Are cache updates/invalidations delivered atomically?
    - This is a coherence protocol detail that impacts consistency
- The distinction between coherence and consistency is often muddled

- On a bus-based multi-processor system (nearly all current x86 CPUs), a write to the cache immediately invalidates the copies in other caches
  - This makes the write visible to other CPUs
- But the update may first spend some time in a write buffer or register on the CPU
- If a later write reaches the cache first, the two writes become visible to other CPUs out of program order

## Sequential Consistency

- Simplest possible model
- Every program instruction is executed in order
  - No buffered memory writes
- Only one CPU writes to memory at a time
  - Given a write to address x, all cached values of x are invalidated before any CPU can write anything else
- Simple to reason about
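
One way to see why this model is simple to reason about is to enumerate every execution it allows. The sketch below is a toy model, not a description of real hardware: the two-thread program (`T0`, `T1`), the register names `r0`/`r1`, and the helper functions are all hypothetical names chosen for illustration. It merges the two threads in every way that preserves program order and runs each merge against one shared memory with no buffering.

```python
from itertools import combinations

# Hypothetical litmus test: shared x and y both start at 0.
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
T0 = [("store", "x", 1), ("load", "y", "r0")]
T1 = [("store", "y", 1), ("load", "x", "r1")]

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(n)]

def run_sc(schedule):
    """Run one interleaving against a single memory, with no write buffering."""
    mem, regs = {"x": 0, "y": 0}, {}
    for op, addr, arg in schedule:
        if op == "store":
            mem[addr] = arg          # the write is visible to everyone at once
        else:
            regs[arg] = mem[addr]    # loads always read the latest write
    return regs["r0"], regs["r1"]

sc_outcomes = {run_sc(s) for s in interleavings(T0, T1)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

Under these assumptions the outcome `(0, 0)` never appears: at least one thread must observe the other's store, because every execution is some in-order interleaving.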

- The optimizations that sequential consistency rules out, such as buffering and reordering memory writes, are harmless in the common case

## Relaxed Consistency

- If the common case is that reordering is safe, make the programmer tell the CPU when reordering is unsafe
  - Details of the model specify what can be reordered
  - Many different proposed models
- Barrier (or fence): common consistency abstraction
  - Every memory access before the barrier must be visible to other CPUs before any memory access after the barrier
  - Confusing to use in practice
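
The barrier rule can be illustrated with a small toy model. This is a sketch under loud assumptions, not any real CPU's behavior: the writer's stores go to distinct addresses and may become visible in any order unless a `"barrier"` marker separates them, and the observer is assumed to see one atomic snapshot of memory at a time. The variable names `data` and `flag` and both helper functions are hypothetical.

```python
from itertools import permutations, product

def visible_orders(program):
    """Yield each order in which one CPU's stores may become visible to others.

    Stores between two barriers may be permuted freely; a barrier forces
    everything before it to be visible before anything after it.
    """
    segments, current = [], []
    for op in program:
        if op == "barrier":
            segments.append(current)
            current = []
        else:
            current.append(op)
    segments.append(current)
    for choice in product(*(permutations(seg) for seg in segments)):
        yield [store for seg in choice for store in seg]

def observations(program, init):
    """Return every memory snapshot another CPU could observe."""
    seen = set()
    for order in visible_orders(program):
        mem = dict(init)
        seen.add(tuple(sorted(mem.items())))
        for addr, val in order:
            mem[addr] = val
            seen.add(tuple(sorted(mem.items())))
    return seen

init = {"data": 0, "flag": 0}
unfenced = [("data", 42), ("flag", 1)]            # publish data, then set flag
fenced = [("data", 42), "barrier", ("flag", 1)]   # same, with a barrier between

stale = (("data", 0), ("flag", 1))                # flag set but data not yet visible
print(stale in observations(unfenced, init))  # True
print(stale in observations(fenced, init))    # False
```

In this model the barrier costs nothing except the lost freedom to reorder, which is the trade the relaxed models make: the programmer pays only at the points where order matters to another thread.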

- Total Store Order (TSO): the model adopted in nearly all x86 CPUs
- All stores leave the CPU in program order
- CPU may load "ahead" of an unrelated store
  - Ex: `x = 1; y = z;`
  - CPU may load z from memory before x is stored
- CPU may not reorder a load and a store of the same variable
- Atomic instructions are treated like a barrier
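
These rules can be explored with a toy model, offered as a sketch rather than a faithful description of x86. It assumes each CPU has a FIFO store buffer that drains to memory at arbitrary times, that a load first checks its own CPU's buffer, and that a `"fence"` operation (standing in for an atomic instruction or explicit barrier) completes only once the local buffer is empty. The function `tso_outcomes`, the tuple encoding of instructions, and the register names are hypothetical.

```python
def tso_outcomes(threads, init):
    """Return every reachable tuple of final register values (sorted by name)."""
    results = set()
    n = len(threads)

    def explore(pcs, bufs, mem, regs):
        if all(pcs[i] == len(threads[i]) for i in range(n)):
            results.add(tuple(regs[r] for r in sorted(regs)))
            return
        for i in range(n):
            if bufs[i]:
                # The oldest buffered store of CPU i drains to memory (FIFO),
                # so stores leave the CPU in program order.
                (addr, val), rest = bufs[i][0], bufs[i][1:]
                explore(pcs, bufs[:i] + (rest,) + bufs[i + 1:], {**mem, addr: val}, regs)
            if pcs[i] == len(threads[i]):
                continue
            op, addr, arg = threads[i][pcs[i]]
            nxt = pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:]
            if op == "store":
                # The store enters the local buffer; other CPUs cannot see it yet.
                grown = bufs[i] + ((addr, arg),)
                explore(nxt, bufs[:i] + (grown,) + bufs[i + 1:], mem, regs)
            elif op == "load":
                # A load sees this CPU's newest buffered store to addr, else memory.
                own = [v for a, v in bufs[i] if a == addr]
                explore(nxt, bufs, mem, {**regs, arg: own[-1] if own else mem[addr]})
            elif op == "fence" and not bufs[i]:
                # A fence completes only once the local store buffer is empty.
                explore(nxt, bufs, mem, regs)

    explore((0,) * n, ((),) * n, dict(init), {})
    return results

init = {"x": 0, "y": 0}
FENCE = ("fence", None, None)
# Thread 0: x = 1; r0 = y        Thread 1: y = 1; r1 = x
plain = [[("store", "x", 1), ("load", "y", "r0")],
         [("store", "y", 1), ("load", "x", "r1")]]
fenced = [[plain[0][0], FENCE, plain[0][1]],
          [plain[1][0], FENCE, plain[1][1]]]

print(sorted(tso_outcomes(plain, init)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(tso_outcomes(fenced, init)))  # [(0, 1), (1, 0), (1, 1)]
```

Without fences, both loads can run "ahead" of the buffered stores and both threads read 0, an outcome sequential consistency forbids. With a fence between each store and load, the model allows only the sequentially consistent outcomes. A single thread running `x = 1; r0 = x` always reads 1 in this model, matching the rule that a load and a store of the same variable are not reordered.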











## Summary

- Identifying where to put memory barriers is hard
  - Takes a lot of practice and careful thought
  - Looks easy until you try it alone
- But CPUs would be super-slow under sequential consistency
- Understand: Why relaxed consistency? What is TSO? Roughly when do developers need barriers?
- Advice: Take grad architecture; read this paper yearly