## Difficult topic

- Memory consistency models are difficult to understand
  - Knowing when and how to use memory barriers in your programs takes a long time to master
- I read the long version of this paper about once a year
  - Started in graduate architecture, still mastering this
- Even if you can't master this material, it is worth conveying some intuitions and getting you started on the path
  - Multi-core programming is increasingly common

CSE 506: Operating Systems

## Background

- In the 90s, people were figuring out how to build and program shared memory multi-processors
- Several hardware and compiler optimizations that worked well on single-CPU systems were causing "heisen-bugs" in correct parallel code
- Disabling all optimizations made this code correct, but slow
- Various consistency models strike different balances between optimization and programmability

## Distinction

- Compiler/CPU can figure out when instructions can be safely reordered within a given thread
- Hard to figure out when the order is meaningful to coordinate with other threads
- If you want optimizations (and you do), programmer MUST give hardware and compiler some hints
- Hard to design hints that average programmer can successfully give the hardware

## Definitions

- Cache coherence: The protocol by which writes to one cache invalidate or update other caches
- Memory consistency model: How are updates to memory published from one CPU to another
  - Reordering between CPU and cache/memory?
  - Are cache updates/invalidations delivered atomically?
    - Coherence protocol detail that impacts consistency
- Distinction between coherence and consistency muddled

## Intuition

- On a bus-based multi-processor system (nearly all current x86 CPUs), a write to the cache immediately invalidates other caches
  - Making the write visible to other CPUs
- But, the update could spend some time in a write buffer or register on the CPU
- If a later write goes to the cache first, the two writes will become visible to another CPU out of program order

## Sequential Consistency

- Simplest possible model
- Every program instruction is executed in order
  - No buffered memory writes
  - Only one CPU writes to memory at a time
    - Given a write to address x, all cached values of x are invalidated before any CPU can write anything else
- Simple to reason about

Stony Brook University

## Sequential is too slow

- CPUs want to pipeline instructions
  - Hide high latency instructions
- Sequential consistency prevents these optimizations
  - And these optimizations are harmless in the common case

## Relaxed consistency

- If the common case is that reordering is safe, make the programmer tell the CPU when reordering is unsafe
- Details of the model specify what can be reordered
  - Many different proposed models
- Barrier (or fence): common consistency abstraction
  - Every memory access before this barrier must be visible to other CPUs before any memory access after the barrier
  - Confusing to use in practice

## Summary

- Identifying where to put memory barriers is hard
  - Takes a lot of practice and careful thought
  - Looks easy until you try it alone
- But, CPUs would be super-slow on sequential consistency
- Understand: Why relaxed consistency? What is TSO? Roughly when do developers need barriers?
- Advice: Take grad architecture; read this paper yearly
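
To make the barrier rule from the relaxed consistency slide concrete, here is a minimal sketch of the classic "publish data, then set a flag" pattern, written with C11 atomics and POSIX threads. It is an illustration under stated assumptions, not code from the lecture: the names `payload`, `ready`, `producer`, `consumer`, and `run_message_passing` are all hypothetical. The producer places a release fence between the data write and the flag write; the consumer places an acquire fence between the flag read and the data read.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names for illustration; none of them come from the slides. */
static int payload;        /* ordinary (non-atomic) data being published */
static atomic_int ready;   /* flag that tells the consumer the data is there */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                  /* 1. write the data */
    /* 2. Barrier: the write above may not be reordered after the flag store. */
    atomic_thread_fence(memory_order_release);
    /* 3. Publish the flag. */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg) {
    int *seen = arg;
    /* Spin until the producer's flag store becomes visible. */
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;
    /* Barrier: the read below may not be reordered before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *seen = payload;
    return NULL;
}

/* Runs one producer and one consumer; returns the payload the consumer saw. */
int run_message_passing(void) {
    pthread_t p, c;
    int seen = -1;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Calling `run_message_passing()` from a `main` function should return 42, because the C11 memory model makes the release fence synchronize with the acquire fence once the consumer observes `ready == 1`. If both fences are removed, the program has a data race on `payload`, and the compiler or CPU is free to reorder the accesses so that the consumer may read a stale value. On some older toolchains the program may need to be built with `-pthread`.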