











## Port permissions

- Can be set coarsely with the IOPL field in EFLAGS
- Or at finer granularity with a bitmap in the task state segment
  - Recall: this is the "other" reason people care about the TSS
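A minimal sketch of how the two mechanisms combine when code executes `in`/`out` in protected mode (not virtual-8086 mode), assuming the usual x86 rules: if CPL ≤ IOPL the access is allowed outright; otherwise every port the access touches must have a clear (0) bit in the TSS I/O permission bitmap, and ports past the end of the bitmap are denied. The function name and the flat-array bitmap are hypothetical; the real bitmap sits inside the TSS at the offset given by its I/O map base field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the x86 protected-mode I/O permission check.
 * bitmap: one bit per port; 1 = deny, 0 = allow.
 * bitmap_bytes: how many bytes of bitmap the TSS actually provides. */
static bool io_port_allowed(unsigned cpl, unsigned iopl,
                            const uint8_t *bitmap, size_t bitmap_bytes,
                            uint16_t port, unsigned width)
{
    assert(width == 1 || width == 2 || width == 4);

    /* Coarse check: CPL <= IOPL grants access to every port. */
    if (cpl <= iopl)
        return true;

    /* Fine-grained check: every port covered by the access must be allowed. */
    for (unsigned i = 0; i < width; i++) {
        uint32_t p = (uint32_t)port + i;
        size_t byte = p / 8;
        if (byte >= bitmap_bytes)
            return false;                 /* beyond the bitmap: denied */
        if (bitmap[byte] & (1u << (p % 8)))
            return false;                 /* bit set: denied */
    }
    return true;
}
```

Note that a 2- or 4-byte access is denied if any one of the ports it covers is denied.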



#### Clocks (again, but different)

- CPU clock speed: what does it mean at the electrical level?
  - New inputs raise the voltage on some wires and lower it on others
  - How long does it take to propagate through all the logic gates?
  - The clock period sets a safe upper bound on that time
    - Things like distance and wire size can affect propagation time
  - At the end of a clock cycle, outputs can be read reliably
    - They may be in a transient state mid-cycle

Note: this is not the timer device, which raises interrupts based on wall-clock time; this is the CPU clock rate (GHz).
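The timing bound above can be written out as follows (the symbols are introduced here for illustration): the clock period must be at least the worst-case time $t_{\text{prop}}$ for signals to settle through the logic.

$$
T_{\text{clk}} = \frac{1}{f_{\text{clk}}} \ge t_{\text{prop}}
\qquad\Longrightarrow\qquad
f_{\text{clk}} \le \frac{1}{t_{\text{prop}}}
$$

For example, at $f_{\text{clk}} = 3\,\text{GHz}$ the period is $T_{\text{clk}} = 1/(3 \times 10^{9}\,\text{s}^{-1}) \approx 0.33\,\text{ns}$, so every gate path must settle in roughly a third of a nanosecond.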

## Clock imbalance

- All processors have a clock
  - Including the chips on every device in your system
  - Network card, disk controller, USB controller, etc.
  - Bus controllers have a clock too
- Think now about older devices on a newer CPU
- The newer CPU has a much faster clock (a much shorter cycle)
- It takes the older device longer to reliably read input from a bus than it does for the CPU to write it
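A rough way to quantify the imbalance, as a sketch: divide the two clock rates to see how many CPU cycles pass during one device cycle. The function name and the clock rates in the test are hypothetical, and real port or bus writes take more than one CPU cycle, so treat the result as a rough estimate rather than an exact write count.

```c
#include <assert.h>
#include <stdint.h>

/* Roughly how many CPU clock cycles elapse during one device clock cycle.
 * Rounded down; a real write costs several CPU cycles, so fewer writes
 * than this actually fit in one device cycle. */
static uint64_t cpu_cycles_per_device_cycle(uint64_t cpu_hz, uint64_t device_hz)
{
    assert(device_hz > 0);
    return cpu_hz / device_hz;
}
```

With a hypothetical 2 GHz CPU and a 500 MHz device, the ratio is 4, which is the kind of gap behind the "4 writes per device cycle" example.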

## More clock imbalance

- Ex: a CPU might be able to write 4 different values into a device input register before the device has finished one clock cycle
- Driver writer needs to know this
  - Read it from the device manuals
- Driver must calibrate device access frequency to device speed
  - Figure out both speeds, do the math, add delays between operations
  - You will do this in lab 6! (outb 0x80 is handy!)
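A minimal sketch of the "do the math, add delays" step, assuming (hypothetically) that one write to port 0x80 stalls for about 1000 ns; real timing varies by chipset, so the constant must come from measurement or the manual. Port 0x80 is the POST diagnostic port, which is why writes to it are normally harmless. `delays_needed`, `io_delay`, and `PORT80_DELAY_NS` are names invented for this sketch; the `outb` wrapper follows the usual GCC inline-asm pattern and only works on x86 in ring 0 (or with I/O permission).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: assume one write to port 0x80 takes roughly 1000 ns. */
#define PORT80_DELAY_NS 1000u

/* Number of delay operations needed to cover `settle_ns` nanoseconds,
 * rounded up so we never wait too little. */
static unsigned delays_needed(unsigned settle_ns, unsigned delay_ns)
{
    assert(delay_ns > 0);
    return (settle_ns + delay_ns - 1) / delay_ns;
}

#if defined(__i386__) || defined(__x86_64__)
/* Write one byte to an I/O port (x86 only; privileged). */
static inline void outb(uint16_t port, uint8_t data)
{
    __asm__ __volatile__("outb %0, %w1" : : "a"(data), "Nd"(port));
}

/* Burn time by writing to the unused POST diagnostic port. */
static inline void io_delay(unsigned n)
{
    while (n--)
        outb(0x80, 0);
}
#endif
```

A driver would then call something like `io_delay(delays_needed(device_settle_ns, PORT80_DELAY_NS))` between device accesses, where `device_settle_ns` comes from the device's manual.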

## CISC silliness?

- Is there any good reason to use dedicated instructions and address space for devices?
- Why not treat device input and output registers as regions of physical memory?

## Simplification

- Map devices onto regions of physical memory
  - Hardware redirects accesses to these addresses away from RAM at the same location (if any) and to devices
  - A bummer if you "lose" some RAM
- Win: Cast interface regions to a structure
  - Write updates to different areas using high-level languages
  - Still subject to timing and side-effect caveats
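A minimal sketch of "cast interface regions to a structure", using a made-up serial device: the register layout, bit meanings, and `uart_putc` are all hypothetical. Each field is `volatile` so every access really reaches the device, and static assertions pin the struct to the (assumed) documented offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for a made-up memory-mapped serial device. */
struct uart_regs {
    volatile uint32_t data;     /* offset 0x0: byte to transmit */
    volatile uint32_t status;   /* offset 0x4: bit 0 = transmitter busy */
    volatile uint32_t control;  /* offset 0x8: bit 0 = enable */
};

/* The struct must match the device's documented offsets exactly. */
_Static_assert(offsetof(struct uart_regs, status) == 0x4, "status offset");
_Static_assert(offsetof(struct uart_regs, control) == 0x8, "control offset");

#define UART_STATUS_BUSY 0x1u
#define UART_CTRL_ENABLE 0x1u

/* Enable the device and send one byte, spinning while it reports busy. */
static void uart_putc(struct uart_regs *uart, uint8_t c)
{
    uart->control = UART_CTRL_ENABLE;
    while (uart->status & UART_STATUS_BUSY)
        ;   /* volatile read: re-fetched from the device every iteration */
    uart->data = c;
}
```

In a kernel the pointer would come from a cast such as `(struct uart_regs *)UART_MMIO_BASE`, where `UART_MMIO_BASE` is a hypothetical mapped address; the timing and side-effect caveats still apply to every field access.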



## Optimizations (2)

- Recall: common optimizations (compiler and CPU)
  - Out-of-order execution
  - Reorder writes
  - Cache values in registers
- When we write to a device, we want the write to really happen, now!
  - Do not keep it in a register, do not collect \$200
- Note: both CPU and compiler optimizations must be disabled for these accesses

## volatile keyword

- A volatile variable cannot be cached in a register
  - Writes must go directly to memory
  - Reads must always come from memory/cache
- volatile code blocks cannot be reordered by the compiler
  - Must be executed precisely at this point in the program
  - E.g., inline assembly
- `__volatile__` means I really mean it!
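Two small sketches of why `volatile` matters for device registers; the function names and register meanings are hypothetical. In the polling loop, the `volatile` pointer forces a fresh load on every iteration (a plain pointer would let the compiler read once and spin on a cached value). In the reset pulse, `volatile` keeps both stores, in order (a plain pointer would let the compiler drop the first one as dead).

```c
#include <assert.h>
#include <stdint.h>

/* Poll a hypothetical status register until any bit in `mask` is set,
 * giving up after `max_spins` reads. Returns the number of failed polls
 * before success, or -1 on timeout. */
static int wait_for_bits(const volatile uint32_t *status, uint32_t mask,
                         unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if (*status & mask)   /* reloaded from memory on every iteration */
            return (int)i;
    }
    return -1;
}

/* Two back-to-back writes to the same (hypothetical) register: with
 * volatile, both stores are emitted, in this order. */
static void pulse_reset(volatile uint32_t *reg)
{
    *reg = 1;   /* assert reset */
    *reg = 0;   /* release reset */
}
```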

## Compiler barriers

- Inline assembly has a set of clobber registers
  - Hand-written assembly will clobber them
  - Compiler's job is to save values back to memory before the inline asm; no caching anything in these registers
- The `"memory"` clobber says to flush all registers
  - Ensures that the compiler generates code for all writes to memory before a given operation
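A sketch of a pure compiler barrier in GCC/Clang syntax: an empty `asm volatile` statement with a `"memory"` clobber, the same shape as Linux's `barrier()`. It emits no instruction, so it constrains only the compiler, not the CPU. The `publish` function and its buffer/flag are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Empty asm whose "memory" clobber tells the compiler the asm may read or
 * write any memory: pending stores must be emitted before it and values
 * must be reloaded after it. Generates no CPU instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Fill a buffer, then publish a "ready" flag. The barrier keeps the
 * compiler from sinking the buffer stores below the flag store. */
static void publish(uint32_t *buf, int n, uint32_t *ready)
{
    for (int i = 0; i < n; i++)
        buf[i] = (uint32_t)i * 2;
    compiler_barrier();
    *ready = 1;
}
```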

## **CPU** Barriers

- Advanced topic: Don't need details
- Basic idea: in some cases, the CPU can issue loads and stores out of program order (to optimize performance)
  - Subject to many constraints on x86 in practice
- In some cases, a "fence" instruction is required to ensure that pending loads/stores happen before the CPU moves forward
  - Rarely needed except in device drivers and lock-free data structures
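A sketch of where a fence shows up in a driver, assuming a hypothetical device that reads a descriptor from memory after the driver writes a "doorbell" register. On x86 the sketch uses `mfence` (whose `"memory"` clobber also makes it a compiler barrier); elsewhere it falls back to a C11 fence. Real kernels wrap this in architecture-specific macros (e.g., Linux's `mb()`/`wmb()`), and whether a fence is strictly required depends on the architecture and the memory type of the mapping. The descriptor layout and function names are invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Full memory fence: orders all earlier loads/stores before later ones. */
static inline void full_fence(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Hypothetical DMA descriptor the device reads from memory. */
struct dma_desc {
    uint64_t addr;
    uint32_t len;
    uint32_t flags;
};

/* Fill in the descriptor, fence, then ring the doorbell so the device
 * never sees the doorbell before the descriptor contents. */
static void post_descriptor(volatile struct dma_desc *d,
                            volatile uint32_t *doorbell,
                            uint64_t addr, uint32_t len, uint32_t slot)
{
    d->addr  = addr;
    d->len   = len;
    d->flags = 1;          /* hypothetical "owned by device" bit */
    full_fence();
    *doorbell = slot;      /* device may now fetch the descriptor */
}
```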

## Configuration

- Where does all of this come from?
  - Who sets up port mapping and I/O memory mappings?
  - Who maps device interrupts onto IRQ lines?
- Generally, the BIOS
  - Sometimes constrained by device limitations
  - Older devices hard-coded IRQs
  - Older devices may only have a 16-bit chip
    - Can only access lower memory addresses

# ISA memory hole

Recall the "memory hole" from lab 2?

#### 640 KB – 1 MB

- Required by the old ISA bus standard for I/O mappings
  - No one in the 80s could fathom > 640 KB of RAM
  - Devices sometimes hard-coded assumptions that they would be in this range
  - Generally reserved on x86 systems (like JOS)
  - Strong incentive to save these addresses when possible
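A sketch of the check a physical page allocator makes (as in the lab 2 page setup) to avoid handing out the hole as ordinary RAM. JOS names these bounds `IOPHYSMEM` and `EXTPHYSMEM`; the macro and function names below are local to this sketch, and it ignores the other reserved regions (page 0, the kernel image) that a real allocator also skips.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ISA_HOLE_START 0x0A0000u   /* 640 KB */
#define ISA_HOLE_END   0x100000u   /* 1 MB (exclusive) */
#define PGSIZE         4096u

/* Can the physical page starting at `pa` be used as ordinary RAM?
 * Pages overlapping [640 KB, 1 MB) are reserved for device mappings. */
static bool page_outside_isa_hole(uint32_t pa)
{
    return pa + PGSIZE <= ISA_HOLE_START || pa >= ISA_HOLE_END;
}
```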









*Figure (from Linux Device Drivers): a PCI system, showing the host bridge, a PCI bridge, and bus 1.*







### Direct Memory Access (DMA)

- Simple memory read/write model bounces all I/O through the CPU
  - Fine for small data, totally awful for huge data
- Idea: just tell the device where you want the data to go (or come from)
  - Let the device do bulk data transfers into memory without CPU intervention
  - Interrupt the CPU on I/O completion (asynchronous)
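A sketch of the CPU's side of a DMA transfer for a made-up device: the register block, command bit, and `dma_start` are hypothetical. The CPU writes only an address, a length, and a start command; the device then moves the bytes itself. The address must be the buffer's physical (bus) address, not a kernel virtual address, and the buffer must stay in place until the completion interrupt arrives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register block for a made-up DMA-capable device. */
struct dma_regs {
    volatile uint32_t addr_lo;   /* buffer physical address, low 32 bits */
    volatile uint32_t addr_hi;   /* buffer physical address, high 32 bits */
    volatile uint32_t length;    /* bytes to transfer */
    volatile uint32_t command;   /* DMA_CMD_START kicks off the transfer */
};

#define DMA_CMD_START 0x1u

/* Four register writes, no matter how large `len` is; the device does
 * the bulk copy and interrupts the CPU when it finishes. */
static void dma_start(struct dma_regs *regs, uint64_t buf_paddr, uint32_t len)
{
    regs->addr_lo = (uint32_t)buf_paddr;
    regs->addr_hi = (uint32_t)(buf_paddr >> 32);
    regs->length  = len;
    regs->command = DMA_CMD_START;   /* written last: starts the transfer */
}
```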

## Ring buffers

- Many devices pre-allocate a "ring" of buffers
  - Think network card
- Device writes into the ring; CPU reads behind it
- If the ring is well-sized to the load:
  - No dynamic buffer allocation
  - No stalls
- Trade-off between device stalls (or dropped packets) and memory overheads
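A single-threaded sketch of the ring idea: all buffers are allocated once, the producer (standing in for the device) fills slots at `head`, and the consumer (the CPU) reads behind it at `tail`. The sizes and names are hypothetical. A real driver shares these indices with the device through registers or memory and needs the barriers discussed earlier; this model only shows the indexing and the full/empty logic, including the "ring full" case where the device would stall or drop a packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* RING_SIZE must be a power of two so free-running indices wrap cleanly. */
#define RING_SIZE 8u
#define BUF_BYTES 64u

struct ring {
    uint8_t  bufs[RING_SIZE][BUF_BYTES];  /* allocated once, never freed */
    uint32_t len[RING_SIZE];
    uint32_t head;   /* next slot the producer (device) fills */
    uint32_t tail;   /* next slot the consumer (CPU) reads */
};

/* Producer: returns false if the ring is full (stall or drop). */
static bool ring_put(struct ring *r, const uint8_t *data, uint32_t n)
{
    if (r->head - r->tail == RING_SIZE || n > BUF_BYTES)
        return false;
    uint32_t slot = r->head % RING_SIZE;
    for (uint32_t i = 0; i < n; i++)
        r->bufs[slot][i] = data[i];
    r->len[slot] = n;
    r->head++;
    return true;
}

/* Consumer: oldest filled buffer, or NULL if the ring is empty. */
static const uint8_t *ring_peek(const struct ring *r, uint32_t *n)
{
    if (r->head == r->tail)
        return NULL;
    uint32_t slot = r->tail % RING_SIZE;
    *n = r->len[slot];
    return r->bufs[slot];
}

/* Consumer: hand the oldest slot back to the producer. */
static void ring_release(struct ring *r)
{
    r->tail++;
}
```

Picking `RING_SIZE` is exactly the trade-off in the last bullet: a bigger ring absorbs bursts without stalls or drops but pins more memory.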





# Why does x86 suddenly care about IOMMUs?

- Virtualization! (VT-d)
- Scenario: system with 4 NICs, 4 VMs
- Without IOMMU: the hypervisor must mediate all network traffic
- With IOMMU: each VM can have a different virtual bus address space
  - Each VM sees what looks like a single NIC; it can only issue DMAs for its own memory (not other VMs' memory)
  - No hypervisor mediation needed!
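A toy model of the protection the IOMMU provides: each domain (here, one per VM) has its own table mapping device-visible bus pages to host-physical pages, and a DMA to an unmapped page faults instead of touching another VM's memory. Real VT-d uses multi-level tables selected per PCI device; the flat array, sizes, and names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IOMMU_PGSHIFT 12
#define IOMMU_PAGES   16   /* bus pages per domain in this toy */

/* One protection domain: bus page -> host-physical page number. */
struct iommu_domain {
    bool     present[IOMMU_PAGES];
    uint64_t host_page[IOMMU_PAGES];
};

/* Translate a DMA address issued by a device assigned to `dom`.
 * Returns false (a DMA fault) for pages the domain was not granted. */
static bool iommu_translate(const struct iommu_domain *dom,
                            uint64_t bus_addr, uint64_t *host_addr)
{
    uint64_t page = bus_addr >> IOMMU_PGSHIFT;
    uint64_t off  = bus_addr & (((uint64_t)1 << IOMMU_PGSHIFT) - 1);
    if (page >= IOMMU_PAGES || !dom->present[page])
        return false;
    *host_addr = (dom->host_page[page] << IOMMU_PGSHIFT) | off;
    return true;
}
```

The same bus address issued by NICs in two different VMs lands in two different host pages, which is why the hypervisor no longer has to sit in the data path.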

## VT-d Limitations

- IOMMU device restrictions are all-or-nothing
  - Can't share a network card
  - Although some devices may fix this too
- VT-d is only for devices on the PCI-Express bus
  - Usually just graphics and high-end network cards
  - Legacy PCI devices are behind a bridge
    - All-or-nothing for the entire bridge
- Similarly, no per-disk access control
  - All-or-nothing for the disk controller (which multiplexes disks)

## Summary

- How to access devices: ports or memory
- Issues with CPU optimizations, timing delays, etc.
- Overview of PCI bus
- Overview of DMA and protection issues
  - IOMMU and use for virtualization