The hardware-level act: KV cache, paging, speculation
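Watch the KV cache grow with every generated token. See PagedAttention turn the GPU into a tiny operating system, handing out cache memory in fixed-size blocks the way an OS hands out pages. See speculative decoding dance: a small draft model proposes tokens and the target model verifies them. This is where SLM serving becomes economical.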
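To make the cache growth concrete, here is a minimal back-of-the-envelope sketch. It assumes a standard transformer decoder that stores one key and one value vector per layer, per KV head, per token. The model shape in `cfg` and the block size of 16 tokens are hypothetical values chosen for illustration, not figures from any particular model or serving engine.

```python
import math

def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache for one sequence (bytes_per_elem=2 assumes fp16/bf16)."""
    # Factor of 2: one key vector and one value vector per token, layer, and head.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens

def blocks_needed(num_tokens, block_size=16):
    """Fixed-size cache blocks a paged allocator would hand out for one sequence."""
    return math.ceil(num_tokens / block_size)

# Hypothetical small-model shape, for illustration only.
cfg = dict(num_layers=24, num_kv_heads=8, head_dim=64)

for tokens in (1, 1000, 8000):
    size_mib = kv_cache_bytes(tokens, **cfg) / 2**20
    n_blocks = blocks_needed(tokens)
    unused_slots = n_blocks * 16 - tokens  # waste is confined to the last block
    print(f"{tokens:>5} tokens: {size_mib:8.2f} MiB, {n_blocks} blocks, {unused_slots} unused slots")
```

The cache grows linearly with sequence length, and with paging the unused space per sequence is bounded by one block rather than by a worst-case preallocation.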