An attention mechanism that computes attention scores for only a selected subset of token-position pairs rather than for all pairs; this reduces the quadratic cost of full attention (which scales as O(n^2) in sequence length n) while aiming to preserve the information most relevant to each token's context
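
To make the definition concrete, here is a minimal sketch of one possible sparsity pattern: a sliding window in which each position attends only to its nearest neighbours. The window pattern, the function name `local_sparse_attention`, and the `window` parameter are illustrative assumptions, not part of the definition above; other patterns (strided, global tokens, learned selection) restrict the pairs differently.

```python
import numpy as np

def local_sparse_attention(q, k, v, window):
    """Sliding-window attention (an assumed example pattern).

    Position i attends only to positions j with |i - j| <= window,
    so about n * (2 * window + 1) scores are computed instead of n * n.
    """
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    pairs = 0  # number of (query, key) pairs actually scored
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        # Score only the keys inside the window for this query.
        scores = k[lo:hi] @ q[i] / np.sqrt(d)
        # Numerically stable softmax over the window.
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        out[i] = weights @ v[lo:hi]
        pairs += hi - lo
    return out, pairs

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, pairs = local_sparse_attention(q, k, v, window=1)
print(pairs, "of", n * n, "pairs scored")  # prints: 22 of 64 pairs scored
```

With a fixed window the number of scored pairs grows linearly in n rather than quadratically. The explicit Python loop is for readability only; a practical implementation would likely batch or vectorise this computation.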