An architectural innovation that reduces long-context AI processing cost by selectively attending to relevant tokens rather than computing full attention across all positions; approximately 80% cheaper than dense attention at equivalent context lengths