{"id":1294,"date":"2026-01-14T10:16:23","date_gmt":"2026-01-14T01:16:23","guid":{"rendered":"https:\/\/rtlearner.com\/?p=1294"},"modified":"2026-01-14T10:16:26","modified_gmt":"2026-01-14T01:16:26","slug":"ai-architecture-8-cnn-locality-sram-data-reuse","status":"publish","type":"post","link":"https:\/\/rtlearner.com\/en\/ai-architecture-8-cnn-locality-sram-data-reuse\/","title":{"rendered":"AI Architecture 8. CNN and Locality"},"content":{"rendered":"

In the previous post, we saw how inefficient the MLP (fully connected layer) is from a hardware perspective. Because each weight is fetched from memory, used exactly once, and then discarded, performance runs into the Memory Wall: it is limited by memory bandwidth rather than by compute throughput.<\/p>\n\n\n\n

However, the real protagonist that allowed deep learning to change the world was not the MLP but the CNN (Convolutional Neural Network). While algorithm researchers praise CNNs for \"capturing spatial features of images well,\" hardware architects like us love CNNs for a completely different reason.<\/p>\n\n\n\n

That reason comes down to two words: \"Locality\" and \"Reuse.\" In this article, we will look at the physical reasons why the Sliding Window<\/strong> method of CNNs makes such efficient use of the SRAM (On-chip Buffer)<\/strong> inside a chip, and why NPUs come closest to their peak performance (TOPS) when running CNNs.<\/p>\n\n\n