AI Architecture 13. Roofline Model Analysis
In our previous posts, we discussed the two main culprits degrading deep learning model performance:
In MLP and Memory Wall, we confirmed how inefficient the MLP (fully connected layer) is from a hardware perspective. Because each weight is fetched from memory once, used for a single multiply-accumulate, and then discarded, memory bandwidth rather than raw compute limits system performance: the "memory wall" phenomenon.
In CNN and Locality, we then learned that hardware loves CNNs (convolutional neural networks) precisely because of locality and data reuse. Theoretically, the CNN looks like the perfect hardware-friendly algorithm.
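To make that contrast concrete, here is a minimal sketch that estimates arithmetic intensity, FLOPs per byte of memory traffic, for a fully connected layer versus a convolution layer. The layer shapes, FP32 assumption, stride-1/no-padding convolution, and the habit of counting only weight and activation traffic (ignoring caches) are illustrative choices of mine, not numbers from the earlier posts. Arithmetic intensity is exactly the quantity the Roofline model puts on its x-axis.

```python
# Arithmetic intensity = FLOPs / bytes moved (weights + activations).
# All shapes below are illustrative assumptions, not from the earlier posts.

BYTES = 4  # FP32

def fc_intensity(in_features, out_features):
    """Fully connected layer on a single input vector (GEMV)."""
    flops = 2 * in_features * out_features             # one multiply + one add per weight
    weight_bytes = in_features * out_features * BYTES  # each weight fetched once, used once
    act_bytes = (in_features + out_features) * BYTES
    return flops / (weight_bytes + act_bytes)

def conv_intensity(c_in, c_out, k, h_out, w_out):
    """Standard convolution: every weight is reused at h_out * w_out positions."""
    flops = 2 * c_in * c_out * k * k * h_out * w_out
    weight_bytes = c_in * c_out * k * k * BYTES
    act_bytes = (c_in * (h_out + k - 1) * (w_out + k - 1)   # input (stride 1, no padding)
                 + c_out * h_out * w_out) * BYTES           # output
    return flops / (weight_bytes + act_bytes)

print(f"FC   4096 -> 4096              : {fc_intensity(4096, 4096):7.2f} FLOPs/byte")
print(f"Conv 3x3, 256 -> 256, 56x56 out: {conv_intensity(256, 256, 3, 56, 56):7.2f} FLOPs/byte")
```

Under these assumptions the FC layer lands around 0.5 FLOPs/byte, firmly memory-bound, while the convolution reaches hundreds of FLOPs/byte: the two layers will sit on opposite sides of the roofline.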
Finally, in 3 Mappings of Conv Operations, we looked at the Im2Col method, a strategy that deliberately sacrifices memory to gain computational speed by mapping a standard convolution onto a single large GEMM.
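As a reminder of what that trade-off costs, below is a small NumPy sketch of the Im2Col idea, a generic textbook formulation rather than the exact code from 3 Mappings of Conv Operations. Every k x k input patch becomes one column of the unfolded matrix, so the unfolded input is roughly k^2 times larger than the original.

```python
import numpy as np

def im2col(x, k):
    """Unfold a (C, H, W) input into a (C*k*k, H_out*W_out) patch matrix (stride 1, no padding)."""
    c, h, w = x.shape
    h_out, w_out = h - k + 1, w - k + 1
    cols = np.empty((c * k * k, h_out * w_out), dtype=x.dtype)
    idx = 0
    for i in range(h_out):
        for j in range(w_out):
            cols[:, idx] = x[:, i:i + k, j:j + k].ravel()  # one patch per column
            idx += 1
    return cols

c, h, w, k, c_out = 3, 32, 32, 3, 8          # illustrative shapes
x = np.random.rand(c, h, w).astype(np.float32)
weights = np.random.rand(c_out, c * k * k).astype(np.float32)

cols = im2col(x, k)                                          # memory blows up by ~k*k
out = (weights @ cols).reshape(c_out, h - k + 1, w - k + 1)  # the convolution, as one GEMM

print(f"input  : {x.nbytes / 1024:.1f} KiB")
print(f"im2col : {cols.nbytes / 1024:.1f} KiB (~{cols.nbytes / x.nbytes:.1f}x)")
```

The roughly k^2-fold expansion (a bit less at the borders) is the memory Im2Col sacrifices; the payoff is that the convolution runs as one dense GEMM, the operation hardware executes best.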