AI Architecture - RTLearner

AI Architecture 15. The Heart of Systolic Array

The Von Neumann architecture, the origin of modern computing, separates the 'Processing Unit (CPU)' from the 'Storage Unit (Memory)'.

NPU design & Optimization

AI Architecture 14. Dataflow Taxonomy: TPU vs Output Stationary vs Row Stationary

In the previous post, we quantitatively confirmed that hardware performance limits ...

AI & HW Fundamentals

AI Architecture 13. Roofline Model Analysis

In our previous posts, we discussed the two main culprits degrading deep learning model performance:

AI & HW Fundamentals

AI Architecture 12. Skip Connection: ResNet and Bottlenecks

In the previous post, MLP and Memory Wall, we discussed the "memory wall" phenomenon, where memory bandwidth limits system performance. In CNN and Locality, ...

AI & HW Fundamentals

AI Architecture 11. Depthwise Separable Conv: The MobileNet Paradox

In the previous post, 3 Mappings of Conv Operations, we looked at the Im2Col method, which trades extra memory for computational speed by turning standard convolutions into GEMM operations in hardware.

AI & HW Fundamentals

AI Architecture 10. Padding and Pooling Hardware Issues

In the previous post, 3 Mappings of Conv Operations, we explored massive trade-offs, such as Im2Col, that exchange memory ...
