AI Architecture 10. Padding and Pooling Hardware Issues
In the previous post, 3 Mappings of Conv Operations, we explored the massive trade-off made by techniques like Im2Col: exchanging memory ...
In the previous post, we learned that hardware loves CNNs (Convolutional Neural Networks) because of Locality and Data Reuse. Theoretically, CNNs seem like the perfect hardware-friendly algorithm.
In the previous post, we confirmed how inefficient the MLP (Fully Connected Layer) is from a hardware perspective, due to its structure of fetching each weight only once ...
In previous posts, we learned about Quantization techniques that shave down data size to reduce hardware costs. So, why do we try so desperately to reduce data size?
In the previous post, we examined how differences in Number Formats affect hardware area and power consumption. We established that FP32 ...
In the previous post, we explored the difference between Training and Inference, seeing how inference-only NPUs lighten the hardware structure. One of the key keywords for this optimization was 'Reduction of Precision.'