AI Architecture 8. CNN and Locality
In the previous post, we confirmed how inefficient an MLP (Fully Connected Layer) is from a hardware perspective. Because of its structure of fetching a weight from memory once, using it in a single MAC operation, and then discarding it, there is essentially no data reuse, and memory bandwidth becomes the bottleneck.
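To make that reuse gap concrete, here is a minimal back-of-envelope sketch (the layer sizes and function names are my own illustrative assumptions, not from the post) counting how many MAC operations each fetched weight participates in for a fully connected layer versus a convolutional layer:

```python
# Back-of-envelope sketch: weight reuse per fetch, FC layer vs conv layer.
# Layer sizes below are assumed examples, chosen only for illustration.

def fc_reuse(n_in, n_out):
    weights = n_in * n_out
    macs = n_in * n_out          # each weight is used in exactly one MAC
    return macs // weights       # always 1: fetch, use once, discard

def conv_reuse(h, w, k, c_in, c_out, stride=1):
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    weights = k * k * c_in * c_out
    # the same kernel slides over every output position
    macs = h_out * w_out * k * k * c_in * c_out
    return macs // weights       # h_out * w_out MACs per weight fetch

print(fc_reuse(4096, 4096))            # 1
print(conv_reuse(224, 224, 3, 3, 64))  # 49284 (= 222 * 222)
```

With a 3x3 kernel sliding over a 224x224 input, each weight is reused roughly fifty thousand times per fetch, which is exactly the locality property that makes CNNs so much friendlier to the memory hierarchy than fully connected layers.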