AI Architecture 8. CNN and Locality
In the previous post, we confirmed how inefficient an MLP (Fully Connected Layer) is from a hardware perspective. Because of its structure of fetching a weight from memory once, using it in a single MAC operation, and then discarding it, there is essentially no data reuse, and memory bandwidth becomes the bottleneck.
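To make that reuse gap concrete, here is a minimal back-of-envelope sketch (the layer sizes and function names are my own illustrative assumptions, not from the post) counting how many MAC operations each fetched weight participates in for a fully connected layer versus a convolutional layer:

```python
# Back-of-envelope sketch: weight reuse per fetch, FC layer vs conv layer.
# Layer sizes below are assumed examples, chosen only for illustration.

def fc_reuse(n_in, n_out):
    weights = n_in * n_out
    macs = n_in * n_out          # each weight is used in exactly one MAC
    return macs // weights       # always 1: fetch, use once, discard

def conv_reuse(h, w, k, c_in, c_out, stride=1):
    h_out = (h - k) // stride + 1
    w_out = (w - k) // stride + 1
    weights = k * k * c_in * c_out
    # the same kernel slides over every output position
    macs = h_out * w_out * k * k * c_in * c_out
    return macs // weights       # h_out * w_out MACs per weight fetch

print(fc_reuse(4096, 4096))            # 1
print(conv_reuse(224, 224, 3, 3, 64))  # 49284 (= 222 * 222)
```

With a 3x3 kernel sliding over a 224x224 input, each weight is reused roughly fifty thousand times per fetch, which is exactly the locality property that makes CNNs so much friendlier to the memory hierarchy than fully connected layers.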