{"id":1312,"date":"2026-01-15T09:59:11","date_gmt":"2026-01-15T00:59:11","guid":{"rendered":"https:\/\/rtlearner.com\/?p=1312"},"modified":"2026-01-15T09:59:13","modified_gmt":"2026-01-15T00:59:13","slug":"ai-architecture-9-convolution-operation-mapping","status":"publish","type":"post","link":"https:\/\/rtlearner.com\/en\/ai-architecture-9-convolution-operation-mapping\/","title":{"rendered":"AI Architecture 9. Three Mappings of Conv Operations: Direct vs. Im2Col vs. Winograd"},"content":{"rendered":"\n<p>In the previous post, we learned that hardware loves CNNs (Convolutional Neural Networks) because of locality and data reuse. In theory, a CNN looks like the perfect hardware-friendly algorithm.<\/p>\n\n\n\n<p>But once you try to implement this algorithm on a silicon chip, the hardware runs into a serious obstacle: the complex loop structure. A convolution layer is fundamentally a six- to seven-deep loop nest (Batch, Out Channel, In Channel, Height, Width, Kernel H, Kernel W). If this loop nest is mapped onto hardware as-is (Direct Conv), the memory access pattern becomes scattered and the cache hit rate drops sharply.<\/p>\n\n\n\n<p>So engineers came up with a clever idea. 
&#8220;What if we turn this complex operation into a simple matrix multiplication, even if it wastes some memory?&#8221;<\/p>\n\n\n\n<p>In this post, we compare the three representative strategies that hardware and compilers use to handle Conv operations, <strong>Direct, Im2Col, and Winograd<\/strong>, and look at the trade-off between memory and speed hidden inside them.<\/p>\n\n\n<style>.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-content-wrap{padding-top:var(--global-kb-spacing-sm, 1.5rem);padding-right:var(--global-kb-spacing-sm, 1.5rem);padding-bottom:var(--global-kb-spacing-sm, 1.5rem);padding-left:var(--global-kb-spacing-sm, 1.5rem);box-shadow:0px 0px 14px 0px rgba(0, 0, 0, 0.2);}.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-contents-title-wrap{padding-top:0px;padding-right:0px;padding-bottom:0px;padding-left:0px;}.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-contents-title{font-weight:regular;font-style:normal;}.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-content-wrap .kb-table-of-content-list{font-weight:regular;font-style:normal;margin-top:var(--global-kb-spacing-sm, 1.5rem);margin-right:0px;margin-bottom:0px;margin-left:0px;}@media all and (max-width: 767px){.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-contents-title{font-size:var(--global-kb-font-size-md, 1.25rem);}.kb-table-of-content-nav.kb-table-of-content-id1312_df7614-79 .kb-table-of-content-wrap .kb-table-of-content-list{font-size:var(--global-kb-font-size-sm, 
0.9rem);}}<\/style>\n\n<style>.kadence-column1312_a27628-38 > .kt-inside-inner-col{box-shadow:0px 0px 14px 0px rgba(0, 0, 0, 0.2);}.kadence-column1312_a27628-38 > .kt-inside-inner-col,.kadence-column1312_a27628-38 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column1312_a27628-38 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column1312_a27628-38 > .kt-inside-inner-col{flex-direction:column;}.kadence-column1312_a27628-38 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column1312_a27628-38 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column1312_a27628-38{position:relative;}@media all and (max-width: 1024px){.kadence-column1312_a27628-38 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column1312_a27628-38 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column1312_a27628-38\"><div class=\"kt-inside-inner-col\">\n<p><strong>Related Posts<\/strong><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-1-neuron-hardware-mac-analysis\/\" data-type=\"post\" data-id=\"1248\">AI Architecture 1. Anatomy of an Artificial Neuron: Implementing Y=WX+B in Silicon<\/a><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-2-activation-relu-vs-sigmoid\/\" data-type=\"post\" data-id=\"1255\">AI Architecture 2. The Cost of Activation Functions: ReLU vs Sigmoid<\/a><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-3-matmul-simd-parallel-processing\/\" data-type=\"post\" data-id=\"1263\">AI Architecture 3. 
The Aesthetics of Matrix Multiplication (MatMul): Why Deep Learning Chose the GPU\/NPU<\/a><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-4-training-vs-inference\/\" data-type=\"post\" data-id=\"1267\">AI Architecture 4. Training vs Inference<\/a><\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">1. Direct Convolution<\/h2>\n\n\n\n<p>The most intuitive approach is to implement the formula as written: a sliding window moves across the image one position at a time, multiplying and accumulating.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pros:<\/strong> No additional memory is required (zero memory overhead). The input image and the weights are all you need.<\/li>\n\n\n\n<li><strong>Cons:<\/strong> It is very hard to implement efficiently in hardware.\n<ul class=\"wp-block-list\">\n<li><strong>Irregular memory access:<\/strong> Each time the window moves, the data addresses jump discontinuously.<\/li>\n\n\n\n<li><strong>Hard to parallelize:<\/strong> Keeping SIMD (parallel processing) units fully occupied is very tricky.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>Outside of early hardware or embedded environments with extremely limited memory, the Direct method is rarely used because it is slow.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">2. Im2Col (Image to Column)<\/h2>\n\n\n\n<p>This is the standard approach behind many of today&#8217;s GPU libraries (such as cuDNN) and deep learning frameworks. The core idea is to convert the 3D convolution into a 2D matrix multiplication (GEMM).<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works<\/h3>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Im2Col transform:<\/strong> Extract the pixels of each region covered by the 3 \u00d7 3 kernel (the receptive field) and flatten them into one long column vector.<\/li>\n\n\n\n<li>Repeating this for every window position produces one large input matrix.<\/li>\n\n\n\n<li>The filters (weights) are flattened in the same way into a weight matrix.<\/li>\n\n\n\n<li>Finally, the two large matrices are multiplied (GEMM).<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">Trade-offs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pros (Speed):<\/strong> The complex Conv operation becomes a GEMM (General Matrix Multiply) problem, which is already extremely well optimized. 
This lets the matrix units of a GPU or NPU run at close to full utilization, which speeds things up dramatically.<\/li>\n\n\n\n<li><strong>Cons (Memory):<\/strong> <strong>Severe memory waste occurs.<\/strong> Pixels in the regions where sliding windows overlap are copied multiple times when they are converted into the matrix.\n<ul class=\"wp-block-list\">\n<li>For example, with a 3 \u00d7 3 kernel at stride 1, each interior pixel is covered by nine windows, so the data volume grows to about 9 times that of the original image.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n\n\n\n<p>The hardware engineer then asks: &#8220;Is it acceptable for memory usage to grow ninefold?&#8221; In most cases the answer is &#8220;yes,&#8221; because compute throughput is more precious than memory capacity.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">3. Winograd Algorithm<\/h2>\n\n\n\n<p>If Im2Col gains speed by changing the structure, Winograd changes the math to reduce the number of operations itself.<\/p>\n\n\n\n<p>Normally, producing a 2 \u00d7 2 output with a 3 \u00d7 3 filter requires 2 \u00d7 2 \u00d7 9 = 36 multiplications. With the Winograd algorithm, this can be cut to 16 multiplications. 
(a 2.25\u00d7 reduction)<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">How It Works<\/h3>\n\n\n\n<p>The input (d) and the filter (g) are transformed in a manner similar to a Fourier transform, multiplied element-wise, and then inverse-transformed.<\/p>\n\n\n\n<div class=\"wp-block-math\"><math display=\"block\"><semantics><mrow><mi>Y<\/mi><mo>=<\/mo><msup><mi>A<\/mi><mi>T<\/mi><\/msup><mo form=\"prefix\" stretchy=\"false\">[<\/mo><mo form=\"prefix\" stretchy=\"false\">(<\/mo><mi>G<\/mi><mi>g<\/mi><msup><mi>G<\/mi><mi>T<\/mi><\/msup><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mo>\u2299<\/mo><mo form=\"prefix\" stretchy=\"false\">(<\/mo><msup><mi>B<\/mi><mi>T<\/mi><\/msup><mi>d<\/mi><mi>B<\/mi><mo form=\"postfix\" stretchy=\"false\">)<\/mo><mo form=\"postfix\" stretchy=\"false\">]<\/mo><mi>A<\/mi><\/mrow><annotation encoding=\"application\/x-tex\">Y = A^T [(G g G^T) \\odot (B^T d B)] A<\/annotation><\/semantics><\/math><\/div>\n\n\n\n<ul class=\"wp-block-list\">\n<li>G, B, A: predefined transform matrices (constants)<\/li>\n\n\n\n<li><em>\u2299<\/em>: element-wise multiplication<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Trade-offs<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Pros:<\/strong> The number of multiplications (MACs) drops substantially. It performs very well on models dominated by 3 \u00d7 3 convolutions.<\/li>\n\n\n\n<li><strong>Con 1 (transform cost):<\/strong> Multiplications decrease, but the transforms introduce extra additions. 
Dedicated hardware units for the transforms may also be needed.<\/li>\n\n\n\n<li><strong>Con 2 (precision):<\/strong> The transform matrices contain many fractional constants, so numerical error can arise in floating-point arithmetic. It is also tricky to combine with INT8 quantization.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">4. The Hardware Engineer&#8217;s Choice<\/h2>\n\n\n\n<p>So which method should we choose? The answer is &#8220;it depends on the situation.&#8221;<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>GPUs with ample memory that need high performance:<\/strong> Use <strong>Im2Col + GEMM<\/strong> without hesitation, since parallel processing matters more than wasted memory.<\/li>\n\n\n\n<li><strong>Mobile NPUs with small (3 \u00d7 3) kernels:<\/strong> Use <strong>Winograd<\/strong> aggressively to raise compute efficiency.<\/li>\n\n\n\n<li><strong>IoT MCUs with extremely limited memory:<\/strong> There is little choice but to use <strong>Direct Conv<\/strong>, or to run Im2Col in very small tiles.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">5. 
Conclusion<\/h2>\n\n\n\n<p>The history of CNN acceleration is a constant tug-of-war between memory space and compute time.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Im2Col:<\/strong> gives up space to buy time.<\/li>\n\n\n\n<li><strong>Winograd:<\/strong> accepts added complexity (additions) to cut multiplications.<\/li>\n<\/ul>\n\n\n\n<p>As hardware engineers, we must understand precisely the environment in which the chip will be used (memory bandwidth, capacity, target performance) and choose the best-suited of these algorithms to map onto the hardware.<\/p>\n\n\n\n<p>In the next post, we will look at another essential component of CNNs, one that looks simple but is a major source of complexity in hardware line buffer design: &#8220;Hardware Issues of Pooling and Padding.&#8221;<\/p>\n\n\n\n<p>Reference: <em><a href=\"https:\/\/arxiv.org\/abs\/1509.09308\" target=\"_blank\" rel=\"noopener\">Fast Algorithms for Convolutional Neural Networks<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the previous post, we learned that hardware loves CNNs (Convolutional Neural Networks) because of Locality and Data Reuse. 
Theoretically, CNNs seem like the perfect hardware-friendly algorithm.<\/p>","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[116],"tags":[117,118],"class_list":["post-1312","post","type-post","status-publish","format-standard","hentry","category-ai-and-hw-fundamentals","tag-ai","tag-architecture"],"_links":{"self":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1312","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/comments?post=1312"}],"version-history":[{"count":4,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1312\/revisions"}],"predecessor-version":[{"id":1323,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1312\/revisions\/1323"}],"wp:attachment":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/media?parent=1312"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/categories?post=1312"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/tags?post=1312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}