{"id":1364,"date":"2026-01-23T10:20:27","date_gmt":"2026-01-23T01:20:27","guid":{"rendered":"https:\/\/rtlearner.com\/?p=1364"},"modified":"2026-02-27T10:08:10","modified_gmt":"2026-02-27T01:08:10","slug":"ai-architecture-15-systolic-array-architecture","status":"publish","type":"post","link":"https:\/\/rtlearner.com\/en\/ai-architecture-15-systolic-array-architecture\/","title":{"rendered":"AI Architecture 15. The Heart of Systolic Array"},"content":{"rendered":"\n<p>The Von Neumann architecture, the origin of modern computing, separates the processing unit (CPU) from the storage unit (Memory). Its critical weakness is that every operation must fetch data from memory and write the result back. As noted in post 13 (Roofline Model) and post 14 (Dataflow), this data movement cost is the biggest barrier limiting the performance of deep learning accelerators.<\/p>\n\n\n\n<p>In 1978, H.T. Kung and C.E. Leiserson proposed the concept of the &#8216;Systolic Array&#8217; to address this problem. 
&#8216;Systolic&#8217; derives from systole, the contraction phase in which the heart pumps blood through the body. Once data is pumped out of memory, it passes through an array of many processing elements (PEs) in a regular rhythm and is reused along the way.<\/p>\n\n\n\n<p>The technique became a de facto standard for deep learning hardware after Google adopted it as the core architecture of its dedicated AI processor, the TPU (Tensor Processing Unit), which has been deployed in Google's datacenters since 2015 and was described publicly in a 2017 paper. This post analyzes how a Systolic Array works and how the Google TPU uses it to maximize the efficiency of matrix multiplication (GEMM).<\/p>\n\n\n<style>.kadence-column1364_18e523-24 > .kt-inside-inner-col{box-shadow:0px 0px 14px 0px rgba(0, 0, 0, 0.2);}.kadence-column1364_18e523-24 > .kt-inside-inner-col,.kadence-column1364_18e523-24 > .kt-inside-inner-col:before{border-top-left-radius:0px;border-top-right-radius:0px;border-bottom-right-radius:0px;border-bottom-left-radius:0px;}.kadence-column1364_18e523-24 > .kt-inside-inner-col{column-gap:var(--global-kb-gap-sm, 1rem);}.kadence-column1364_18e523-24 > .kt-inside-inner-col{flex-direction:column;}.kadence-column1364_18e523-24 > .kt-inside-inner-col > .aligncenter{width:100%;}.kadence-column1364_18e523-24 > .kt-inside-inner-col:before{opacity:0.3;}.kadence-column1364_18e523-24{position:relative;}@media all and (max-width: 
1024px){.kadence-column1364_18e523-24 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}@media all and (max-width: 767px){.kadence-column1364_18e523-24 > .kt-inside-inner-col{flex-direction:column;justify-content:center;}}<\/style>\n<div class=\"wp-block-kadence-column kadence-column1364_18e523-24\"><div class=\"kt-inside-inner-col\">\n<p><strong>Related Posts<\/strong><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-14-dataflow-taxonomy-ws-os-rs\/\" data-type=\"post\" data-id=\"1352\">AI Architecture 14. Dataflow Taxonomy: TPU vs Output Stationary vs Row Stationary<\/a><\/p>\n\n\n\n<p>\u2705<a href=\"https:\/\/rtlearner.com\/ai-architecture-16-npu-optimization-memory-hierarchy\/\" data-type=\"post\" data-id=\"1383\">AI Architecture 16. Memory Hierarchy: Strategies for Minimizing Data Movement Cost<\/a><\/p>\n<\/div><\/div>\n\n\n\n<h2 class=\"wp-block-heading\">1. Structural Characteristics: A 2D Mesh of PEs (Mesh Topology)<\/h2>\n\n\n\n<p>Unlike a general-purpose CPU or GPU, which manages thousands of threads and shuttles data through a complex hierarchy of caches and registers, a Systolic Array has a very simple and regular <strong>2D mesh structure<\/strong>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Processing Element (PE):<\/strong> The compute unit that occupies each cell of the array. 
Each PE is deliberately simple, containing only a multiply-accumulate unit (MAC unit) and a small number of registers.<\/li>\n\n\n\n<li><strong>Local Interconnect:<\/strong> Each PE is <strong>connected only to its immediate neighbors (up, down, left, right)<\/strong>. Because PEs are not attached to a global data bus, data never has to travel far, which sharply reduces wiring complexity and power consumption.<\/li>\n\n\n\n<li><strong>No Global Control:<\/strong> Individual PEs do not decode complex instructions. Paced only by a single clock signal distributed by the central controller, each PE repeats the same simple task: process the incoming data and pass it on to its neighbor.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2. Operating Mechanism: Weight Stationary Dataflow<\/h2>\n\n\n\n<p>Google TPU v1 implements the <strong>Weight Stationary (WS)<\/strong> dataflow as a Systolic Array. 
Let's walk through how it works step by step, using an N * N matrix multiplication (C = A * B) as the example.<\/p>\n\n\n<style>.kb-image1364_482dd6-96.kb-image-is-ratio-size, .kb-image1364_482dd6-96 .kb-image-is-ratio-size{max-width:550px;width:100%;}.wp-block-kadence-column > .kt-inside-inner-col > .kb-image1364_482dd6-96.kb-image-is-ratio-size, .wp-block-kadence-column > .kt-inside-inner-col > .kb-image1364_482dd6-96 .kb-image-is-ratio-size{align-self:unset;}.kb-image1364_482dd6-96 figure{max-width:550px;}.kb-image1364_482dd6-96 .image-is-svg, .kb-image1364_482dd6-96 .image-is-svg img{width:100%;}.kb-image1364_482dd6-96 .kb-image-has-overlay:after{opacity:0.3;}<\/style>\n<div class=\"wp-block-kadence-image kb-image1364_482dd6-96\"><figure class=\"aligncenter size-full\"><img data-dominant-color=\"f2ede0\" data-has-transparency=\"false\" style=\"--dominant-color: #f2ede0;\" loading=\"lazy\" decoding=\"async\" width=\"647\" height=\"477\" src=\"https:\/\/rtlearner.com\/wp-content\/uploads\/2026\/01\/image-3-9.jpg\" alt=\"Systolic data flow\" class=\"kb-img wp-image-1372 not-transparent\" srcset=\"https:\/\/rtlearner.com\/wp-content\/uploads\/2026\/01\/image-3-9.jpg 647w, https:\/\/rtlearner.com\/wp-content\/uploads\/2026\/01\/image-3-9-300x221.jpg 300w, https:\/\/rtlearner.com\/wp-content\/uploads\/2026\/01\/image-3-9-16x12.jpg 16w\" sizes=\"auto, (max-width: 647px) 100vw, 647px\" \/><figcaption>Systolic data flow of the Matrix Multiply Unit<\/figcaption><\/figure><\/div>\n\n\n\n<ol start=\"1\" class=\"wp-block-list\">\n<li><strong>Weight Pre-loading:<\/strong> Before computation starts, the values of matrix B (the weights) are loaded into the PEs in advance. 
These values stay fixed (stationary) in each PE's register until the computation finishes.<\/li>\n\n\n\n<li><strong>Input Streaming:<\/strong> The values of matrix A (the input data) flow through the array from left to right. The key detail here is skewing. Rather than all rows entering at once, each row is delayed by one cycle relative to the previous one: the first row enters at T=0, the second at T=1, and so on. This produces a diagonal wavefront of data moving through the array.<\/li>\n\n\n\n<li><strong>Systolic Flow &amp; Accumulation:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Each PE multiplies the input value received from its left (A<sub>row<\/sub>) by the weight it holds (B<sub>fixed<\/sub>).<\/li>\n\n\n\n<li>It adds that product to the partial sum arriving from the PE above.<\/li>\n\n\n\n<li>It forwards the input value (A) to the PE on its right and the updated partial sum (C) to the PE below.<\/li>\n\n\n\n<li>This repeats on every clock cycle, as regular as a heartbeat.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Output Draining:<\/strong> The completed results (C) are pushed out sequentially from the bottom edge of the array.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">3. Key Advantages of the Systolic Array<\/h2>\n\n\n\n<p>Why did Google choose the Systolic Array over so many other architectures? The answer lies in compute density and bandwidth efficiency.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A. O(N<sup>2<\/sup>) Compute with O(N) Bandwidth<\/h3>\n\n\n\n<p>This is the Systolic Array's most powerful property. An N * N PE array contains N<sup>2<\/sup> compute units, yet only the N ports along an edge of the array connect to external memory.<\/p>\n\n\n\n<p>In other words, each group of N values read from memory is reused N times inside the array, yielding N<sup>2<\/sup> operations in total. This is one of the most effective ways to relieve the memory bandwidth bottleneck and push the system into the compute-bound region of the Roofline model.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">B. 
High Area Efficiency<\/h3>\n\n\n\n<p>Each PE consists almost entirely of a MAC unit, with no control unit or complex cache hierarchy. As a result, far more compute units fit into the same silicon area. TPU v1, for example, reaches a peak of about 92 TOPS (Tera Operations Per Second) with its large 256 * 256 array of 8-bit MACs despite a modest 700 MHz clock (65,536 MACs * 2 operations * 700 MHz).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">4. Drawbacks and Limitations: Lack of Flexibility<\/h2>\n\n\n\n<p>Every design involves trade-offs. The Systolic Array is extremely efficient at matrix multiplication but poorly suited to most other workloads.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">A. Latency and Pipeline Fill (Fill &amp; Drain Overhead)<\/h3>\n\n\n\n<p>It takes time for the array to fill so that every PE is active (fill), and again for the last data to leave the array (drain). This is called the <strong>Pipeline Priming<\/strong> time. 
Consequently, when the amount of data to process (the batch size) is small, the computation ends before the pipeline is even full, and PE utilization drops sharply. This is why the TPU favors &#8216;Large Batch&#8217; inference and training and is at a disadvantage for real-time &#8216;Batch 1&#8217; inference.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">B. Difficulty Handling Sparsity<\/h3>\n\n\n\n<p>The Systolic Array assumes dense matrices. Even when the data is a sparse matrix full of zeros, the regular rhythm of the flow cannot be broken, so the array still performs pointless multiplications by zero. To address this, recent hardware research targets support for structured sparsity.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">[Conclusion]<\/h2>\n\n\n\n<p>The Systolic Array is arguably the most elegant hardware-level realization of the overriding principle of NPU design: minimize the cost of data movement. 
This approach, which performs massive parallel computation using only local connections, has become the textbook reference design for modern AI chips since the success of the Google TPU.<\/p>\n\n\n\n<p>However, keeping tens of thousands of PEs busy without pause requires an equally capable memory system that can supply data without interruption. The next post looks at the data movement strategy behind the &#8220;Memory Hierarchy: DRAM &#8211; Global Buffer &#8211; PE Register&#8221; that keeps the Systolic Array fed.<\/p>\n\n\n\n<p>Reference: <em><a href=\"https:\/\/arxiv.org\/abs\/1704.04760\" target=\"_blank\" rel=\"noopener\">In-Datacenter Performance Analysis of a Tensor Processing Unit<\/a><\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Von Neumann architecture, the origin of modern computing, separates the 'Processing Unit (CPU)' from the 'Storage Unit 
(Memory)'.<\/p>","protected":false},"author":1,"featured_media":1372,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[116],"tags":[117,118],"class_list":["post-1364","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-and-hw-fundamentals","tag-ai","tag-architecture"],"_links":{"self":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1364","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/comments?post=1364"}],"version-history":[{"count":5,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1364\/revisions"}],"predecessor-version":[{"id":1418,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/posts\/1364\/revisions\/1418"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/media\/1372"}],"wp:attachment":[{"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/media?parent=1364"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/categories?post=1364"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rtlearner.com\/en\/wp-json\/wp\/v2\/tags?post=1364"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}