en:multiasm:cs:chapter_3_9 [2026/01/10 20:16] (current) – pczekalski
====== Modern Processors: Cache, Pipeline, Superscalar ======
Modern processors have a highly complex design and include many units that primarily reduce program execution time.
===== Cache =====
Cache memory is a layer in the memory hierarchy that sits between main memory and processor registers. The main reason for introducing cache memory is that main memory, based on DRAM technology, is much slower than the processor, which is based on static (SRAM) technology. The cache exploits two properties of software: spatial locality and temporal locality. Spatial locality results from the fact that the processor executes code which, in most cases, is a sequence of instructions arranged directly one after another. Temporal locality arises because programs often run in loops, repeatedly working on a single set of data over short intervals. In both cases, a larger fragment of a program or data can be loaded into the cache and operated on without accessing main memory each time. Main memory is designed so that reading and writing data in blocks is significantly faster than accessing random addresses. These properties allow a code fragment to be read in its entirety from main memory into the cache and executed without repeated accesses to main memory.
In modern processors, the cache is divided into several levels, usually three. The first-level cache (L1) is the closest to the processor, the fastest, and is usually divided into separate instruction and data caches. The second-level cache (L2) is shared by instructions and data, slower and usually larger than the L1 cache. The largest and the slowest is the third-level cache (L3). It is the closest to the computer's main memory and, in multicore processors, is usually shared by all cores.
Besides size, important cache parameters are line length and associativity.
Modern processors implement longer pipelines. For example, the Pentium III used a 10-stage pipeline, the Pentium 4 a 20-stage pipeline, and the Pentium 4 Prescott even a 31-stage pipeline. Does a longer pipeline mean faster program execution? Everything has benefits and drawbacks. The undoubted benefit of a longer pipeline is that more instructions can be executed simultaneously, each at a different stage, and because each stage does less work, the clock frequency can be higher. The drawback appears at conditional branches: when a branch is mispredicted, the instructions already fetched into the pipeline must be discarded, and the longer the pipeline, the more clock cycles are wasted refilling it.
===== Superscalar =====
The superscalar processor increases program execution speed by providing more than one execution unit, so that several instructions can be processed in parallel within a single clock cycle.
<figure superscalar>
</figure>
<todo @ktokarz>
In the x86 family, the first processor with two execution pipelines was the Pentium, which had two execution units called U and V. Modern x64 processors like the i7 implement six execution units. Not all execution units have the same functionality. For example, in the i7 processor, each execution unit has different capabilities.
<table executionunits>