Main memory is a major performance bottleneck in current chip multiprocessors. Current DRAM banks latch the last accessed row in an internal buffer, called the row buffer (RB), which allows fast subsequent accesses to that row. This throughput-oriented approach was originally designed for single-threaded processors and aims to exploit the spatial locality that individual applications exhibit. This paper proposes row tables, a pool of row buffers shared among threads, in which row buffers are dynamically allocated to threads according to the needs of each thread.
Two design approaches are devised that differ in the location of the table: BRT (Bank Row Table) places the table at the bank, where row buffers traditionally reside in existing modules, and CRT (Controller Row Table) places it at the memory controller side. CRT performs better than BRT in applications (or mixes) with high RB locality but worse in applications with poor RB locality, since the increase in transfer times is not later amortized. A variant of CRT, referred to as CRT 1/x, has been devised to reduce this performance penalty. Results for a 4-core system show that, on average, the BRT and CRT 1/x mechanisms reduce energy consumption by 23% and 7%-16% (depending on the value of x) and improve IPC by 10% and 9%-14%, respectively.
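
As a rough illustration of why a pool of row buffers can help when several threads share a bank, the following minimal sketch counts row-buffer hits for an interleaved access trace, first with a single row buffer and then with a small pool. The function name `count_row_hits`, the example trace, and the LRU replacement policy are assumptions made for illustration only; the abstract does not specify how row buffers are allocated, and this sketch does not model the BRT or CRT designs, timing, or energy.

```python
from collections import OrderedDict


def count_row_hits(accesses, num_buffers):
    """Count row-buffer hits in one bank that holds `num_buffers` row buffers.

    accesses: iterable of (thread_id, row) pairs in arrival order.
    LRU replacement is an assumption of this sketch, not the paper's policy.
    """
    buffers = OrderedDict()  # keys are latched rows, least recently used first
    hits = 0
    for _thread_id, row in accesses:
        if row in buffers:
            hits += 1
            buffers.move_to_end(row)  # mark the row as most recently used
        else:
            if len(buffers) >= num_buffers:
                buffers.popitem(last=False)  # evict the least recently used row
            buffers[row] = None
    return hits


# Hypothetical trace: two threads, each with good locality on its own row,
# whose accesses to the same bank are interleaved.
trace = [(0, 10), (1, 20), (0, 10), (1, 20), (0, 10), (1, 20)]

single = count_row_hits(trace, num_buffers=1)  # conventional single row buffer
pooled = count_row_hits(trace, num_buffers=2)  # small shared pool of row buffers

print(single, pooled)
```

In this toy trace the single row buffer never hits, because each thread's access evicts the row the other thread needs, while the two-entry pool hits on every access after the first to each row.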