The current trend in processor design is to increase the number of cores in order to achieve the desired performance. While placing a large number of cores on a chip appears feasible in terms of hardware, developing software that can exploit that parallelism remains one of the biggest challenges. In this paper we propose a Data-Flow-based system that can exploit the parallelism of large-scale many-core processors effectively and efficiently.
Our proposed system, TFlux SCC, is an extension of the TFlux Data-Driven Multithreading (DDM) implementation, adapted to exploit the parallelism of the 48-core Intel Single-chip Cloud Computer (SCC) processor. With TFlux SCC we achieve scalable performance using a global address space, without the need for cache-coherency support. Our scalability study shows that application performance can scale, with speedups of up to 48x on 48 cores. The findings of this work provide insight into what a Data-Flow implementation does and does not require from a many-core architecture in order to scale performance.