PORTLAND, Ore. — Using dual networks-on-chip (NoCs), researchers at the Massachusetts Institute of Technology are aiming for a multicore architecture that can scale to any number of cores, with cache coherency. So far, they've prototyped a 36-core version.
Their 36-core prototype uses a tiled physical layout in which each tile contains a core plus a router that passes fixed-size message packets to adjacent cores until they reach their destination cores. To maintain cache coherency -- synchronizing values stored in separate caches, the bane of multicore processors -- MIT uses what it calls the Snoopy Coherent Research Processor with Interconnect Ordering (SCORPIO). SCORPIO uses a second "shadow" mesh that snoops on the main network to maintain cache coherency in a way that is claimed to be more scalable and 24 percent faster than traditional distributed-directory cache coherency and 12 percent faster than AMD's HyperTransport bus.
MIT connects 36 tiles, each containing a core, with twin Internet-like packet-routed networks to preserve cache coherency.
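The tile-to-tile forwarding described above can be sketched in a few lines of Python. The article does not say which routing algorithm the prototype uses, so this sketch assumes dimension-ordered (XY) routing, a common choice for 2D meshes, and assumes the 36 tiles are arranged as a 6x6 grid numbered row by row. The names `MESH_WIDTH` and `xy_route` are hypothetical, chosen for illustration only.

```python
MESH_WIDTH = 6  # assumption: 36 tiles laid out as a 6x6 grid, numbered row by row


def xy_route(src: int, dst: int) -> list:
    """Return the tile IDs a packet visits from src to dst, inclusive.

    Assumed policy: move along the row (X) first, then along the column (Y),
    one adjacent tile per hop.
    """
    x, y = src % MESH_WIDTH, src // MESH_WIDTH
    dst_x, dst_y = dst % MESH_WIDTH, dst // MESH_WIDTH
    path = [src]
    while x != dst_x:  # step east or west until the column matches
        x += 1 if dst_x > x else -1
        path.append(y * MESH_WIDTH + x)
    while y != dst_y:  # then step north or south until the row matches
        y += 1 if dst_y > y else -1
        path.append(y * MESH_WIDTH + x)
    return path


print(xy_route(0, 14))  # [0, 1, 2, 8, 14]
```

In this sketch a packet crossing the whole chip, from tile 35 to tile 0, takes 10 hops, which illustrates why latency on a mesh grows with distance rather than with the total number of cores.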
Li-Shiuan Peh, research professor of electrical engineering and computer science at MIT, told EE Times:
The shadow network is the bufferless, contention-free 2D mesh network that is used to maintain snoopy coherence. It ensures that all nodes know the source of requests that will be arriving on the main network. All nodes will consistently enforce an ordering, which is crucial for snoopy coherence.
Due to an L2 cache miss, a request is sent on the main network. Subsequently, a notification is sent on the shadow network declaring to all cores that a request from this source core should arrive. Since the notifications are a single bit, they can be merged with other notifications from other cores. This allows for the shadow network to be extremely fast.
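Peh's point about merging single-bit notifications can be illustrated with a short Python sketch. It assumes each core owns one bit position in a shared bit vector, that notifications are combined with a bitwise OR, and that every node then serves the announced requests in ascending core-ID order. That ordering rule and the function names are assumptions made for illustration; the article does not specify how the chip picks the order, only that all nodes enforce the same one.

```python
NUM_CORES = 36  # size of the prototype described in the article


def notification(core_id: int) -> int:
    """One-hot notification: a single bit set at the requesting core's position."""
    return 1 << core_id


def merge(*notifications: int) -> int:
    """Combine notifications with bitwise OR; no bit is lost, so no queueing is needed."""
    merged = 0
    for n in notifications:
        merged |= n
    return merged


def global_order(merged: int) -> list:
    """Assumed rule: every node serves requesters in ascending core-ID order."""
    return [core for core in range(NUM_CORES) if (merged >> core) & 1]


# Cores 5, 2, and 30 all announce requests in the same time window.
merged = merge(notification(5), notification(2), notification(30))
print(global_order(merged))  # [2, 5, 30]
```

Because OR is commutative, the merged vector is the same no matter which notification reaches a router first, so every node derives an identical order from it even if the requests themselves arrive on the main network at different times.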
Today many different interconnection topologies are used for multicore chips. For chips with as few as eight cores, direct bus connections can be made -- the cores take turns using the same bus. The cores of MIT's 36-core processor, on the other hand, are connected by an on-chip mesh network reminiscent of Intel's 2007 Teraflop Research Chip -- code-named Polaris -- where direct connections were made to adjacent cores, with data intended for remote cores passed from core to core until reaching its destination. For its 61-core Xeon Phi, however, Intel chose multiple high-speed rings for data, address, and acknowledgement instead of a mesh.
MIT, by contrast, is proposing twin meshes -- one to send the data and another to snoop -- to make sure that the data requested by one core is the most recent available. Each core has its own cache to keep frequently used data, which it occasionally writes back to main memory to keep it up to date. The snoop protocol uses the second "shadow" mesh to make sure that requested data comes from the most recently updated source, and it imposes an ordering so that each core receives the data it needs in the correct order to execute parallel algorithms.
After verifying that the prototype chips are functioning correctly, Peh, lead author Bhavya Daya, and several other collaborators will test the SCORPIO protocol using a parallel version of Linux. Once it is verified, Peh plans to release the microarchitecture of the chip, written in the Verilog hardware description language.
Get all the details in Peh and Daya's paper.
— R. Colin Johnson, Advanced Technology Editor, EE Times