AMD's Jaguar packs four cores in one for mobile
Rick Merritt
8/28/2012 12:01 PM EDT
CUPERTINO, Calif. – Advanced Micro Devices will describe Jaguar, a low-power x86 core for notebooks, tablets and embedded systems, at Hot Chips here. Jaguar packs four x86 cores into one unit with a large shared L2 cache to compete with both Intel’s Core and Atom chips.
In a separate keynote talk, AMD will announce a follow-on for its HyperTransport processor interconnect. Freedom Fabric aims to link thousands of cores at more than a terabit/second, likely based on technology acquired from SeaMicro.
AMD is expected to try to make Freedom Fabric an industry standard across x86, graphics and ARM cores, competing with the proprietary QuickPath Interconnect on Intel’s CPUs. Last week, the RapidIO Trade Association said it is trying to get ARM and its SoC partners to adopt its technology as a processor interconnect.
As for the Jaguar core, AMD predicts, based on simulations, that it will deliver more than 10 percent higher frequencies and more than 15 percent more instructions per clock than Bobcat, its current low-power x86 core. Jaguar will appear in 2013 in AMD’s Kabini SoC for low-power notebooks and in Temash, AMD’s first sub-5W SoC, aimed at tablets.
The core sports a redesigned load/store unit and an expanded 128-bit floating-point unit. It includes several new instructions to support AES encryption, accelerate media processing and swap between big- and little-endian data formats for embedded systems. But the most novel aspect of the design is its grouping of four x86 cores in a single unit sharing one L2 cache.
“From a core perspective we will call this a four-core unit that forms the building block of an SoC design,” said Jeff Rupley, an AMD Fellow and chief architect of Jaguar. “It’s possible to fuse off some cores for lower end or lower power designs,” he said.
AMD found that sharing a single 1- to 2-Mbyte L2 cache among the cores saves silicon area compared with four private caches. It also boosts performance when only one or two of the single-threaded cores are running, because they can then access a larger pool of cache.
“Generally the larger cache outweighs the latency” of needing an L2 cache interface, Rupley said. “There could be an app where the latency increase defeats the capacity boost, but across a large swath of apps, there’s a pretty positive uplift,” he said.
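To see how capacity can outweigh latency, consider a back-of-the-envelope average-access-time model. The Python sketch below is illustrative only: the cycle counts and miss rates are made-up assumptions, not AMD figures, and `avg_access_cycles` is a hypothetical helper.

```python
def avg_access_cycles(l2_hit_cycles, l2_miss_rate, memory_cycles):
    """Average cost, in cycles, of an access that has already missed L1."""
    return l2_hit_cycles + l2_miss_rate * memory_cycles


MEMORY_CYCLES = 200  # assumed trip to main memory, purely illustrative

# Assumed: a small private L2 is quicker to reach but misses more often.
private_l2 = avg_access_cycles(17, 0.30, MEMORY_CYCLES)

# Assumed: a larger shared L2 adds interface latency but misses less often.
shared_l2 = avg_access_cycles(25, 0.20, MEMORY_CYCLES)

# Assumed: an app whose working set gains almost nothing from extra capacity.
shared_l2_poor_fit = avg_access_cycles(25, 0.29, MEMORY_CYCLES)

print(f"private L2:            {private_l2:.1f} cycles")
print(f"shared L2:             {shared_l2:.1f} cycles")
print(f"shared L2 (poor fit):  {shared_l2_poor_fit:.1f} cycles")
```

Under these assumed numbers the shared cache comes out ahead in the first comparison and behind in the second, which mirrors Rupley's caveat about apps where the latency increase defeats the capacity boost.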
One down side to the approach is that all four cores must run at the same dynamic data rate. That means the unit may burn excess power if one task needs a high frequency and other simultaneous jobs do not. The cores also share one bus interface to a memory controller.
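The cost of a shared clock can be put in rough numbers with a toy model. The sketch below is not AMD data: it assumes power scales linearly with frequency at a fixed voltage, ignores sleep states and the option of finishing early and idling, and uses invented per-task frequency demands and a hypothetical `watts_per_ghz` constant.

```python
def cluster_power(demands_ghz, watts_per_ghz, shared_clock=True):
    """Toy estimate of the power drawn by a group of cores.

    demands_ghz   -- frequency each core's task actually needs (assumed)
    watts_per_ghz -- assumed linear power cost per core per GHz
    shared_clock  -- if True, every core runs at the highest demanded rate
    """
    if shared_clock:
        clock = max(demands_ghz)
        return len(demands_ghz) * clock * watts_per_ghz
    return sum(f * watts_per_ghz for f in demands_ghz)


# Hypothetical workload: one demanding task and three lighter ones.
demands = [1.6, 0.8, 0.8, 0.4]

shared = cluster_power(demands, watts_per_ghz=0.5, shared_clock=True)
independent = cluster_power(demands, watts_per_ghz=0.5, shared_clock=False)

print(f"shared clock:       {shared:.2f} W")
print(f"independent clocks: {independent:.2f} W")
print(f"excess:             {shared - independent:.2f} W")
```

The gap between the two figures is the excess power in this model. It shrinks as the tasks' demands converge and disappears when all four cores need the same frequency.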
On a positive note, AMD enhanced the design so that individual cores can more rapidly enter and exit deep sleep states. In addition, the L2 data cache is clocked only when an outstanding transaction needs access to the data.


Sanjib.Acharya
8/28/2012 1:46 PM EDT
"One down side to the approach is that all four cores must run at the same dynamic data rate."...
Does that mean the cores cannot perform independent tasks? Or does it mean that all the cores run with a common clock?
How does the power consumption of AMD's Jaguar compare to competitors' products such as Intel's Atom?
rick.merritt
8/28/2012 3:03 PM EDT
The cores can run independent tasks but they must all run at the same clock rate.
Since there is no working silicon yet, AMD has no hard numbers on performance, power, etc.
Robotics Developer
8/28/2012 5:34 PM EDT
The one thing I expected to see was the power numbers for the chip, especially as it is touted as a mobile core design. I like the idea of the shared L2 memory and wondered if the design allows for one to three cores to deep sleep if they are not needed (thus saving power)? While the cores all must run off the same clock, I wonder if they put in any clock divide logic to allow for half or quarter speed operation off the same clock. Time and spec sheets will tell.
jhf678
8/29/2012 2:19 AM EDT
With this AMD Jaguar, they sure can compete with Intel and Nvidia in the mobile market.
rick.merritt
8/30/2012 1:45 PM EDT
I'll be very interested to see Temash, that sub-5W AMD chip with Jaguar, next year, but I suspect there will be Atoms and Tegras that leapfrog it quickly.
mateau
9/17/2012 2:26 AM EDT
Hmm... Atom and AMD are x86 and as such do not compete directly with Tegra. The performance ceiling is owned by x86; power is the issue, and it is becoming less of one every 6 months or so. When an x86 tablet that outperforms ARM RISC at a similar energy footprint becomes available, that will change the market, as Windows legacy products will become available on it. The world is designed and runs on x86. Apps run on ARM RISC. Big difference.
I think that the importance of Jaguar, though, will not be in mobile devices but in the SeaMicro server products.
I find it amusingly ironic that AMD is now an OEM who can profit from both ARM and Intel. SeaMicro was a hedge investment!
hi ning jun
8/29/2012 9:56 PM EDT
"One down side to the approach is that all four cores must run at the same dynamic data rate. That means the unit may burn excess power if one task needs a high frequency and other simultaneous jobs do not."
I have no idea why the other cores need to work at the same speed. When multitasking, a quad core can run in symmetric or asymmetric mode, so let us wait for the detailed spec.
rick.merritt
8/30/2012 1:46 PM EDT
I suspect cores need to run at the same data rate to access the shared L2 cache.
They can go to sleep states independently, though.
eewiz
8/29/2012 11:11 PM EDT
I don't have much hope that the tablet segment will be captured by x86. ARM chips are getting more and more powerful, and they have all the processing power they need.