SANTA CLARA, Calif.--Making parallel computing easy to program and freeing software engineers to let their imaginations run wild is AMD's new holy grail, according to Joe Macri, corporate vice president and CTO of the company's Client Division.
Speaking at DesignCon in Santa Clara Monday (Jan. 30), Macri said AMD engineers were working towards making the processing capability of the company's accelerated processing units (APUs) as accessible to programmers as the CPU is today using heterogeneous systems architecture (HSA).
HSA, said Macri, would combine scalar processing on the CPU with parallel processing on the GPU, while offering high-bandwidth access to memory at lower power. At the same time, Macri explained, the hardware needed to be easier to program, easier to optimize and easier to load balance--a wishlist that could prove challenging.
Though it may seem a tall order, Macri said hardware developers did not "reinvent the wheel every time," noting that there were 40-odd years of experience to build from in creating a cohesive system that works, with the ultimate goal of an architecture that can scale up and down.
Rethinking the approach to hardware, said Macri, would allow software developers to act more freely, using the hardware as a canvas. "Software engineers are the Michelangelos of today," he said, adding that AMD's goal with the HSA architecture was to let software developers focus on their vision.
"If their vision gets chipped away by the hardware, their vision goes away," he said.
Of course, the compute "vision" is always undergoing change, and most recently even the way we interact with computers is shifting, with things like gesture technology coming to the fore.
"You need fixed function lower power and to immerse people in the experience," said Macri, noting that doing so would take incredible amounts of parallelism.
The APU, AMD’s fusion of the CPU and GPU on a single chip, is just the beginning, said Macri, adding that HSA was the APU’s future, and one he hoped would fast become an industry standard.
"Standards bring a whole ecosystem together, brings competitors together, allows them to compete on an even playing field," he said, explaining that AMD was really pushing for open and de-facto standards the whole industry could use.
"Open standards always win over time," Macri said, explaining that it simply makes sense because software developers want their applications to run on multiple platforms from multiple hardware vendors.
The way Macri sees it, the "architected era" includes full C++ support and use of the GPU as a co-processor. It also involves a unified, coherent address space, task-parallel runtimes, nested data-parallel programs, user-mode dispatch, pre-emption and context switching.
"Every single device we build today is constrained to a certain amount of power so dynamic power balance is crucial," he added.
Likewise, allowing the GPU to use addressable memory is important going forward, Macri said. Coherency, though it won't make anything go faster, will allow software developers to stay true to their vision, he added.
In earlier generations of computers there was a concept of bit-sliced processors and hardware time slicing, by which a single-CPU computer worked like a multi-core processor, and software developers could take advantage of this feature to write parallel applications with the required synchronization at certain hardware buffers.
It looks like something similar is appearing in a new avatar in these latest multi-core CPUs.
ARM is a very innovative company that understands the business model of the next industrial era. Focusing on building the foundation and depending on others to plug and play will keep it lean, with the capacity to adjust to market needs.
I don't understand the new direct-hardware-access model. It seems like a shared hardware resource is still going to need layering somewhere to assure ownership by one process at a time. Is this task somehow being pushed out to the hardware so that it looks transparent to the caller?
Remember that floating point math started out as a coprocessor to the x86 architecture before being integrated; in fact, I imagine that it still has an "escape sequence" in the binary to invoke the coprocessor function. If AMD is using a similar path for the future, it seems like a logical extension of an x86 feature that has been around since dinosaurs walked the earth.
How is this different from the GPGPU concept? Nvidia has been at it for quite some time with CUDA and has had success in a very limited set of applications--oil and gas exploration, etc.
I don't see what the innovation here is.
There is some historical inertia in our whole approach to programming models.
In the 1970s memory was faster than the CPU: RAM access was on the order of 100 ns, while instructions on a 1 MHz clock took microseconds to execute. So programming languages took care to optimize arithmetic operations but could get away with *ignoring* memory almost completely, since memory accesses appeared instantaneous from the processor's point of view. Hence C does not distinguish between fast and slow memory--all pointers are equivalent. If there is a delay in accessing memory, the language makes no provision for reducing that latency; it does not even acknowledge the possibility. Programming languages today inherit this bias towards ignoring memory I/O as a legacy of the popular languages of the 1970s.
Since then CPU speeds have gone up by orders of magnitude while memory speeds have improved only slightly. So hardware designers have used memory caches to manage memory invisibly to the programmer, letting software continue to run in a bubble where the conditions of the 1970s--fast, effectively instantaneous memory access--are imperfectly replicated.
Since the bottleneck in both CPUs and GPUs is now memory I/O, a new type of language is needed--one that, at the least, lets the programmer distinguish explicitly between the layers of the memory hierarchy, rather than handling them in today's kludgy, implicit way. Something like http://sequoia.stanford.edu/