"Why can't my robot sing and dance at the same time?" asked Professor David May as he proposed a processor that implements a simplified ARM instruction set.
Professor May's project to develop a very simple parallel processor is
based on ARM. "We were foolish enough to try and adopt a version of the
ARM and I extracted about 30 instructions from the Thumb instruction set
and wrote a compiler for them," he said.
Professor May has
considered an implementation of this re-RISCed version of an ARM
processor that can be laid out on a bench to demonstrate the internal
workings to first-year computer science students. In email
correspondence with EE Times Professor May said he would probably
continue to use this stripped-down ARM so that students would gain
insight into a "real machine."
However, Professor May is
also working on an even simpler processor architecture and compiler
pairing that would support concurrency, based on work that pre-dates the
transputer. This would allow students to build a multiprocessor from a
kit of parts. "I also have a hunch that this processor may be very
efficient and when I've finalized it I'm interested in getting it (or a
lot of them) built on a chip," said Professor May in the email.
In his lecture Professor May also talked of extending the work into a
Raspberry Pi-style schools' project. "If I can get parallel computing
into the schools that will be a great achievement, because then we
wouldn't get all these kids thinking the world is sequential."
May then related a question asked by a colleague's 10-year-old child.
"Why can't my robot sing and dance at the same time?" was asked by
someone yet to be indoctrinated into thinking that programming is
inherently sequential, Professor May observed.
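As a rough illustration of the child's question, and not anything Professor May presented, the sketch below expresses "sing" and "dance" as two independent tasks rather than one sequential script. The function name perform and the step counts are hypothetical. Python threads are used only because they are widely available; whether the two tasks run truly in parallel depends on the runtime and the hardware.

```python
import threading


def perform(action, steps, log):
    """Record each step of one activity; a stand-in for a real robot routine."""
    for step in range(steps):
        log.append(f"{action} {step}")


# Each activity keeps its own log, so the two tasks share no state.
song, dance = [], []
tasks = [
    threading.Thread(target=perform, args=("sing", 3, song)),
    threading.Thread(target=perform, args=("dance", 3, dance)),
]
for task in tasks:
    task.start()   # both activities are now in flight
for task in tasks:
    task.join()    # wait until both have finished
```

The program says nothing about which activity goes first, which is the point: the ordering is left to the machine instead of being imposed by the programmer.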
The footnote to
Professor May's talk was a reminder that 2014 will be the 30th
anniversary of the launch of the transputer and its dedicated
programming language Occam.
Professor May said it would be an
interesting exercise to produce a 2014 version of the transputer using
contemporary process technology. Professor May estimates that a square
centimeter of silicon today could hold up to 4,000 transputers. "You
might decide to trade off a bit of that for memory," he conceded.
Professor May also speculated that, thanks to the simplicity of the
transputer's architecture, it would be extremely fast. Meeting to discuss
such a project would at least be a good excuse for a party, he said.
@DutchUncle: Personally, I design CPUs just for fun, but I do see a lot of new processors out there, with different ISAs and different architectures.
Let me be honest: I don't think that multi-core/SMP is the way to go. These two techniques exploit parallelism, and have a lot of drawbacks (synchronization, lack of predictability).
There's another technique that does just about the same: superscalar execution.
Hardware is always inherently parallel - we have combinational circuits feeding synchronous elements, all at the same time. What we have failed to do so far is figure out how to exploit this massively parallel infrastructure to our benefit without turning the general-purpose computer into a special-purpose one - meaning we can indeed exploit this parallelism, but only for certain tasks.
Perhaps we are doing it all wrong. Perhaps we don't need "general purpose" registers. Perhaps we don't need stacks. We still approach programming and (general-purpose) CPU architecture as it was done in the early days (in the late fifties).
Restricting the set of instructions has its benefits: it simplifies the design and reduces size and power consumption, at some cost in performance - this might pay off for simpler execution streams.
I do think we are specializing CPUs too much: I don't think it makes much sense to have a CPU doing "complex" tasks like vector arithmetic, SIMD, and others. There might be a solution for this: not multi-core, but multiple specialized execution units that can be somewhat independent of each other. This would require a new programming model though.
I think Professor May does want to build the processor on the lab bench at the register level.
He makes the point that no schoolchildren today have ever seen any form of mechanical calculator; neither slide-rule nor hand-cranked calculator nor even the three-position dialing machine for calculating the remainder to score in a darts game.
As a result, first-year students have no idea what sits between a high-level language and the 1s and 0s toggling on a digital chip's pins... to them it is simply "magic."
Modern processor architectures are so complex and full of exceptions and special cases that it would be highly wasteful of time to teach processor architecture in that context.
Why bother creating a new processor, unless the point is to build it out of 74xx NAND chips just to show that you can? Restricting oneself to a subset of the available instruction set, or being forbidden to use a particularly helpful built-in operation, was one of the tricks used by my professor in CS205 class back in the 1970s.
But in the interests of energy efficiency it is generally better to complete a task in a reasonable time and then put the system to sleep.
So not pointless with regard to energy consumption... or you could be doing other things.
If anyone wants a simple computer, think about this: structured programs evaluate a relational expression (condition) and either continue with the next sequential statement or jump/branch to a target.
The next statement is either an assignment or another relational expression.
The location of the expression, the first 2 operands and the operator are all that are required.
Dual-port embedded RAM can deliver the two operands simultaneously while a second RAM delivers the operator and next address.
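The machine described above can be sketched as a tiny interpreter. This is only one reading of the comment, under stated assumptions: the four-field control word (operator, two operand addresses, and a destination or branch target), the operator names, the choice to continue on a true condition and jump on a false one, and the function name run are all hypothetical.

```python
import operator

# Hypothetical encoding: each control word holds an operator, two operand
# addresses, and either a destination (assignment) or a branch target (test).
ASSIGN_OPS = {"add": operator.add, "sub": operator.sub}
TEST_OPS = {"lt": operator.lt, "eq": operator.eq}


def run(control, data, max_steps=10_000):
    """Execute control words until the program counter leaves the program."""
    pc = 0
    for _ in range(max_steps):
        if pc >= len(control):
            return data
        op, a, b, dest_or_target = control[pc]
        # Both operands are read in the same step, as a dual-port RAM would allow.
        x, y = data[a], data[b]
        if op in ASSIGN_OPS:
            data[dest_or_target] = ASSIGN_OPS[op](x, y)
            pc += 1
        elif TEST_OPS[op](x, y):
            pc += 1                # condition true: continue with the next word
        else:
            pc = dest_or_target    # condition false: jump to the target
    raise RuntimeError("step limit exceeded")


# Data memory layout (assumed): [i, total, one, limit, zero]
program = [
    ("lt", 0, 3, 4),   # while i < limit, otherwise jump past the end
    ("add", 1, 0, 1),  # total = total + i
    ("add", 0, 2, 0),  # i = i + 1
    ("eq", 4, 2, 0),   # zero == one is never true, so always jump to word 0
]
memory = run(program, [1, 0, 1, 6, 0])
print(memory[1])  # sum of 1 through 5
```

Here the control list plays the role of the second RAM, supplying the operator and the next address, while the data list stands in for the dual-port operand memory. An unconditional jump is expressed as a test that can never succeed, which keeps the word format to a single shape.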
I would not disagree with that.
But keeping multiprocessors coherent probably only works for a relatively small number of processors or applications where maintaining coherence is not overly burdensome.
More scalable multiprocessing may require the adoption of incoherence, like the Internet.