"Why can't my robot sing and dance at the same time?" Professor David May recounted this question from a colleague's 10-year-old child as he proposed a processor that implements a simplified ARM instruction set.
Professor May's project to develop a very simple parallel processor is
based on ARM. "We were foolish enough to try and adopt a version of the
ARM and I extracted about 30 instructions from the Thumb instruction set
and wrote a compiler for them," he said.
Professor May has
considered an implementation of this re-RISCed version of an ARM
processor that can be laid out on a bench to demonstrate the internal
workings to first-year computer science students. In email
correspondence with EE Times Professor May said he would probably
continue to use this stripped-down ARM so that students would gain
an understanding of a "real machine."
However, Professor May is
also working on an even simpler processor architecture and compiler
pairing that would support concurrency, based on work that pre-dates the
transputer. This would allow students to build a multiprocessor from a
kit of parts. "I also have a hunch that this processor may be very
efficient and when I've finalized it I'm interested in getting it (or a
lot of them) built on a chip," said Professor May in the email.
In his lecture, Professor May also talked of extending the work into a
Raspberry Pi-style schools' project. "If I can get parallel computing
into the schools that will be a great achievement, because then we
wouldn't get all these kids thinking the world is sequential."
May then related a question asked by a colleague's 10-year-old child.
"Why can't my robot sing and dance at the same time?" was asked by
someone yet to be indoctrinated into thinking that programming is
inherently sequential, Professor May observed.
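The child's question is really about concurrency: composing two activities side by side instead of one after the other, which Occam expresses with its PAR construct. As a rough sketch (not Professor May's code, and using Go goroutines as a stand-in for PAR), the hypothetical `sing` and `dance` routines below run in parallel and the program waits for both to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// sing and dance are hypothetical robot activities; each reports its
// steps on a shared channel so the caller can observe them.
func sing(steps chan<- string) {
	for _, note := range []string{"do", "re", "mi"} {
		steps <- "sing " + note
	}
}

func dance(steps chan<- string) {
	for _, move := range []string{"left", "right", "spin"} {
		steps <- "dance " + move
	}
}

// par runs every process concurrently and returns only once all of them
// have finished, loosely mirroring Occam's PAR.
func par(procs ...func()) {
	var wg sync.WaitGroup
	for _, p := range procs {
		wg.Add(1)
		go func(p func()) {
			defer wg.Done()
			p()
		}(p)
	}
	wg.Wait()
}

// performance runs sing and dance together and collects every step.
// The channel is buffered to hold all six steps so neither activity blocks;
// the interleaving of singing and dancing may differ from run to run.
func performance() []string {
	steps := make(chan string, 6)
	par(func() { sing(steps) }, func() { dance(steps) })
	close(steps)
	var log []string
	for s := range steps {
		log = append(log, s)
	}
	return log
}

func main() {
	for _, s := range performance() {
		fmt.Println(s)
	}
}
```

Within each activity the steps stay in order, but nothing forces singing to finish before dancing starts, which is exactly the point the child was making.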
The footnote to
Professor May's talk was a reminder that 2014 will be the 30th
anniversary of the launch of the transputer and its dedicated
programming language Occam.
Professor May said it would be an
interesting exercise to produce a 2014 version of the transputer using
contemporary process technology. Professor May estimates that a square
centimeter of silicon today could hold up to 4,000 transputers. "You
might decide to trade off a bit of that for memory," he conceded.
Professor May also speculated that the simplicity of the transputer's
architecture would make it extremely fast. Meeting to discuss
such a project would at least be a good excuse for a party, he said.
Multi-core in smartphones means coherent-memory multiprocessors running general-purpose operating systems. The transputer was nothing like that. Cool as the transputer was, you can't claim it as the predecessor of what is being implemented now.
Coherent multiprocessors are very old technology. What is amazing is that you can now manufacture them cheaply enough to carry one around in your pocket to read email, browse the web, and play Angry Birds. It is primarily a manufacturing and economic achievement.
Actually, the real value of RISC is that by breaking instructions up into single-cycle steps, you enable the CPU to make maximum use of the data bus. You could also argue that you are eliminating the microcode engine. The cost is that a program needs more instructions. The benefit is that you can tune the performance of that program more closely on simpler hardware.
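To make that trade-off concrete, here is a toy sketch on a hypothetical machine (not any real ISA): one memory-to-memory "CISC-style" add is replaced by a load, load, add, store sequence of simple steps. Both produce the same result; the RISC-style version needs four instructions instead of one. The sketch only counts instructions and does not model cycles or the data bus.

```go
package main

import "fmt"

// Machine is a hypothetical toy CPU with a small register file and memory.
type Machine struct {
	Reg   [4]int
	Mem   [8]int
	Count int // instructions executed
}

// AddMem is a CISC-style instruction: Mem[dst] = Mem[a] + Mem[b],
// with three memory accesses hidden inside a single instruction.
func (m *Machine) AddMem(dst, a, b int) {
	m.Mem[dst] = m.Mem[a] + m.Mem[b]
	m.Count++
}

// The RISC-style primitives each do one simple thing.
func (m *Machine) Load(r, addr int) {
	m.Reg[r] = m.Mem[addr]
	m.Count++
}

func (m *Machine) Store(r, addr int) {
	m.Mem[addr] = m.Reg[r]
	m.Count++
}

func (m *Machine) Add(rd, rs, rt int) {
	m.Reg[rd] = m.Reg[rs] + m.Reg[rt]
	m.Count++
}

// cisc computes x+y with one complex instruction.
func cisc(x, y int) (result, instrs int) {
	m := &Machine{}
	m.Mem[0], m.Mem[1] = x, y
	m.AddMem(2, 0, 1)
	return m.Mem[2], m.Count
}

// risc computes x+y with a sequence of simple instructions.
func risc(x, y int) (result, instrs int) {
	m := &Machine{}
	m.Mem[0], m.Mem[1] = x, y
	m.Load(0, 0)
	m.Load(1, 1)
	m.Add(2, 0, 1)
	m.Store(2, 2)
	return m.Mem[2], m.Count
}

func main() {
	r1, n1 := cisc(20, 22)
	r2, n2 := risc(20, 22)
	fmt.Printf("CISC: result %d in %d instruction(s)\n", r1, n1)
	fmt.Printf("RISC: result %d in %d instruction(s)\n", r2, n2)
}
```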
"you won't be running Word on them any time soon"
I beg to differ. A divide-and-conquer approach on a many-core CPU could work very well indeed, by farming out both parallel and "systolic" tasks:
- keyboard interpreter
- mouse controls
- buffer editor
- mass storage manager
- window formatting
- font renderer
I see no reason why this couldn't be viable.
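One way to read that list is as a chain of communicating processes, each stage on its own core, passing work along channels in the spirit of transputer links. The sketch below is a hypothetical three-stage pipeline (keyboard interpreter, buffer editor, font renderer). The stage names come from the comment; the edit format, the backspace convention, and the bracket "rendering" are assumptions made for illustration.

```go
package main

import "fmt"

// edit is a hypothetical command produced by the keyboard interpreter.
type edit struct {
	insert rune // character to insert; ignored when del is true
	del    bool // true for backspace
}

// keyboardInterpreter turns raw keys into edits ('\b' is treated as backspace).
func keyboardInterpreter(keys string, out chan<- edit) {
	defer close(out)
	for _, k := range keys {
		if k == '\b' {
			out <- edit{del: true}
		} else {
			out <- edit{insert: k}
		}
	}
}

// bufferEditor applies each edit and forwards a snapshot of the buffer.
func bufferEditor(in <-chan edit, out chan<- string) {
	defer close(out)
	var buf []rune
	for e := range in {
		if e.del {
			if len(buf) > 0 {
				buf = buf[:len(buf)-1]
			}
		} else {
			buf = append(buf, e.insert)
		}
		out <- string(buf)
	}
}

// fontRenderer stands in for real glyph rendering: it just frames the text.
func fontRenderer(in <-chan string, out chan<- string) {
	defer close(out)
	for s := range in {
		out <- "[" + s + "]"
	}
}

// runEditor wires the stages together; each runs in its own goroutine and
// talks to its neighbours only through channels.
func runEditor(keys string) []string {
	edits := make(chan edit)
	texts := make(chan string)
	frames := make(chan string)
	go keyboardInterpreter(keys, edits)
	go bufferEditor(edits, texts)
	go fontRenderer(texts, frames)
	var screen []string
	for f := range frames {
		screen = append(screen, f)
	}
	return screen
}

func main() {
	screen := runEditor("cat\bp")
	fmt.Println(screen[len(screen)-1]) // [cap]
}
```

Because each stage only waits on its own input channel, the stages can overlap in time, which is where a many-core chip would earn its keep.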
There are uses for simple processors, or large sets of them, but not for general computing, i.e. you won't be running Word on them any time soon. A large number of real-world applications need lots of data, and that means that making use of 1000 cores is pretty tricky if they all need to fetch data at the same time from 'random' locations.
Some applications can be rewritten to use cellular automata, streaming, systolic arrays or similar, in which the data moves through the processors in an ordered way, so only the processors on the edge need to access memory and the rest just shift it through, transforming it as they go.
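As an illustration of the systolic idea (our own sketch, not something from the comment), a linear array of cells can compute a convolution y[s] = sum over k of w[k]*x[s-k]. Each cell holds one fixed weight; input samples are read only at the left edge and move right through two registers per cell, while partial sums move right through one register per cell, so only the last cell emits results and no interior cell touches memory. The two-delay/one-delay timing is one standard systolic arrangement; the function names are hypothetical.

```go
package main

import "fmt"

// systolicFIR simulates a linear systolic array computing
// y[s] = sum_k w[k]*x[s-k] for s = 0..len(x)+len(w)-2.
// Only cell 0 reads input and only the last cell produces output.
func systolicFIR(w, x []int) []int {
	cells := len(w)
	if cells == 0 || len(x) == 0 {
		return nil
	}
	xa := make([]int, cells) // first x delay register in each cell
	xb := make([]int, cells) // second x delay register, read by the next cell
	yr := make([]int, cells) // partial-sum register, read by the next cell
	nOut := len(x) + cells - 1
	var out []int
	// Feed the samples, then zeros, until every partial sum has left the array.
	for t := 0; t < nOut+cells-1; t++ {
		in := 0
		if t < len(x) {
			in = x[t]
		}
		newXa := make([]int, cells)
		newXb := make([]int, cells)
		newY := make([]int, cells)
		for k := 0; k < cells; k++ {
			xin, yin := in, 0
			if k > 0 {
				xin, yin = xb[k-1], yr[k-1] // data comes only from the left neighbour
			}
			newY[k] = yin + w[k]*xin
			newXa[k] = xin
			newXb[k] = xa[k]
		}
		// All registers update together, as on a clock edge.
		xa, xb, yr = newXa, newXb, newY
		if t >= cells-1 {
			out = append(out, yr[cells-1]) // y[t-(cells-1)] leaves the array
		}
	}
	return out
}

// directFIR is the ordinary sequential convolution, for comparison.
func directFIR(w, x []int) []int {
	out := make([]int, len(x)+len(w)-1)
	for i := range x {
		for k := range w {
			out[i+k] += w[k] * x[i]
		}
	}
	return out
}

func main() {
	w := []int{1, 2, 3}
	x := []int{1, 1, 1, 1}
	fmt.Println(systolicFIR(w, x)) // [1 3 6 6 5 3]
	fmt.Println(directFIR(w, x))   // [1 3 6 6 5 3]
}
```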
Since the early Inmos days these applications have become more widespread, e.g. video and audio compression/decompression can make use of this, but then there are specialised hardware implementations that do it better than a CPU anyway.
There are supercomputing applications that might make use of lots of processors, but again they tend to need quite a lot of RAM per node (that is one reason the big machines use x86s and similar rather than 1000 times as many tiny CPUs).
The problem remains not how to put lots of little brains on a chip, but how to make use of them in real applications.
(The original T400 transputer had 15 core instructions, and a mechanism to use the 16th to add extra ones, which gradually happened over time. The instructions were all 1 byte long, with 4 bits for the instruction and 4 bits for data; one instruction was used to extend the data into further nibbles. It was extremely RISC and had around 50k transistors I think, of which more than half were in the on-chip RAM. I have the manual somewhere ...)
I am sure they will matter because instruction set complexity is related to compiler and computational efficiency, which is ultimately related to energy efficiency.
However, as you indicate, the challenges of multicore operation and the benefits of doing it well may be greater than those from swapping out one ISA and replacing it with another.
But better to do both in an optimum manner.
With little or no memory on the chip this just makes the "memory wall" more of a problem.
To execute an instruction the opcode and 2 operands are required and something has to be done with the result.
RISC trades CPU complexity for many, many more instructions.
Better to start with structured program statements and implement using memory blocks.
Horizontal microcode with local storage for variables is fast and reduces memory bus utilization.
ISAs have been massaged over and over for years, and all this does is take a subset of a popular ISA.
I do not get the point.
Hi, you are right, it's an ARM3.
But the cloudx.cc project also supports AVR, MSP430 and soon a Thumb2 clone.
I'm not sure if instruction sets (or their complexity) really matter these days or in the future. There are other (multicore) challenges …