
New CISC Architecture Takes on RISC

8/5/2015 09:20 AM EDT
16 comments
Re: Can you highlight other benefits?
jeffreyrdiamond   8/20/2015 10:47:16 AM
Thanks so much for the extra details, both about the LLVM intermediate code and the other classes of supported instructions.  Your processor really sounds interesting.  :)

- Jeff

Re: Can you highlight other benefits?
Stefan.Blixt   8/20/2015 10:34:12 AM
RISC was made for compilers, but mainly just by reducing and simplifying the instruction types. Targeting the LLVM compiler's intermediate representation (IR) is very different from the original RISC approach. The RISC idea was to restrict the instructions to the same size, a similar format (opcode, register references) and a similar execution sequence, which simplifies pipelining. This resulted in an instruction rate that could (intermittently) be as high as the cycle time of registers + ALU permitted, but it required higher memory bandwidth. The LLVM IR is not at all reduced or simple; instead it calls for many operations, some complex, on different sizes of data, often requiring several memory accesses and ALU cycles. Targeting that with efficiently coded instructions results in higher code density and energy efficiency.
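
As a rough, hypothetical illustration (ordinary C, not actual compiler output): even one short statement implies several IR-level operations on different data sizes, which is exactly what a denser encoding can exploit.

/* Rough sketch only: one C statement, several IR-level operations of mixed sizes. */
#include <stdint.h>

typedef struct {
    uint8_t  flags;   /* 1-byte field */
    uint16_t count;   /* 2-byte field */
    uint32_t total;   /* 4-byte field */
} record_t;

/* The single statement below implies, at the IR level, roughly:
 * an address computation into the array, an 8-bit and a 16-bit load,
 * zero-extensions to 32 bits, a multiply, an add, and a 32-bit
 * read-modify-write of total. A strictly reduced ISA spends about one
 * instruction per step; a denser encoding can cover several steps at once. */
void accumulate(record_t *r, int i)
{
    r[i].total += (uint32_t)r[i].flags * r[i].count;
}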

But you are right in assuming that other instructions are also important. Those were inherited from earlier versions of the processor used in scanner and printer controllers and control panels, and they often reduce the execution time and energy consumption of important functions by >90% compared to equivalent assembly code sequences. Sometimes they replace application-specific hardware rather than code, and many are optional (the control store is partly writable). There can be a selection of crypto algorithms, CRC, compress/decompress, DSP functions (FFT, FIR), and graphics primitives. There is also the Java VM bytecode instruction set, and optional autonomous microcoded processes: Ethernet MAC, IEEE1588 timestamp processing, touch screen, as well as processing and timed input/output of audio and video. The patented microcode control of memory increases performance in less expensive ways than caches do, and the new dual-core solution using microcode sharing also saves both silicon and energy.
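
As a hedged sketch of what one such instruction replaces (illustrative C, not our actual firmware): a plain bytewise CRC-32 costs a load plus eight shift/XOR steps per input byte in the visible ISA, which is the kind of inner loop a single microcoded instruction can absorb.

/* Illustrative only: bytewise CRC-32 (reflected, polynomial 0xEDB88320) in plain C. */
#include <stddef.h>
#include <stdint.h>

uint32_t crc32_sw(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)   /* eight ALU steps per input byte */
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
    }
    return ~crc;
}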

Can you highlight other benefits?
jeffreyrdiamond   8/18/2015 8:22:54 AM
Your article highlights the use of CISC.  IMO the "utilization" effect would be there in RISC or CISC, because the low level instructions would be there in both cases.  High level CISC instructions and variable length encoding reduce your code size and your instruction RAM energy.  You state the goal was to target the LLVM compiler output, which is essentially the original RISC approach.  And of course, microcode gives massive ISA flexibility.

However, it seems that most of the advancements you hint at come from other sources, e.g., new low level instructions that "support the needs of modern processors", modern interconnects across the cores, etc.  I would assume this is where most of the performance benefits actually come from.  Could you give more details on these aspects of your processor?

And congratulations on trying a very different approach.  That's one of the many bonuses of the end of Moore's Law. :)

Re: low power with sdram?
Stefan.Blixt   8/11/2015 10:40:09 AM
No, the chip executes from internal RAM (160 KByte). It also has ROM (lower consumption), and an SDRAM interface, but those were not used for these measurements.

Regarding your doubts about a GUI with a TFT display: the Sony UCP-8060 (google it) is a universal Java-based control panel based on an old Imsys processor in 350 nm (i.e. five CMOS generations older). It refreshes the 640x480 screen (65K colors) from its main memory while also handling the 100 Mb/s Ethernet MAC and everything else, including the touch screen and the drawing of graphics and photo images. The SDRAM interface is 16 bits wide and DMA may take up to 50% of the bandwidth, i.e. 167 MByte/s, which is more than sufficient for display + Ethernet. The present generation (180 nm) can handle higher resolutions, and the 65 nm chip even higher.
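
A back-of-envelope check of those numbers (assuming a 60 Hz refresh, which is not stated above):

/* Rough bandwidth check; the 60 Hz refresh rate is an assumption. */
#include <stdio.h>

int main(void)
{
    const double refresh_hz  = 60.0;                      /* assumption */
    const double frame_bytes = 640.0 * 480.0 * 2.0;       /* 65K colors = 2 bytes/pixel */
    const double display_bw  = frame_bytes * refresh_hz;  /* ~36.9 MByte/s */
    const double ethernet_bw = 100e6 / 8.0;               /* 100 Mb/s = 12.5 MByte/s */
    const double dma_budget  = 167e6;                     /* 50% of the 16-bit SDRAM bandwidth */

    printf("display %.1f + ethernet %.1f = %.1f MByte/s of a %.0f MByte/s DMA budget\n",
           display_bw / 1e6, ethernet_bw / 1e6,
           (display_bw + ethernet_bw) / 1e6, dma_budget / 1e6);
    return 0;
}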

low power with sdram?
raimond   8/9/2015 11:41:13 AM
What I don't understand is how the power consumption can be lower than an MSP430 or Kinetis L micro as long as an SDRAM is used for program execution?

I also have some doubts about the claim about GUI applications with TFT displays. The system uses an SDRAM with an 8-bit bus at 167 MHz for program AND data (including the graphic refresh buffer?).

I have a Cortex-M4 system at 120 MHz with a 32-bit SDRAM at 60 MHz which barely delivers OK performance on an 800x480 TFT with 256 colors. And that is with the program running from flash, with the micro having separate buses for the SDRAM/flash/SRAM and many other internal peripherals...
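
For scale, the peak numbers of my setup look like this (assuming a 60 Hz refresh; sustained figures are of course much lower once program and data accesses share the bus, which is exactly my point):

/* Peak numbers only, assuming a 60 Hz refresh; sustained bandwidth is much lower. */
#include <stdio.h>

int main(void)
{
    const double refresh_bw = 800.0 * 480.0 * 1.0 * 60.0; /* 256 colors = 1 byte/pixel, ~23 MByte/s */
    const double bus_peak   = 60e6 * 4.0;                 /* 32-bit SDRAM at 60 MHz, 240 MByte/s peak */

    printf("refresh %.1f MByte/s of %.0f MByte/s peak\n",
           refresh_bw / 1e6, bus_peak / 1e6);
    return 0;
}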

Re: ISA
Tobias Strauch, EDAptix   8/7/2015 11:58:17 AM
"Utilization Wall" - interesting aspect ... can you recommend a good read about this ?


Good luck with your core ...

Re: ISA
Stefan.Blixt   8/7/2015 9:55:01 AM
The instructions are 1-10 bytes long, but those that are often used have only as many bytes as needed, often only 1 or 2. x86 instructions are up to 15 bytes long, and for those machines your comment is relevant. You are assuming our design is something like the big x86_64 used for comparing code size in the third figure. It is not.

Our machine is much smaller (see below), it doesn't have the kind of cache you expect, and its interface to off-chip (secondary) memory is narrow. The core is its own memory controller for its on-chip local memory – which plays the role of main memory in an MCU or L2 cache in an MPU – and its own "disk controller" for secondary memory (flash, DRAM) or, in a many-core chip, for a common "L3 cache".

If the longer instructions were replaced by sequences of primitive ones doing the same thing, those sequences would be longer. They would also cause more activity, and thereby more energy consumption – not only because of the additional bytes to fetch, but also because the microcode behind a longer instruction has more degrees of freedom in utilizing the machine cycles: it controls more resources than those visible in the ISA and can therefore take shortcuts and pipeline different sequences, including memory accesses, in ways a string of primitive instructions cannot.

Why a small active core should be interesting: in the newest CMOS generations there is a serious limitation modifying Moore's law. If the possible increases in density and frequency are both utilized, then only a few percent of the transistors can be active, and this percentage drops by 50% per generation. This has been called the "Utilization Wall". Its consequence is that the activity must jump between different logic blocks, most of which are sleeping, or that almost all of the area is used for memory, or some combination of these alternatives.

A many-core chip can reach high throughput and energy efficiency by the second approach, i.e. as much memory as possible. The memory each core needs for typical threads takes on the order of 3 million transistors, and the active, maximally switching part of the cell consisting of memory + core must then be only a few percent of its transistors, e.g. on the order of 100k. That is very much smaller than a processor core of today. The frequency can, however, be much higher, especially for a narrow datapath, which also utilizes the silicon area better.
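
As a rough sketch of that budget (using only the transistor counts above; the 50%-per-generation figure is applied to the allowed active fraction):

/* Rough sketch: active fraction of a memory + core cell, using the numbers above. */
#include <stdio.h>

int main(void)
{
    const double mem_transistors  = 3.0e6;   /* per-core local memory */
    const double core_transistors = 100.0e3; /* active core logic */
    double fraction = 100.0 * core_transistors / (mem_transistors + core_transistors);

    printf("active fraction of the cell: %.1f%%\n", fraction);  /* ~3.2%% */

    /* If the allowed active percentage halves each CMOS generation: */
    for (int gen = 1; gen <= 3; gen++) {
        fraction /= 2.0;
        printf("budget after %d more generation(s): %.1f%%\n", gen, fraction);
    }
    return 0;
}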

Re: Technical Information
Stefan.Blixt   8/7/2015 8:45:39 AM
This dual-core design with the new ISA is not yet available as a component, but it will be used in further generations of Imsys' Velox module, which is designed for a longer life span than IC products. (The Snap Classic module has kept backward compatibility for 12 years, surviving changes of processor IC, OS, and file system.)

Re: Interesting
Stefan.Blixt   8/7/2015 8:44:45 AM
Thank you for that comment – IoT is what we are most interested in at Imsys, where we are trying to optimize our firmware (including microcode), merged with hardware, into a licensable IP product used not only in our own modules but also by other companies in the IoT/CPS device business.
