
Soft Machines: Promising, Not Proven

Latest simulations look impressive

Re: Architecture advantage
KarlS01   2/16/2016 12:04:56 PM
High performance is only important when you need it.  Even emergency vehicles only need to go fast in an emergency.  Likewise, optimizing only the critical paths is what increases performance.  Yes, the critical path should be identified early on.

The ALU and condition code register were the CPU culprits most of the time.

If we could only smooth out the peaks.

Re: Architecture advantage
betajet   2/16/2016 11:35:49 AM
Many DSP applications are embarrassingly parallel and thus well-suited to DSP chips and FPGAs.  IMO the key feature of a DSP chip is that it has lots of local memory bandwidth to multi-port memories.  It's very easy to make a fast ALU.  The challenge is keeping it busy so that you can get sustained performance. [*]  If you can fit a segment of your calculations into high-bandwidth local memory and keep those ALUs busy, you get terrific performance.

FPGAs take it one step further by letting you create many ALUs that match the structure of the problem so that you can immediately pass the result of one calculation on to the next ALU without having to go through the memory bottleneck.  Such structures are reasonably easy to synthesize automatically, which is why DSP has been a successful target for high-level synthesis.
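
To make that concrete, here is a toy C version of the kind of kernel I mean (tap count and names made up for the sketch): a FIR filter whose coefficients sit in a small local array, so every output sample is a short inner product the multiply-accumulate hardware can stream through without touching main memory.

#include <stddef.h>

#define NTAPS 16   /* filter length, chosen arbitrarily for the sketch */

/* Each y[i] is an independent inner product over a sliding window of x,
   so the outputs can all be computed in parallel; taps[] stands in for
   coefficients held in fast local/multi-port RAM. */
void fir(const double taps[NTAPS], const double *x, double *y, size_t n)
{
    for (size_t i = 0; i + NTAPS <= n; i++) {
        double acc = 0.0;
        for (int k = 0; k < NTAPS; k++)
            acc += taps[k] * x[i + k];   /* multiply-accumulate */
        y[i] = acc;
    }
}

On an FPGA, high-level synthesis can unroll the inner loop into NTAPS multipliers feeding an adder tree, which is exactly the "ALUs matching the structure of the problem" point.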

[*] Parallel processor makers like to talk about "peak performance", which a wag once described as "a guarantee from the manufacturer that you can't go faster than this."

3x3 matrix inversion
traneus   2/16/2016 11:31:05 AM
Each element of the inverse of a matrix depends upon all the elements of the original matrix, but does not directly depend upon the other elements of the inverse. For 2x2 and 3x3 and 4x4 matrices, sums-of-products (inner products) algorithms are known (see en.wikipedia.org/wiki/Invertible_matrix). These algorithms are parallelizable, as betajet points out. I have not seen these algorithms extended to larger matrices.
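
For the 3x3 case, a minimal C sketch of the cofactor/adjugate formula (my own naming; assumes the matrix is nonsingular): every entry of the inverse is an independent sum of products over the original matrix, so all nine can be computed in parallel.

/* Returns 0 if m is singular, 1 on success. */
int invert3x3(const double m[3][3], double inv[3][3])
{
    double c[3][3];
    /* Cofactors: c[i][j] = (-1)^(i+j) * minor(i,j).
       Each one is a 2-term sum of products over m only. */
    c[0][0] =  m[1][1]*m[2][2] - m[1][2]*m[2][1];
    c[0][1] = -(m[1][0]*m[2][2] - m[1][2]*m[2][0]);
    c[0][2] =  m[1][0]*m[2][1] - m[1][1]*m[2][0];
    c[1][0] = -(m[0][1]*m[2][2] - m[0][2]*m[2][1]);
    c[1][1] =  m[0][0]*m[2][2] - m[0][2]*m[2][0];
    c[1][2] = -(m[0][0]*m[2][1] - m[0][1]*m[2][0]);
    c[2][0] =  m[0][1]*m[1][2] - m[0][2]*m[1][1];
    c[2][1] = -(m[0][0]*m[1][2] - m[0][2]*m[1][0]);
    c[2][2] =  m[0][0]*m[1][1] - m[0][1]*m[1][0];

    /* Determinant by expansion along row 0. */
    double det = m[0][0]*c[0][0] + m[0][1]*c[0][1] + m[0][2]*c[0][2];
    if (det == 0.0) return 0;

    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            inv[i][j] = c[j][i] / det;   /* adjugate = transposed cofactors */
    return 1;
}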

Re: Architecture advantage
KarlS01   2/16/2016 10:59:24 AM
@betajet: DSP and matrix inversion are not in my skillset.  Altera has DSP Builder for FPGAs and DSP designers do not seem to be complaining (do they use DSP Builder?).  The general problem comes from those who try to do it with a CPU.  And you do not seem to have a problem with DSP, either.

Matrix inversion got a lot of attention in IBM Poughkeepsie and led to such things as the carry save adder.

I did come across the code for the Doolittle decomposition.  It is nested for loops that use pointers to access some memory addresses over and over (depending on which matrix cell is being calculated).  Pointers are used to address memory, of course.
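
From memory, the shape of it in C is roughly this (a sketch, not the exact code I saw; in place, row-major, and no pivoting, so it assumes nonzero pivots):

/* Doolittle LU: overwrites the n x n matrix a in place, leaving U in
   the upper triangle and L's multipliers in the strict lower triangle
   (L's diagonal is implicitly 1). */
void doolittle_lu(double *a, int n)
{
    for (int k = 0; k < n; k++) {
        for (int i = k + 1; i < n; i++) {
            a[i*n + k] /= a[k*n + k];            /* multiplier l_ik */
            for (int j = k + 1; j < n; j++)      /* update row i */
                a[i*n + j] -= a[i*n + k] * a[k*n + j];
        }
    }
}

Note how pivot row k is re-read for every row i below it; that is the repeated access pattern I mentioned, and why keeping the whole matrix in fast local RAM should help.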


My thought is that the matrix itself is not too big to stream into FPGA embedded RAM, where the access time is orders of magnitude faster, do the inversion/decomposition, and stream the result back.

Am I missing something?  FPGA accelerators outperform CPUs that have much faster clocks, and I think having the data in local memory is the reason.

But the RISC superscalar crowd will claim that the register files will eliminate all but a few of those memory accesses.

Re: Architecture advantage
Technocrazy   2/16/2016 8:43:07 AM
They are targeting general-purpose/traditional server markets. The market itself will decline over time. Yaaawwwnn

Not exciting
Technocrazy   2/16/2016 8:41:25 AM
Not an exciting tech to be in. One can always write code to make things faster... fooling the investors is easy!!

Re: Architecture advantage
betajet   2/16/2016 12:30:19 AM
Some applications are inherently parallel, others aren't.  A matrix multiply is a bunch of inner products, each of which can be calculated independently.  An FFT has lots of parallelism.  A long inner product has a lot of parallelism, since it can be broken into multiple segments.  The segments can be calculated in parallel and the results combined.
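
As a sketch, here is a long inner product in C, using OpenMP's reduction as one stand-in for the fork/join (any mechanism for combining partial sums would do):

#include <stddef.h>

/* Each thread accumulates a private partial sum over its segment of
   the index range; the reduction combines the partials at the end. */
double inner_product(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

Nice property: compiled without OpenMP the pragma is ignored and you get the plain sequential loop, which is the same computation.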

On the other hand, I don't know of a parallel algorithm for matrix inversion.  Someone here will probably suggest one if it exists.

Except for "embarrassingly parallel" applications, concurrent programming is difficult and there are lots of hazards that don't exist in sequential programming.  Writing correct parallel code is hard, and debugging can be difficult.  Even the two-way parallelism of a main thread plus interrupt service routines is tricky.  With more general parallelism it gets really nasty.  A friend of mine likes to say: "Here be dragons".
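
Even that two-way case has teeth. A toy illustration (hypothetical names): on a 32-bit CPU, a 64-bit tick counter updated by a timer ISR is read with two loads, so the main thread can see a torn value; the classic guard is to re-read until two snapshots agree.

#include <stdint.h>

volatile uint64_t ticks;          /* incremented by the timer ISR below */

void timer_isr(void) { ticks++; }

/* The ISR can fire between the two 32-bit halves of a 64-bit load.
   Reading twice and retrying until both snapshots match avoids the
   torn read without disabling interrupts. */
uint64_t read_ticks(void)
{
    uint64_t a, b;
    do {
        a = ticks;
        b = ticks;
    } while (a != b);
    return a;
}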

Re: Architecture advantage
traneus   2/15/2016 7:54:05 PM
KarlS01 wrote: "Generally algorithms are summations such that a value must be computed BEFORE it can be used in the next step."

Summations are usually expressed in sequential form, but that does not mean that the summations must be performed sequentially. See VHDL, Verilog, APL, and other dataflow languages. The terms of the summation can all be calculated in parallel (since the terms do not depend on each other), and most of the additions can also be done by adding pairs of terms in parallel and then adding pairs of the pairs, et cetera.
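
In C rather than VHDL, the pairwise scheme looks roughly like this (a sketch; in place, so it clobbers its input array). Each pass adds independent pairs, so an n-term sum needs only about log2(n) dependent addition steps instead of n-1:

/* Pairwise ("tree") summation of t[0..n-1]. */
double tree_sum(double *t, int n)
{
    if (n <= 0) return 0.0;
    for (int stride = 1; stride < n; stride *= 2)
        for (int i = 0; i + stride < n; i += 2 * stride)
            t[i] += t[i + stride];   /* all pairs in a pass are independent */
    return t[0];
}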

Has anyone here done matrix operations in parallel VHDL?

Re: Architecture advantage
KarlS01   2/15/2016 9:40:03 AM
GSMD wrote: "The only way out to increase performance is to teach programmers to write parallel code. We keep trying to defer this."

There are no qualified teachers, because problem solving and even computation are sequential processes.  Generally, algorithms are summations such that a value must be computed BEFORE it can be used in the next step.

Furthermore, there is the von Neumann bottleneck that leads to the memory wall.  A CPU cannot process data it does not have.

GPUs and accelerators stream data, as opposed to loading/storing one word at a time, and in the case of accelerators there are no instruction fetches.  There is a message here:

1) Stream the data to and from memory.

2) Use local storage when processing to reduce accesses and improve access time to data.
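
In toy C, points 1 and 2 look something like this (BLOCK and the local buffer are stand-ins for whatever the real local storage is):

#include <stddef.h>
#include <string.h>

#define BLOCK 256   /* arbitrary block size for the sketch */

/* Stream a block into fast local storage, do all the work there,
   then stream the result back, instead of one load/store per word. */
void process_stream(const double *src, double *dst, size_t n)
{
    double local[BLOCK];                  /* stand-in for on-chip RAM */
    for (size_t base = 0; base < n; base += BLOCK) {
        size_t len = (n - base < BLOCK) ? n - base : BLOCK;
        memcpy(local, src + base, len * sizeof local[0]);   /* stream in */
        for (size_t i = 0; i < len; i++)
            local[i] *= 2.0;              /* placeholder computation */
        memcpy(dst + base, local, len * sizeof local[0]);   /* stream out */
    }
}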

The cache concept evolved from data buffering for a single user, so that cache lines served as buffers for multiple users, based on the probability that once an area of memory was accessed, nearby addresses would also be accessed (not guaranteed).

Today, much processing is done on blocks of data and that is the reason for streaming and defining blocks of data to be processed.

Allocate a block of data and put its starting address in a pointer.  Then manipulate the pointer to the next address and access memory one word at a time.

 

Re: Architecture advantage
sw guy   2/15/2016 8:33:01 AM
BTW, VLIW was successful in a specific area: some DSPs (where there is a fit with predictable scheduling inside a core loop).

Not really general purpose, I agree.
