The slide states Xeon binary compatibility, but the article states Xeon Phi binary compatibility. Since AVX-512 is not compatible with LNI, I suspect the slide is correct and the quote (or the quotee) is not. -BitHead77
Sorry it took so long to get an answer to this one, but when you see how complicated it is you'll understand why. Here are all the details:
The current Intel® Xeon Phi™ coprocessor (Knights Corner) is not binary software compatible with other processors. The unique combination of new features put into Knights Corner at the time, and the software stack for Knights Corner, prevents complete compatibility between Knights Corner and other processors. However, current code developed on the current Intel Xeon Phi can generally be ported to Knights Landing and Intel Xeon processors with a recompile. For this reason, we do think that for best results, customers should get started today to modernize their code with the current Intel Xeon Phi coprocessor so as to prepare for the coming advances on Intel® Xeon® and Intel Xeon Phi (Knights Landing).
Knights Landing, however, is software binary-compatible with Intel® Xeon® Processors—specifically the Intel Haswell Instruction Set with the exception of TSX (Intel® Transactional Synchronization Extensions). The same binaries will run on both Knights Landing and Intel Xeon processors. This enables customers to readily leverage their legacy code, simplify their code base, and use the same parallel optimization techniques (cores, threads, vectors) to benefit both Intel Xeon processor and Intel Xeon Phi processors. This will deliver the most performance for the least developer investment.
It is quite true that processor manufacturers are moving toward massively parallel processing and multi-core processors, but applications that support massive parallelism need to arrive at the same time. Intel's initiative, "an educational program designed to give every new programmer on the planet the opportunity to learn how to code for parallel processors", is a really good idea and a good step.
There are also at least two separate levels of programming that may or may not need to adapt. The operating systems control the resources provided by the hardware and parcel them out to applications. The current abstractions provided by those operating systems lean heavily on processes and threads within those processes. Applications typically use these abstractions and rely on the OS to map them onto the hardware appropriately. The techniques that you list, @Tony, could be very useful for giving the OS more latitude in these assignments, but they still need a synchronization mechanism (like the Ada rendezvous concept). I sympathize with @Colin's comment that more programmer training is needed, but a good model for applications to follow would make that training more effective. The OS and tools guys need to figure that model out.
Well, parallel processing is so easy, there's so many ways to do it :)
Actually, that's the problem: from what I've seen, there is no one approach that is best suited for all problems. And, I think it's been pretty well proven that most software developers have a hard time writing bug-free, high-performance code using "traditional" techniques such as threads/locking/semaphores.
So it's not surprising there's a movement towards functional programming and "shared-nothing" programming (message passing, actors/Erlang-model, etc). However, depending on how much data has to be passed around, that might not be the best approach.
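To make the contrast concrete, here is a minimal Python sketch of the same job done both ways. It is only an illustration: the function names (`sum_with_lock`, `sum_with_messages`) and the toy workload are made up for this comment and are not from the article or from Intel.

```python
import queue
import threading


def sum_with_lock(chunks):
    """Shared-state style: every worker updates one total under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        # Leaving out this lock is the classic source of race-condition bugs.
        with lock:
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def sum_with_messages(chunks):
    """Shared-nothing style: workers only send results over a queue."""
    results = queue.Queue()

    def worker(chunk):
        # The message is the only thing the worker shares with anyone.
        results.put(sum(chunk))

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker.
    return sum(results.get() for _ in chunks)


data = list(range(1000))
chunks = [data[i::4] for i in range(4)]  # four interleaved slices
print(sum_with_lock(chunks), sum_with_messages(chunks))
```

Here each message is a single integer, so passing it around is cheap. If the workers had to ship large intermediate results to each other, the copying cost is where the message-passing version could start to lose.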
Was there any discussion of software support for these beasts? Operating systems have supported multiprocessors for a while now, but it doesn't seem like they have really made the best use of them. Much of their support seems to be about how to throttle down to the minimum number of active cores needed for the application load. I can see this for server farms that want to provide maximum capability while minimizing power usage, but it seems like with this number of cores we may need some fundamental architectural changes in operating systems and/or application software. Is that true, or is it just more of the same?