Ultimately, surely the general-purpose model has to be similar to that used by multiple computing nodes hanging off the Internet, but writ small on a single die.
For now, the likes of Intel and ARM want to keep everything coherent and synchronized, but my instinct tells me that can't work as you go to scores of cores, except in certain very tightly controlled applications.
Plurality developed its HyperCore many-core design, and not only for wireless infrastructure.
Although it has not yet succeeded in producing a marketable product, it proposed a holistic solution covering both programming model and hardware, one that programmers liked and silicon could handle.
I believe message passing is not useful for massively parallel machines. Here is another one who tries.
Don't forget Martin Marietta with their Geometric Arithmetic Parallel Processor (GAPP) in 1988. A large circuit board held 32,000 processors, as I recall. It supported a Single Instruction, Multiple Data (SIMD) approach. Great for working on images, after the initial overhead of clocking in the data from an edge.