I think the issue is that at 4 and 8 cores the scheduling can almost be done manually. It is a tractable problem.
As you go from multicore to many-core, scheduling requires algorithmic solutions that can cope with general circumstances, and that can be tested and shown to do so, which becomes non-trivial.
I think we are at that threshold.
As I said, I think the industry is in the process of moving up in abstraction. The engineering frontier will be these scheduling algorithms. Far fewer engineers will worry about processor architectures, and fewer still about transistor structures.
In terms of hardware design it would not be much more work.
And there are companies producing such many-core processors, such as Kalray, Adapteva and Intel (in research mode). But these tend to address particular classes of problem for which they can be optimized.
In the more general case, what would you use all the cores for? Would they be homogeneous or heterogeneous, and how would you organize the memory?
There are some problems that can be easily parallelized and use all the cores efficiently. But many cannot.
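One common way to put numbers on this is Amdahl's law, which the comment does not name, so treat it as an illustration rather than the poster's own argument. The sketch below assumes a workload in which a fixed fraction of the work can be spread perfectly across cores and the rest stays serial; the function name and the example fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time regardless of core count;
    # only the parallel part is divided among the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Compare a 90%-parallel and a 99%-parallel workload (assumed figures).
for cores in (4, 8, 64, 128):
    print(cores,
          round(amdahl_speedup(0.90, cores), 2),
          round(amdahl_speedup(0.99, cores), 2))
```

Under these assumptions the 90%-parallel workload can never run more than 10 times faster however many cores are added, which is one reason why jumping straight to 64 or 128 cores pays off only for some problems.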
Building a scheduling system that can remain aware of all the resources (cores) available, wake them up and retire them, know what runs best where, keep control of the memory, cope with interrupts and so on becomes more difficult as the core count increases.
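To make those duties concrete, here is a deliberately tiny sketch of such a scheduler. It is not how any real kernel does it; the `Core` and `ToyScheduler` names, the "big"/"little" labels and the placement rule are all assumptions made for illustration, and memory control and interrupts are left out entirely.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    core_id: int
    kind: str                   # hypothetical labels: "big" or "little"
    awake: bool = False
    task: Optional[str] = None  # name of the running task, if any


class ToyScheduler:
    def __init__(self, cores: List[Core]) -> None:
        self.cores = list(cores)

    def place(self, task: str, preferred_kind: str) -> Optional[Core]:
        """Put a task on a free core, waking the core if it is asleep."""
        free = [c for c in self.cores if c.task is None]
        if not free:
            return None  # nothing free: the caller would have to queue the task
        # Prefer the requested kind, then cores that are already awake
        # (to avoid a wake-up), then the lowest id for determinism.
        free.sort(key=lambda c: (c.kind != preferred_kind, not c.awake, c.core_id))
        core = free[0]
        core.awake = True
        core.task = task
        return core

    def finish(self, core_id: int) -> None:
        """Mark the task on the given core as done; the core stays awake."""
        core = next(c for c in self.cores if c.core_id == core_id)
        core.task = None

    def retire_idle(self) -> int:
        """Put every awake but idle core to sleep; return how many were retired."""
        retired = 0
        for c in self.cores:
            if c.awake and c.task is None:
                c.awake = False
                retired += 1
        return retired


sched = ToyScheduler([Core(0, "big"), Core(1, "big"),
                      Core(2, "little"), Core(3, "little")])
first = sched.place("video-decode", "big")
second = sched.place("sensor-poll", "little")
print(first.core_id, second.core_id)
```

With four cores every decision here could be made by hand. The point of the comment is that the same rules have to hold up, and be shown to hold up, when the list has 64 or 128 entries of mixed kinds.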
The whole software industry needs to be led gently away from the uniprocessor programming model and towards something that can use the resources that hardware will be able to provide.
Why do companies increase the number of cores by 2x? Why 8? Would it be easier to skip some steps and design a 64-core or 128-core processor? From my layman's perspective it would not be much more work to do it... Comments?
I think engineers won't reinvent the wheel, so the major algorithmic approaches to core wake-up, going to sleep, task migration, cache handling, etc. will be developed once (by ARM/Linaro) and used off the shelf. The companies, however, may need to tune that baseline to their use of core resources if they are not on standard configs such as 4 x 4, or if they have reasons to insert special-case algorithms.
They will then need to test the overall effect, as this will be key to saving power and getting the best performance for key applications out of the multicore SoC. That may in turn involve tweaking the algorithms, adding hardware accelerators, etc.
I think the industry just moved up in abstraction.