Personal option: the issues of having cache dose not only result in storage overhead, but also introduce unpredictability e.g. in terms of performance. This is particularly true for the systems with real-time requirements. As programmers do not have to take care of data movement, maybe, another option to make progress on compiler development to manage data movement on scratchpad memory.
There are many engineering obstacles when scaling to advanced processor nodes that must be surmounted, and every one counts. This one had had many designers worrying direct-write shared memroy caches would have to be replaced with scratchpad or message passing schemes. Luckily, SRC has shed some light on this issue, hopefully keeping designers from fixing a architectureal feature that is not going to break all the way out to 512 cores per processor chip.