This is an issue I (and, no doubt, many others) have encountered countless times. Many of the designs I have worked on are programmable systems-on-chip, consisting of hardware acceleration peripherals whose care and feeding are managed by a supervisory processor which lives in a different clock domain than the data being processed (e.g. video / audio media data).
Plenty of situations arise in which the supervisor needs to trigger events within the datapath - effecting a parameter change, or requesting a sequence of actions be performed, etc. In addition, events may also propagate in the opposite direction, permitting the custom logic to interrupt the processor upon completion of the requested task.
These are just examples, but in general I have been exceptionally frustrated with FPGA toolchains' facilities for properly constraining the related paths. In the past, I have resorted to using false or multi-cycle paths, with both explainable and ludicrous side-effects. Under the "explainable" category are things such as ultra-relaxed and unbalanced paths in places where this is not acceptable. For example, in crossing a set of vector values as part of a register write, I may toggle a signal in the host's clock domain, which is then double- or triple-synchronized (as described in the article) to produce a metastability-safe edge in the core clock domain. Once that edge is detected in the core logic, the core can capture the multiple-bit values from the register, confident that they have properly settled.
However, if a false path setting has allowed the router to create paths of wildly varying lengths for different bits of the vector, in conjunction with a fast core clock, data values may actually be corrupted due to some bits still failing to meet setup or hold across the asynchronous boundary.
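To make that failure mode concrete, here is a minimal Python sketch of the check involved. It assumes a simplified model that is not taken from any vendor tool: the toggle and the data bits launch on the same host clock edge, and the core captures the vector a fixed number of core cycles after the first synchronizer flop samples the toggle. The function names and all of the delay numbers are hypothetical.

```python
def earliest_capture_ns(toggle_delay_ns, core_period_ns, latency_cycles):
    """Earliest time, relative to the host launch edge, at which the core
    can capture the vector.

    Assumes the worst case for the data: a core clock edge samples the
    toggle the instant it arrives, and the capture register loads
    `latency_cycles` core cycles after that.
    """
    return toggle_delay_ns + latency_cycles * core_period_ns


def late_bits(data_delays_ns, setup_ns, capture_ns):
    """Return the indices of bits whose routed delay misses the capture edge."""
    deadline_ns = capture_ns - setup_ns
    return [bit for bit, delay in enumerate(data_delays_ns) if delay > deadline_ns]


# Hypothetical numbers: 500 MHz core clock (2.0 ns period), capture two
# core cycles after the first synchronizer flop samples the toggle.
capture = earliest_capture_ns(toggle_delay_ns=0.5, core_period_ns=2.0, latency_cycles=2)

# With a false path in place, nothing stops one bit from being routed at 6.3 ns.
routed_delays = [1.2, 0.9, 6.3, 2.0]
print(late_bits(routed_delays, setup_ns=0.1, capture_ns=capture))  # → [2]
```

In this made-up case bit 2 arrives after the core has already captured the vector, which is exactly the kind of corruption a blanket false path leaves unchecked.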
Attempting to use multi-cycle paths has, in the case of some vendor toolchains, resulted in behavior falling under the "ludicrous" category - setting a multi-cycle path across a clock domain actually caused the tools to *try* to make the paths N clocks long! This put me right back in the same situation as the false path, with my synchronized "register write detected" logic triggering faster than the bits could arrive and be guaranteed to be settled. Worse yet, the placer would often report that excessive congestion was occurring, because routing resources were being used to attempt to satisfy "unusually-long" hold constraints... rendering my chip unroutable!
Maximum delays can be workable, but they end up being very tedious, and sometimes require different delay values depending upon the clock rates involved or their relative ratio.
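As a rough illustration of why the value has to change with the clocks, the Python sketch below derives a max-delay budget from the destination clock rate. The formula (synchronizer latency times the core period, minus setup and a safety margin, with the toggle's own delay conservatively taken as zero) is one simple assumed model rather than any vendor's recommendation, and the function name and parameters are hypothetical.

```python
def max_delay_budget_ns(core_freq_mhz, latency_cycles, setup_ns, margin_ns=0.0):
    """Upper bound on the data-path delay for a synchronized vector crossing.

    Assumed model: the data must be stable `latency_cycles` core clock
    periods after launch, less the capture flop's setup time and a margin.
    """
    core_period_ns = 1000.0 / core_freq_mhz
    budget_ns = latency_cycles * core_period_ns - setup_ns - margin_ns
    if budget_ns <= 0:
        raise ValueError("no positive delay budget at this clock rate")
    return budget_ns


# The same crossing needs a different constraint value at each core clock rate.
for freq_mhz in (100.0, 250.0, 500.0):
    budget = max_delay_budget_ns(freq_mhz, latency_cycles=2, setup_ns=0.1, margin_ns=0.2)
    print(f"{freq_mhz:.0f} MHz core clock -> max delay {budget:.2f} ns")
```

Every crossing whose clocks or synchronizer depth differ needs its own number, which is where the tedium comes from.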
In short, it's good to see a more expressive constraint language developing to cover this scenario. Every time I have encountered these problems, I have had to check my calendar to make sure of what year it is... neither I nor the vendor support engineers who understood what I was talking about could ever explain how on Earth an appropriate solution for such a common, fundamental issue didn't exist yet! :)