To our surprise, the “bright-line”
separation between Windows code and USB code substantially reduced test
cost, by reducing the need for extensive re-validation. Although the
overall complexity of the code was increased, the separation forced us
to design the USB code as a platform for the Windows code – meaning that
changes made in the USB layer and the Windows layer were effectively
walled off from each other. There was little information sharing between
the two portions; the Windows layer handles all of the Windows driver
APIs, formats messages for the USB layer, and then decodes the results.

Because of the design rules of our portable code, we were forced to have the
portable USB code execute in a separate kernel thread from the incoming
Windows calls. Again, this turned out to be helpful, because it further
isolated the client USB driver from the behavior of the portable code.
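
The division of labor described above can be illustrated with a minimal sketch. It is not MCCI's code: the message layout, the opcode, and every name (`usb_msg`, `msg_queue`, `wrapper_submit`, `portable_drain`, `wrapper_decode`) are hypothetical, and the separate kernel thread and its locking are left out so that the example stays self-contained. The only point is that the OS-specific side marshals arguments into a message and decodes the reply, while the portable side sees nothing but messages.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message format; every name here is illustrative. */
enum { MSG_OP_SUM = 1 };                  /* the only opcode this sketch knows */
enum { MSG_OK = 0, MSG_ERR_BAD_OP = -1 };

#define MSG_MAX_PAYLOAD 64u
#define QUEUE_DEPTH     8u                /* power of two, so index wrap is safe */

typedef struct {
    uint32_t opcode;                      /* request: what the portable layer should do */
    uint32_t length;                      /* request: payload bytes in use */
    uint8_t  payload[MSG_MAX_PAYLOAD];
    int32_t  status;                      /* reply: MSG_OK or an error code */
    uint32_t result;                      /* reply: operation-specific value */
} usb_msg;

typedef struct {
    usb_msg *slots[QUEUE_DEPTH];
    unsigned head;                        /* next slot to consume */
    unsigned tail;                        /* next slot to fill */
} msg_queue;

/* OS-wrapper side: marshal the arguments into a message and queue it.
 * Returns 0 on success, -1 if the payload is too large or the queue is full. */
int wrapper_submit(msg_queue *q, usb_msg *m, uint32_t opcode,
                   const void *data, size_t len)
{
    if (len > MSG_MAX_PAYLOAD || q->tail - q->head == QUEUE_DEPTH)
        return -1;
    m->opcode = opcode;
    m->length = (uint32_t)len;
    if (len > 0)
        memcpy(m->payload, data, len);
    m->status = MSG_OK;
    m->result = 0;
    q->slots[q->tail++ % QUEUE_DEPTH] = m;
    return 0;
}

/* Portable side: handle every queued message and fill in its reply fields.
 * In a real stack this loop would run on its own thread, with the queue
 * protected by a lock. Returns the number of messages handled. */
unsigned portable_drain(msg_queue *q)
{
    unsigned handled = 0;
    while (q->head != q->tail) {
        usb_msg *m = q->slots[q->head++ % QUEUE_DEPTH];
        if (m->opcode == MSG_OP_SUM) {
            uint32_t sum = 0;
            for (uint32_t i = 0; i < m->length; i++)
                sum += m->payload[i];
            m->result = sum;
            m->status = MSG_OK;
        } else {
            m->status = MSG_ERR_BAD_OP;
        }
        handled++;
    }
    return handled;
}

/* OS-wrapper side: decode the reply. Returns the status and, on success,
 * stores the result in *out. */
int wrapper_decode(const usb_msg *m, uint32_t *out)
{
    if (m->status == MSG_OK)
        *out = m->result;
    return (int)m->status;
}
```

A caller would submit a message with `wrapper_submit`, let the portable side run `portable_drain`, and then read the outcome with `wrapper_decode`. Because the two sides share only the message structure, either side can change internally without the other needing re-validation.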
The Windows API wrappers are only responsible for passing information
back and forth, yet they retain enough control to perform I/O
completion handling in exactly the context expected by the client
driver. This was particularly critical for supporting drivers that
depended on a particular sequence of events in the Microsoft stack.
We could reconstruct those events in the Windows-specific code, and be
absolutely sure that we were not disturbing the USB-related behavior of
the portable code.

One interesting lesson learned was in the design of
the Windows APIs. Often, the design practice when implementing software
modules is “be liberal in what you accept, and conservative in what you
send.” However, we found that the best practice was to code API
implementations conservatively, then relax restrictions as we went
along. For example, we initially error-checked the size of request
blocks. Later, we discovered that some drivers don’t correctly
set the size of some request blocks, and the legacy Windows stack didn’t
care. At that point, we were able to remove the error check without
worrying about breaking other drivers. If we had added error checks as
we found they were needed, we would have had to thoroughly re-test all
the devices tested to that point to make sure that our error check
didn’t cause problems elsewhere. Whenever we found an area in which we
were too conservative, we tested each of the target operating systems to
make sure that they behaved identically in that area.
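
The "start strict, then relax" practice can be sketched as follows. This is an illustration under stated assumptions, not the actual driver code: `request_block`, `validation_policy`, `EXPECTED_LENGTH`, and `MAX_FUNCTION` are hypothetical names and values. The property worth noticing is that turning a check off can only accept more requests, so every driver that passed under the strict policy still passes under the relaxed one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request header; field names and limits are illustrative. */
typedef struct {
    uint16_t length;     /* size the client driver claims for the block */
    uint16_t function;   /* requested operation code */
} request_block;

#define EXPECTED_LENGTH 24u   /* assumed size of a well-formed block */
#define MAX_FUNCTION    10u   /* assumed highest valid function code */

/* Each check starts enabled. A flag is cleared only after a real driver is
 * found to violate the check and the legacy stack is shown to tolerate it. */
typedef struct {
    bool enforce_length;
} validation_policy;

typedef enum { REQ_OK = 0, REQ_BAD_LENGTH, REQ_BAD_FUNCTION } req_status;

/* Validate a request under the given policy. Disabling a check never turns
 * a previously accepted request into a rejected one. */
req_status validate_request(const request_block *r, const validation_policy *p)
{
    if (r->function > MAX_FUNCTION)
        return REQ_BAD_FUNCTION;
    if (p->enforce_length && r->length != EXPECTED_LENGTH)
        return REQ_BAD_LENGTH;
    return REQ_OK;
}
```

With `enforce_length` set, a block whose length field is zero is rejected with `REQ_BAD_LENGTH`. Once the flag is cleared, the same block is accepted, while a well-formed block is accepted under both policies. Going the other way, adding a check late, could reject requests from devices that had already been tested.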
Throughout the project, MCCI used our normal design rules. With the exception of
including or excluding debug code, MCCI avoids conditional compiles for
configuration. Instead we use link-time or runtime configuration. This
necessarily entails a little extra runtime overhead. Although the host
stack needs little configuration, some things still vary from customer
to customer - that information is provided in configuration tables which
are consulted at run time.
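
As a rough illustration of table-driven configuration, the sketch below replaces what might otherwise be `#ifdef` blocks with a lookup into a constant table. The table layout, the setting names (`max_root_ports`, `enable_streams`), and the customer tables are all invented for the example; the real configuration tables are not described here. In a link-time variant, each customer's table would live in its own object file and the build would select which one to link.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical configuration entry: a named unsigned setting. */
typedef struct {
    const char *name;
    unsigned    value;
} config_entry;

typedef struct {
    const config_entry *entries;
    size_t              count;
} config_table;

/* Illustrative per-customer tables; names and values are made up. */
static const config_entry customer_a_entries[] = {
    { "max_root_ports", 4 },
    { "enable_streams", 1 },
};
static const config_entry customer_b_entries[] = {
    { "max_root_ports", 2 },
};

const config_table customer_a_config = {
    customer_a_entries,
    sizeof customer_a_entries / sizeof customer_a_entries[0]
};
const config_table customer_b_config = {
    customer_b_entries,
    sizeof customer_b_entries / sizeof customer_b_entries[0]
};

/* Look a setting up at run time. Returns `fallback` when the table does not
 * mention the setting, so the common code needs no conditional compiles. */
unsigned config_get(const config_table *t, const char *name, unsigned fallback)
{
    for (size_t i = 0; i < t->count; i++) {
        if (strcmp(t->entries[i].name, name) == 0)
            return t->entries[i].value;
    }
    return fallback;
}
```

The same compiled code serves both customers: `config_get(&customer_a_config, "max_root_ports", 1)` yields 4, while the same call against `customer_b_config` yields 2. The linear search is the "little extra runtime overhead" that this style accepts in exchange for a single binary code path.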
It is hard to compare development
costs of MCCI’s stack to the costs for other USB 3.0 stacks. Each
project started with a different pre-existing base of code, and had
different business goals. Our impression is that the development portion
of our design approach was substantially more expensive than other
approaches; the test portion was comparable.

We were somewhat
nervous about the performance of this design, as measured by various
industry throughput benchmarks. The argument marshaling requires pages
of code. The portable USB code, because it’s portable, performs extra
work, such as redundant parameter checking. Architectural differences
between the portable code and the Windows stack require substantial
adaptation and shimming. Despite this, the performance of our stack is
the same as the other Windows USB 3.0 stacks - if anything, it’s a
little faster. Not being able to investigate the source code of the
other stacks, we can only speculate as to why this is. However, this
gives strong evidence that portable, platform-oriented code is not
inherently less efficient at runtime, even in the Windows environment.

We can boil our results down to a few guidelines:
- The primary cost of portability is at design time, not at run time.
- Separate OS bindings from abstract function.
- Mapping and abstraction don’t necessarily hurt performance, but they certainly complicate implementation.

Donald Knuth, author of the influential “The Art of Computer Programming,”
said it best when he wrote, “We should forget about small efficiencies,
say about 97% of the time: premature optimization is the root of all
evil.” Decisions about portability are always made at the
beginning of a project, and are often made based on considerations of
runtime efficiency or development complexity. We think our results show
that, from the perspective of runtime efficiency, coding non-portably is
premature optimization. Still, it can be justified in circumstances
where code reuse is unlikely or less important than initial development
cost and time to market.