Design Article
Lessons Learned: Developing a production-quality USB stack
Terrill Moore, CEO, MCCI Corporation
12/5/2012 2:38 PM EST
The results
To our surprise, the “bright-line” separation between Windows code and USB code substantially reduced test cost by reducing the need for extensive re-validation. Although the separation increased the overall complexity of the code, it forced us to design the USB code as a platform for the Windows code, so changes made in the USB layer and changes made in the Windows layer were effectively walled off from each other. Little information is shared between the two portions: the Windows layer handles all of the Windows driver APIs, formats messages for the USB layer, and then decodes the results.
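The division of labor described above can be sketched in a few lines. This is only an illustration, not MCCI's code: the names `UsbRequest`, `UsbResult`, `portable_usb_submit`, and `os_layer_write`, the dictionary fields, and the status strings are all hypothetical. The point is that the portable layer sees only a neutral message, and only the OS-side wrapper knows how to format that message and decode the result.

```python
from dataclasses import dataclass
from enum import Enum


class UsbStatus(Enum):
    OK = 0
    STALL = 1


@dataclass(frozen=True)
class UsbRequest:
    """Platform-neutral message (hypothetical); carries no OS-specific types."""
    endpoint: int
    data: bytes


@dataclass(frozen=True)
class UsbResult:
    status: UsbStatus
    bytes_done: int


def portable_usb_submit(req: UsbRequest) -> UsbResult:
    """Stand-in for the portable USB layer: it knows nothing about the OS."""
    if req.endpoint > 15:  # USB endpoint numbers range from 0 to 15
        return UsbResult(UsbStatus.STALL, 0)
    return UsbResult(UsbStatus.OK, len(req.data))


def os_layer_write(os_request: dict) -> dict:
    """Stand-in for the OS-specific layer: format, forward, decode."""
    # Format: translate the OS-shaped request into the neutral message.
    req = UsbRequest(endpoint=os_request["pipe"], data=bytes(os_request["buffer"]))
    # Forward: hand the message to the portable layer.
    result = portable_usb_submit(req)
    # Decode: translate the neutral result back into OS terms.
    return {
        "os_status": "SUCCESS" if result.status is UsbStatus.OK else "DEVICE_ERROR",
        "transferred": result.bytes_done,
    }
```

Because the only coupling is the message format, a change inside either function can be tested against that format alone, which is the re-validation saving the text describes.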
Because of the design rules of our portable code, we were forced to have the portable USB code execute in a separate kernel thread from the incoming Windows calls. Again, this turned out to be helpful, because it further isolated the client USB driver from the behavior of the portable code. Because the Windows API wrappers are responsible only for passing information back and forth, they have enough freedom to perform I/O completion handling in exactly the context expected by the client driver. This was particularly critical for supporting drivers that depended on a particular sequence of events in the Microsoft stack. We could reconstruct those events in the Windows-specific code and be sure that we were not disturbing the USB-related behavior of the stack.
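A rough user-space analogy of this threading arrangement follows. It is a sketch under stated assumptions, not the actual driver design: `PortableWorker` and `wrapper_transfer` are hypothetical names, Python threads stand in for kernel threads, and the wrapper simply blocks for the result, which a real driver would not do. What it does show is that the portable code runs on its own thread while the wrapper alone decides in which context the completion callback runs.

```python
import queue
import threading


class PortableWorker:
    """Stand-in for the portable USB code, running on its own thread."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._inbox.get()
            if item is None:  # shutdown sentinel
                break
            payload, reply = item
            reply.put(len(payload))  # pretend the transfer moved every byte

    def submit(self, payload):
        """Queue a request; the result arrives later on the returned queue."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((payload, reply))
        return reply

    def stop(self):
        self._inbox.put(None)
        self._thread.join()


def wrapper_transfer(worker, payload, on_complete):
    """OS-side wrapper: collects the result, then completes in the caller's context."""
    reply = worker.submit(payload)
    transferred = reply.get(timeout=5)
    # The callback runs here, on the wrapper's thread, never on the worker thread.
    on_complete(transferred, threading.current_thread().name)
```

Since the worker never calls back into the client directly, the wrapper is free to reorder or reshape completion events to match what a client driver expects without touching the portable code.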
One interesting lesson learned was in the design of the Windows APIs. Often, the design practice when implementing software modules is “be liberal in what you accept, and conservative in what you send.” However, we found that the best practice was to code API implementations conservatively at first, then relax restrictions as we went along. For example, we initially error-checked the size of request blocks. Later, we discovered that some drivers don’t correctly set the size of some request blocks, and that the legacy Windows stack didn’t care. At that point, we were able to remove the error check without worrying about breaking other drivers, because every driver that had passed the strict check would also pass without it. If we had instead added error checks as we found they were needed, we would have had to thoroughly re-test all the devices tested to that point to make sure that the new check didn’t cause problems elsewhere. Whenever we found an area in which we were too conservative, we tested each of the target operating systems to make sure that they behaved identically in that area.
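The strict-first policy can be illustrated with a small sketch. The request names, the sizes, and the `SIZE_CHECK_WAIVED` set are hypothetical and are not taken from the Windows USB interfaces; they only show why relaxing a check is safe where adding one is not.

```python
# Hypothetical request kinds and the structure sizes the strict check expects.
EXPECTED_SIZE = {"control_transfer": 24, "bulk_transfer": 16}

# Request kinds whose size check has been waived after finding that
# real drivers set the field incorrectly. It starts empty: strict by default.
SIZE_CHECK_WAIVED = set()


def validate_request(kind, declared_size):
    """Return True if the request block is acceptable."""
    if kind not in EXPECTED_SIZE:
        return False
    if kind in SIZE_CHECK_WAIVED:
        return True  # relaxed: any declared size is accepted
    return declared_size == EXPECTED_SIZE[kind]
```

Adding a kind to `SIZE_CHECK_WAIVED` only enlarges the set of accepted requests, so every request that passed before still passes. Going the other way, from permissive to strict, shrinks the accepted set, which is why it would force re-testing of everything already validated.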
Throughout the project, we followed MCCI’s normal design rules. With the exception of including or excluding debug code, we avoid conditional compiles for configuration; instead, we use link-time or runtime configuration. This necessarily entails a little extra runtime overhead. Although the host stack needs little configuration, some things still vary from customer to customer - that information is provided in configuration tables that are consulted at runtime.
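As a loose illustration of table-driven configuration, consider the sketch below. Python has no conditional compilation, so it can only show the runtime half of the idea, and every name and value in it (`StackConfig`, `max_devices`, `enable_lpm`, the customer keys) is hypothetical. A single body of code is shared by all customers, and the per-customer differences live in a table that is consulted at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackConfig:
    """Hypothetical per-customer settings, read at runtime by shared code."""
    max_devices: int = 32
    enable_lpm: bool = True  # made-up option name, for illustration only


# One table entry per customer; the code that reads it is identical for all.
CUSTOMER_CONFIGS = {
    "default": StackConfig(),
    "customer_a": StackConfig(max_devices=8, enable_lpm=False),
}


def can_attach_device(config: StackConfig, devices_in_use: int) -> bool:
    """Consult the table instead of a build-time constant."""
    return devices_in_use < config.max_devices
```

The lookup costs a little at runtime, as the text notes, but there is only one build of the code to test rather than one per configuration.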
It is hard to compare development costs of MCCI’s stack to the costs for other USB 3.0 stacks. Each project started with a different pre-existing base of code, and had different business goals. Our impression is that the development portion of our design approach was substantially more expensive than other approaches; the test portion was comparable.
We were somewhat nervous about the performance of this design, as measured by various industry throughput benchmarks. The argument marshaling requires pages of code. The portable USB code, because it’s portable, performs extra work, such as redundant parameter checking. Architectural differences between the portable code and the Windows stack require substantial adaptation and shimming. Despite this, the performance of our stack matches that of the other Windows USB 3.0 stacks - if anything, it’s a little faster. Not being able to examine the source code of the other stacks, we can only speculate as to why this is. However, this result is strong evidence that portable, platform-oriented code is not inherently less efficient at runtime, even in the Windows environment.
Conclusions
We can boil our results down to a few guidelines.
- The primary cost of portability is at design time, not at run time.
- Separate OS bindings from abstract function.
- Mapping and abstraction don’t necessarily hurt performance, but they certainly complicate implementation.
Decisions about portability are always made at the beginning of a project, and are often made based on considerations of runtime efficiency or development complexity. We think our results show that, from the perspective of runtime efficiency, coding non-portably is premature optimization. Still, it can be justified in circumstances where code reuse is unlikely or less important than initial development cost and time to market.

