Editor’s Note: This article first appeared in the Summer 2011 issue of Xcell Journal (which became available just a couple of days ago as I pen these words), and is reproduced here with the kind permission of Xilinx. Actually this is a little strange, because yours truly was actually the author of this article (it’s a funny old world sometimes).
The last time I was on the receiving end of formal education was deep in the mists of time (circa the end of the 1970s). My final project for my control engineering degree was a digital controller that could display color text and “chunky graphics” on a cathode-ray tube. The entire design was implemented using cheap-and-cheerful 74-series TTL chips, each of which contained only a few simple logic gates or registers.
We didn’t have computer-aided tools like schematic-capture systems or logic simulators (the programs I wrote for my computer class were entered on a teleprinter and stored on punched cards). So my design was captured as a gate-level schematic using pencil and paper; any proof-of-concept, testing and debug took place after I’d soldered everything together.
Not surprisingly, I didn’t have the luxury of evaluating different architectural scenarios to see which would give me the best results. I just opted for an architecture I thought could “do the job” and I remember breathing a deep sigh of relief when my controller finally displayed a “Hello Max” message on the screen.
Today’s chips, by contrast, offer designers mind-boggling logic capacities and resources to solve their problems. Along with design size, however, comes complexity, which is making it harder and harder to meet cost goals and performance, power and area specifications.
Decisions made early in the design cycle have the most impact with regard to all aspects of the final chip. For example, industry analyst Gary Smith (GarySmithEDA.com) estimates that 80 percent of a product’s cost is determined during the first 20 percent of its development cycle. This means that it is absolutely imperative to select the optimum hardware architecture as early as possible in the development process.
But how can you teach this sort of thing to engineering students? With so much groundwork to be laid in the foundations of electrical engineering, and with limited time, universities historically haven’t been able to focus on teaching architecture to the depth that is now required. A master’s-level complex digital design course at MIT is trying to change all that. By leveraging the combination of FPGAs (through the Xilinx University Program) and real-world electronic system-level (ESL) design, which supports architectural exploration at higher levels of hardware abstraction, students are accomplishing in weeks what would have required an entire school year, or more, of study in the past.
Welcome to 6.375
I recently heard about a course called 6.375 at the Massachusetts Institute of Technology (MIT). It seems this course is changing the playing field when it comes to teaching digital design. In particular, a key focus of 6.375 is the use of architectural exploration to home in on optimal designs. The thing that really intrigued me is that the course is a mere 13 weeks long, of which the students have only six weeks to design, implement and verify their final projects. But these projects are of a complexity that would bring grizzled, practicing engineers to their knees, so how can this be possible?
First I bounced over to the MIT website, where I read: “6.375 is a project-oriented subject teaching a new method for designing multimillion-gate hardware designs using high-level synthesis tools in conjunction with standard commercial EDA tools. The emphasis is on modular and robust designs; reusable modules; correctness by construction; architectural exploration; meeting area and timing constraints; and developing functional FPGA prototypes. This subject relies on high-level architectural knowledge and programming expertise rather than knowledge of low-level circuit design.”
Well, this certainly sounds jolly interesting, but what does it mean in the real world? In order to learn more, I called Professor Arvind, the Johnson Professor of Computer Science and Engineering at MIT and a member of the Computer Science and Artificial Intelligence Laboratory. Arvind inaugurated 6.375 around seven years ago and has been evolving the course ever since. From what I hear, this has been quite an adventure.
Professor Arvind, the muscle behind 6.375
When 6.375 started, its focus was ASIC design. There were several problems with this, not the least that ASICs are so complex and there were too many tools involved in order to achieve anything realistic. Also, since the department didn’t have the ability to fabricate the chips, everything was evaluated using software simulation, whose relatively slow speed limited the amount of testing that could be performed. And perhaps the most important thing was that the lack of physical chips to play with meant that the class was not as stimulating for the students as Arvind had wished.
A few years into the course it was decided to switch to FPGAs (the curriculum largely ignores the special properties of FPGAs and concentrates on straightforward RTL design), in the belief that having physical realizations of their designs would be significantly more exciting for the students. Another big consideration was that software simulation takes so long and runs out of steam when it comes to the tremendous amount of vectors required to fully test today’s complex projects. Many designs don’t even start to exhibit interesting or corner-case behavior until a long sequence (perhaps tens or hundreds of millions) of test vectors has been processed.
Today, the course – which draws a mix of computer science and electrical engineering majors – features the combination of an ESL design and verification environment coupled with a state-of-the-art FPGA development system from Xilinx that was designed with universities in mind.
The first six weeks
The first half of the course uses examples of increasing complexity to introduce the Bluespec
hardware description language (HDL) and associated design and verification environment. The second half is devoted to the students’ projects. The students know hardly anything about Bluespec or hardware design before they start (this year, none of the students knew Bluespec and only three had rudimentary hardware design experience). Nevertheless, they tackle complex projects.
The first three weeks are devoted to teaching Bluespec SystemVerilog (BSV), an ESL hardware description language that is based on the concept of guarded atomic actions (a high-level abstraction for describing complex concurrent systems). BSV allows students to quickly and concisely capture their designs at a high level of abstraction (see Figure 1). These high-level, cycle-accurate representations can be simulated an order of magnitude faster than their standard Verilog equivalents. Of particular interest is the fact that the designs can be highly parameterized so as to facilitate architectural exploration.
Figure 1. The Bluespec flow is essentially pushbutton;
all the mundane tasks are performed automatically.
In the fourth week the students learn how to use the FPGA development boards. These are Xilinx
XUPV5-LX110T development systems (Figure 2), a powerful and versatile platform packaged and priced for academia. The XUPV505-LX110T is a feature-rich, general-purpose evaluation and development platform with onboard memory and industry-standard connectivity interfaces. This provides a unified platform for teaching and research in disciplines such as digital design, embedded systems, digital signal processing, operating systems, networking, and video and image processing.
Figure 2. The powerful and versatile Xilinx XUPV5-LX110T
development system is priced for academia.
The students take the BSV test lab representations that they created in the first three weeks and synthesize them into corresponding Verilog RTL representations. (Arvind tells me that the students regard BSV vs. standard Verilog in the same way software developers regard C/C++ vs. assembly language.) The Verilog is then synthesized into an equivalent gate-level representation that is loaded into the FPGA development board.
When FPGAs were first introduced to the course, things weren’t quite as easy as they are now, and the students tended to spend too much time bringing up the FPGA infrastructure instead of working on their designs. Today, after much work by Bluespec, Xilinx and MIT students, the entire flow is essentially pushbutton, with all of the mundane tasks performed automatically behind the scenes. Now, the students no longer have to spend time worrying about getting the FPGAs to work – their focus is all about the architecture of their designs.
One of the things that helps keep things simple is that after the design has been synthesized and loaded into the FPGA development board, students continue to employ the original testbench they used to verify the high-level BSV representation by means of software simulation. They may create the testbench itself in BSV or C/C++. The interfacing between the testbench running on a PC and the FPGA development platform is achieved using the Standard Co-Emulation Modeling Interface (SCE-MI). Once again, all of this is largely transparent to the students.
The fifth and sixth weeks are devoted to labs on processor design – specifically, working with a pipelined processor core, bringing this core up on the FPGA development board and then writing C/C++ programs and executing them on the core running in the FPGA.
Working in teams of two or three, the students spend the sixth week (and spring break!) deciding on their projects, presenting these projects to the rest of the class in the seventh week and receiving a final approval from Arvind. They devote the next six weeks to designing, capturing, testing, debugging and verifying these projects. This is where the fun really starts.
Only six weeks to do what?
Arvind’s goal has always been for the students to work on sophisticated designs, but even he is surprised at the level of complexity that he’s seeing. He notes that even for practicing engineers these are nontrivial projects to realize in only six weeks. He also says that when he describes the things his students are doing to people in the industry, their reaction is often “What? You must be joking!”
The following briefly summarizes this year’s projects.
- Project 1: Optical Flow Algorithm; Adam Wahab, Jud Porter and Mike Thomson, mentored by Abhinav Agarwal. Optical flow algorithms are used to detect the relative direction and magnitude of environmental motion observed in reference to an “observer.” Optical flow has a wide range of applications, especially in robotics. The goal of this project was to develop an implementation of the Lucas-Kanade algorithm that could be incorporated into the Harvard RoboBee project, which aims to build micromechanical, autonomous, biologically inspired robots able to flap their wings (http://robobees.seas.harvard.edu). “It was amazing to me that these guys managed to create an architecture that could sustain 205 frames per second for 64 x 64 frames,” Professor Arvind said. Initial ASIC synthesis in 130-nanometer process technology shows that this design would consume 42 microjoules/frame, compared with 1,960 µJ/frame running in a software version on an embedded PC.
- Project 2: Rateless Wireless Networking with Spinal Codes; Edison Achelengwa, Minjie Chen and Mikhail Volkov, mentored by Kermin Elliott Fleming and Alfred Man Cheuk Ng. The aim was to provide an implementation for a novel rateless wireless networking scheme called Cortex. Arvind notes that this protocol was developed quite recently at MIT CSAIL by Professor Hari Balakrishnan and this is its first implementation in hardware. The paper provides analysis to show that implementing this design as an ASIC should achieve the desired data rates.
- Project 3: Data Movement Control for the PowerPC® Architecture; Silas Boyd-Wickizer, mentored by Asif Khan. The goal was to explore whether extending an ISA with three instructions to move data between caches could help software make better use of distributed caches on multicore processors. The student modified an existing FPGA implementation of a multicore PowerPC done in BSV. This entailed many changes including in the cache-coherence protocols, and Boyd-Wickizer was able to run several benchmarks to show the advantage of his scheme.
- Project 4: Viterbi Decoder; Omid Salehi-Abari, Arthur Chang and Sung Sik Woo, mentored by Myron King. Using a convolutional encoder at the transmitter associated with the Viterbi decoder at the receiver has been a predominant forward-error-correction (FEC) technique to increase the reliability of digital communication. However, a Viterbi decoder consumes large resources due to its complexity and ever-increasing data rates. The goal of this project was to boost the throughput of the Viterbi decoder by means of a novel parallel and pipelined architecture. The group has produced a Viterbi module that can be used by others and sustains 150 Mbits/second at 150 MHz on an FPGA. That’s 400x faster than a MATLAB implementation on a PC.
- Project 5: H.265 Motion Estimation; Mehul Tikekar and Mahmut E. Sinangil, mentored by Alfred Man Cheuk Ng. Motion estimation is an essential component of any digital video encoding scheme. H.265, the next-generation standard in development to follow H.264, allows variable-size coding units to increase coding efficiency. The project goal was to implement a scheme that can sustain at least 30 frames per second (fps) for 1,280 x 720-frame resolution. The project produced a design that sustains 10 fps at 50 MHz on FPGA and 40 fps at 200 MHz when synthesized with a 65-nm cell library. The design is going to be submitted for fabrication in the next few months.
One of the things Arvind is particularly interested in is the creation and use of intellectual property (IP). He encourages the students to use as much IP in their designs as they can find – both from previous years’ projects and from the Internet. He also exhorts the students to produce their own IP blocks in a form that that will be of use to future classes. “The students very quickly learn that IP is not so easy to use unless it’s been created and documented in an appropriate manner,” he said. “This includes being designed in a highly parameterized way.”
The ultimate goal of the class is not simply the creation of very complex designs in a very short period of time, but to also to evaluate different architectural scenarios so as to understand the effects alternative architectures have in terms of the area (resources), power consumption and performance/throughput of their corresponding implementations.
“I believe that the students’ ability to perform architectural exploration is absolutely essential,”
said Arvind. “The combination of BSV at the front end with the ability to run millions of vectors on the FPGA boards for power/performance profiling at the back end allows the students to evaluate the effects of different architectures in a way that simply wouldn’t have been possible just a few years ago. Today’s ultramodern tools and techniques offer fantastic educational possibilities – it’s incredible what clever people can do given the right tools.”
Who ‘nose’ what the future holds?
I heard an interesting factoid the other day that struck me as being strangely pertinent to these discussions. El Capitan is a 3,000-foot vertical rock formation in Yosemite National Park. This granite monolith is one of the world’s favorite challenges for rock climbers.
Once considered impossible to climb, El Capitan is now the standard for big-wall climbing. Today there are numerous established routes on both faces, the Southwest and the Southeast, but the most popular and historically famous route is the Nose, which follows the prow between the two faces.
El Capitan in Yosemite National Park
Believe it or not, the first ascent of the Nose, which occurred in 1958 by a team led by Warren Harding, took 45 days using fixed ropes. Seventeen years later, in 1975, Jim Bridwell, John Long and Billy Westbay made the first one-day ascent. In November 2010, Dean Potter and Sean Leary set a new speed record for the Nose, climbing the entire route in just two hours, 36 minutes and 45 seconds.
How is it possible to go from 45 days to only a couple of hours? Well, today’s climbers operate under completely different assumptions to the early climbers and use a completely different approach. They carry no packs or shirts or food or water. All they take between them – in addition to minimalistic homemade climbing harnesses – is a single 200-foot length of 9mm rope, a few carabiners and a handful of spring-loaded camming devices.
If you start with the idea that you’re going to have to camp out to climb the mountain, you are going to have to carry a lot more gear, which will slow you down and take longer. But what happens if you change your initial assumptions? If you plan to climb the mountain in less than a day you can cut down on the amount of gear you have to carry. If you plan on climbing it in a couple of hours you can also dispense with food and water.
In much the same way, chip design teams typically start with their own set of assumptions. They assume that learning a new approach comes at a cost. They assume that incremental change is all that’s possible. They assume they have to painstakingly plan out the microarchitecture with (overly) detailed specifications. And they assume that they have only one shot at the architecture.
The experience of MIT’s 6.375 Digital Design course is turning these assumptions on their head. With the right approach – using modern design tools and development platforms – it is possible for the students (and real-world designers) to quickly express and evaluate alternative architectures so as to come up with optimal implementations.
About the Author
Clive “Max” Maxfield is president of Maxfield High-Tech Consulting
and editor of the EE Times Programmable Logic DesignLine
After receiving his BSc in control engineering in 1980 from Sheffield Hallam University, Sheffield, England, Max began his career as a designer of central processing units for mainframe computers. Over the years, he has designed and built all sorts of interesting “stuff,” from silicon chips to circuit boards and brainwave amplifiers to Steampunk “Display-O-Meters.” Max has also been at the forefront of electronic design automation (EDA) for more than 20 years.
Max is the author and co-author of a number of books, including Bebop to the Boolean Boogie (An Unconventional Guide to Electronics)
, FPGAs: Instant Access
, and How Computers Do Math
If you found this article to be of interest, visit Programmable Logic Designline
where you will find the latest and greatest design, technology, product, and news articles with regard to programmable logic devices of every flavor and size (FPGAs, CPLDs, CSSPs, PSoCs...).
Also, you can obtain a highlights update delivered directly to your inbox by signing up for my weekly newsletter – just Click Here
to request this newsletter using the Manage Newsletters tab (if you aren't already a member you'll be asked to register, but it's free and painless so don't let that stop you [grin]).