Editor's Note: In an earlier article I said: "If you have any interesting FPGA-based projects you've been working on I would love to hear about them." Well, I recently heard back from Matthew Hagerty, who replied as follows:
I don't know if this project is terribly exciting outside of a specific group, but it is related to FPGA video. I'm reproducing the TMS9918A Video Display Processor (VDP) on an FPGA so I can get VGA output from my TI-99/4A Home Computer. I'm making a board that will be pin-compatible with the original IC so it will be a plug-n-play solution, which means any other systems that used the 9918A can also use the board (ColecoVision, MSX1, and a few others). I started this project as a way to learn VHDL, FPGAs, video circuits, and as a stepping stone to making a 99/4A SoC. I'm currently in the "board design" phase, which is turning out to be more of a task than the VHDL design was!
Of course I was tremendously interested to hear about this, and after bouncing a few emails back and forth, Matthew kindly agreed to pen the following article for our delectation and delight.
First a little history... The TMS9918A is a Video Display Processor (VDP) designed by Karl Guttag at Texas Instruments (TI) sometime between 1978 and 1981. The chip was used in the TI-99/4A Home Computer, along with several other game and home computer systems of the day, most notably the ColecoVision and MSX1 computers. The basic specifications are:
- Single chip design for interfacing an NTSC or PAL (9929A) television
- 256 x 192 pixel resolution
- 32 x 24 tiles (8 x 8 pixels per tile)
- 16 colors (including transparent)
- Access to up to 16K of video RAM
- Bidirectional 8-bit asynchronous CPU interface
- 32 hardware sprites with hardware supported collision detection
- Fully definable tile based graphics, i.e. no fixed “font ROM”
- 4 display modes: Graphics I (32 x 24 tiles), Graphics II (bitmap), Multi-Color (64 x 48 fat-pixels), and 40-column Text
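The "8-bit asynchronous CPU interface" and the 16K of VRAM can be made more concrete with a rough Python model of how a host talks to the 9918A through its control port and data port. This is a sketch, not the author's VHDL. The class and method names are hypothetical. The model also ignores the chip's read-ahead buffer and the status register, both of which a real implementation must handle.

```python
class VDPPortModel:
    """Hypothetical model of the 9918A host interface: a control port
    (MODE=1) and a data port (MODE=0) in front of 16K of VRAM."""

    def __init__(self):
        self.vram = bytearray(16 * 1024)  # 16K of video RAM
        self.registers = [0] * 8          # write-only registers R0..R7
        self.address = 0                  # 14-bit VRAM address counter
        self._latched = None              # first byte of a two-byte control write

    def write_control(self, value):
        if self._latched is None:
            # First byte: low address bits, or the data for a register write
            self._latched = value & 0xFF
            return
        first, self._latched = self._latched, None
        if value & 0x80:
            # Second byte 1000 0rrr: store `first` in register rrr
            self.registers[value & 0x07] = first
        else:
            # Second byte 0w aaaaaa: upper 6 address bits (w=1 signals a write)
            self.address = ((value & 0x3F) << 8) | first

    def write_data(self, value):
        self.vram[self.address] = value & 0xFF
        self.address = (self.address + 1) & 0x3FFF  # auto-increment, wraps at 16K

    def read_data(self):
        value = self.vram[self.address]
        self.address = (self.address + 1) & 0x3FFF
        return value
```

In this model, a host writes the low address byte, then the high six bits with 0x40 set, and then streams data bytes. Setting 0x80 on the second byte instead turns the pair into a register write.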
The chip had a few other notable features, like being able to genlock to another 9918A for video overlay! However, none of the computers I am aware of used this feature. Yamaha later made two VDP chips based on the 9918A, the 9938 and the 9958, which expanded the original 9918A’s features yet remained software compatible with it. These Yamaha chips are mostly known for their use in the MSX2 computers.
My interest in the 9918A comes from its use in the TI-99/4A Home Computer, which was my first computer, and therefore has nostalgic and sentimental value to me personally. To this day I still actively program my 99/4A computer in assembly language and participate in the small community that still rallies around the system. It is a lot of fun, and computers from “yesterday” still have a lot to teach you about programming, but that is another article.
As much as I like my 99/4A, one thing that has always bothered me (and others who use the computer) is the composite video output. Using a TV as a monitor is less than desirable, and even a nice color composite monitor still suffers from the color bleeding and artifacts of a composite signal. There have been many attempts by many people over the years to come up with a way to get a nice S-Video or RGB output from the 9918A. Some have tried the 9928A or 9929A version of the chip, which outputs Y, R-Y, and B-Y signals, while others have tried using a composite-to-RGB or S-Video circuit. As far as I know, no one has ever been successful.
When I decided to take my turn at getting RGB / VGA from the 99/4A, I wanted something quick and easy. I started looking at external conversion circuits and quickly got lost in all kinds of complex analogue amplifier and filter circuits, analogue algorithms for color space conversion, etc. It was mind-numbing. Next I went looking for a commercial unit. I thought surely there was something out there that would convert a composite signal to at least S-Video. What I found was less than desirable and expensive (over $100 USD if I remember correctly).
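For reference, once the 9928A/9929A signals are digitized and normalized, the color space conversion itself is only a few multiply-adds. The hard part is everything around it. The sketch below assumes standard BT.601 luma weights (Y = 0.299R + 0.587G + 0.114B) and inputs scaled so that R, G and B span 0 to 1. The real chips' output levels are chip-specific and would need their own scaling, which this sketch does not attempt.

```python
def color_difference_to_rgb(y, r_minus_y, b_minus_y):
    """Recover R, G, B from luma and two color-difference signals.

    Assumes BT.601 luma weights and normalized (0..1) signal levels.
    """
    r = y + r_minus_y
    b = y + b_minus_y
    # Solve Y = 0.299R + 0.587G + 0.114B for G
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b
```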
Next, I started to think of a hybrid digital way to convert the video by making a frame buffer. Basically a circuit that would take in the composite video, demodulate it and reconstruct the frame in a memory circuit, and then drive a VGA monitor from that. The only problem was that I was again running into a pile of analogue electronics on the front end, and no idea about how to drive the VGA on the back end.
The circuit just seemed overly complex.
I could not help but think that the screen image already existed in a perfect digital format in the video RAM, and all I needed to do was reach in there and grab it. I started thinking I could make a circuit that would watch the data going from the CPU to the 9918A and reconstruct the video buffer for VGA output. This was problematic though, since I would have to also track the sprites and other “data driven” aspects of the 9918A. Hmmm...
The idea of using an FPGA really solidified when I found the FPGA-Arcade web site. The guy who runs this site, Mike, had reproduced entire coin-op arcade games on a single FPGA, complete with digital RGB monitor output, or a scan doubler to drive a modern VGA monitor! Also on the site were some system-on-a-chip (SoC) designs for a few classic computers and consoles, one of which was the ColecoVision (CV). If you recall from the brief history above, the CV uses the 9918A. Here was my solution! I could use an FPGA to reproduce the entire 9918A chip, and therefore directly drive a VGA monitor. The VHDL was available for download (thanks Mike), so all I had to do was pull out the 9918A core, wire up the interface, and I’d be done. Thus I bought the same Xilinx development board that Mike used, downloaded the VHDL, and started down the road of FPGA development.
The following are the first pictures I took after hooking the 99/4A motherboard to the FPGA development board.
Well, things did not quite work out as I had planned. I was impatient and just wanted the thing to work with a minimal amount of effort. I did not know any HDL or anything about FPGA development. However, the more I read, and the more I looked at the 9918A core from the CV, the more I realized I would not be able to take the “easy path” (explained below).
Once the idea of a quick-and-dirty solution was gone, I set about doing things the right way. I would learn VHDL, reproduce the entire 9918A, and add a few enhancements while I was at it. The primary problem with using the 9918A core from the CV SoC was timing. What I found was a 9918A core that was designed to run totally within the confines of the FPGA, with the host CPU also being on the same FPGA. This was perfect for a SoC, but not so good when trying to interface with a real-world computer.
Also, the CV 9918A core was designed to run at the original chip’s clock frequency of 10MHz and reproduce only the original functionality. While that is a good thing, I wanted to enhance my version and remove some of the limitations of the original 9918A.
The original 9918A has two primary limitations that I wanted to fix:
- Only 4 of the 32 available sprites can appear on the same horizontal line at any given time. This is simply because there is not enough time to fetch more data during a horizontal scan (the VDP runs at only 10MHz). This causes a lot of problems in games, since the higher-numbered sprites are not visible, which means you might not see a bullet coming at you, or a ghost in a PAC-MAN style game, etc.
- The host CPU timing. Since the 9918A operates asynchronously, and it does not have a “busy” signal to assert to the host CPU, you have to make sure you “wait” a certain amount of time between reads or writes to/from the 9918A. If you try to read or write too quickly, data will be lost.
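The first limitation can be illustrated with a small Python model of which sprites survive on a given scan line. It is a simplification under stated assumptions. A sprite at vertical position y is assumed to cover lines y through y+height-1, ignoring the real chip's one-line Y offset. A Y value of 208 (0xD0) ends the sprite list, as it does on the 9918A. The `dropped` result corresponds to the "fifth sprite" the real chip reports in its status register. The function name is hypothetical.

```python
SPRITE_LIST_END = 208  # a Y value of 208 (0xD0) stops sprite processing on the 9918A


def sprites_on_line(sprite_ys, line, height=8, max_per_line=4):
    """Return (indices of displayed sprites, index of first dropped sprite or None).

    Sprites are scanned in priority order (index 0 first); only the first
    `max_per_line` sprites that cover `line` are displayed.
    """
    shown, dropped = [], None
    for index, y in enumerate(sprite_ys):
        if y == SPRITE_LIST_END:
            break
        if y <= line < y + height:
            if len(shown) < max_per_line:
                shown.append(index)
            elif dropped is None:
                dropped = index  # the sprite the original chip cannot draw
    return shown, dropped
```

With six sprites stacked on one row, `sprites_on_line([100] * 6, 103)` shows sprites 0 to 3 and drops sprite 4. Raising `max_per_line` to 32 shows all six.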
I intended to solve these problems, and a few others, by brute force of speed and density. I decided I would run the FPGA at 100MHz, which means I would have plenty of time to do all the memory access I needed for video generation, plus enough time to pull the data for all 32 sprites for every scan line. I would also have 32 sprite shift registers instead of the original 4. Last, but not least, the host CPU would be unable to overrun the VDP at clock speeds up to about 25MHz, which is way beyond any computer that ever used the 9918A.
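The "brute force of speed" argument comes down to simple arithmetic on cycles per scan line. The sketch below makes two assumptions, neither taken from the author's design. First, it assumes standard 640 x 480 at 60Hz VGA timing (800 pixel clocks per line at 25.175MHz). Second, it assumes a hypothetical cost of two system clocks per VRAM read. Each sprite is assumed to need its 4 attribute bytes plus 2 pattern bytes, enough for one row of a 16-pixel-wide sprite.

```python
def clocks_per_scanline(system_clock_hz, pixel_clock_hz=25.175e6, pixels_per_line=800):
    """System clock cycles that elapse during one VGA scan line."""
    line_time_s = pixels_per_line / pixel_clock_hz
    return int(system_clock_hz * line_time_s)


def sprite_fetch_cycles(sprites=32, bytes_per_sprite=6, cycles_per_read=2):
    """Hypothetical cost of fetching one line's worth of data for every sprite."""
    return sprites * bytes_per_sprite * cycles_per_read


print(clocks_per_scanline(100e6), sprite_fetch_cycles())  # 3177 384
```

Under these assumptions, fetching all 32 sprites uses roughly an eighth of the line at 100MHz. At 10MHz, the same line offers only about 317 clocks.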
Timing... everything comes down to timing. I found there are things you can do at 10MHz that you simply cannot do at 100MHz, because of setup and hold times, critical paths, etc. I knew nothing of that when I started, and now that I have at least a basic understanding of it, it blows me away to think about a modern CPU running at 3GHz! Needless to say, I could not use any of the VHDL from the 9918A core I got from the FPGA-Arcade site. It was designed for 10MHz, so I had to start from scratch. In hindsight this was a good thing. First, I learned more by starting from nothing. Second, the core is totally my own creation to do with as I please. Sometimes I would get stuck and go see how the other 9918A core solved a problem, and in every case I found I could not use that solution (its logic took too long to complete within a single 100MHz clock period).
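The critical-path problem can be expressed as a single subtraction: the clock period minus everything that must happen between two flip-flops. The delay figures below are invented for illustration and do not come from any real device or from the author's design. They show how the same logic can pass easily at 10MHz and fail at 100MHz.

```python
def setup_slack_ns(clock_mhz, clk_to_q_ns, logic_ns, routing_ns, setup_ns):
    """Slack on a register-to-register path; negative slack means the path fails timing."""
    period_ns = 1000.0 / clock_mhz
    return period_ns - (clk_to_q_ns + logic_ns + routing_ns + setup_ns)


# Hypothetical path: a long chain of combinational logic between two flip-flops
path = dict(clk_to_q_ns=0.6, logic_ns=7.0, routing_ns=5.0, setup_ns=0.5)
print(round(setup_slack_ns(10, **path), 1))   # 86.9: plenty of margin at 10MHz
print(round(setup_slack_ns(100, **path), 1))  # -3.1: fails at 100MHz
```

The usual fixes are to split the path across more clock cycles (pipelining) or to restructure the logic. Either way, the design has to be rethought rather than simply clocked faster.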
Development... I started with a VHDL example I found for driving a 640 x 480 VGA display. It was fun to realize just how easy the circuit really is! Two counters, one horizontal and one vertical, pretty much covers the guts of it. Getting a monitor synced up was pretty easy... I even added some hard coded conditions to draw a basic grid pattern... it was really cool.
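The "two counters" description can be shown with a Python model of the same structure. The author's VHDL example is not reproduced here. The model assumes the standard 640 x 480 at 60Hz timing, with active-low sync pulses for both hsync and vsync. The nested loops stand in for the horizontal counter (advancing every pixel clock) and the vertical counter (advancing once per line).

```python
# Standard 640x480@60Hz timing: visible, front porch, sync pulse, back porch
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48  # in pixel clocks
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33   # in lines
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK       # 800
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK       # 525


def vga_frame():
    """Yield (x, y, hsync, vsync, visible) for every pixel clock of one frame."""
    for y in range(V_TOTAL):      # vertical counter: one step per line
        for x in range(H_TOTAL):  # horizontal counter: one step per pixel clock
            # Sync signals are active-low: False while the pulse is asserted
            hsync = not (H_VISIBLE + H_FRONT <= x < H_VISIBLE + H_FRONT + H_SYNC)
            vsync = not (V_VISIBLE + V_FRONT <= y < V_VISIBLE + V_FRONT + V_SYNC)
            visible = x < H_VISIBLE and y < V_VISIBLE
            yield x, y, hsync, vsync, visible
```

A grid test pattern like the one described above would then be a single extra condition on `x` and `y` during the visible region.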
That was the end of the easy stuff. Since I was developing a replacement for a real chip, to work in a real existing computer, I could not figure out how to build it up in small pieces. The VDP is configured by the host CPU, and all the data in the video RAM (VRAM) is set up by the host CPU. Thus, I needed the asynchronous CPU-to-VDP interface working so I could use the computer to test the rest of the video circuit. So, I started writing. Things I already knew how to do, or that I had some vague idea about, I would write out in VHDL. Things I didn’t know, I would write as pseudo code. Over the course of about 3 days I reworked my design until all my pseudo code was gone and the VHDL would synthesize.
At this point I had another problem: I needed to get the FPGA hooked up to the 99/4A computer. Unfortunately, the development board I have uses a 100-pin Hirose connector for the bulk of the user I/O pins. I suppose such a connector has its strong points, but being hobbyist- or prototype-friendly aren't two of them. After a quick Mouser order and some manual ribbon cable assembly, I had a 40-pin DIP to 100-pin Hirose adapter :-) Back to the CPU I/O interface.
So, all I could really do was hook up the FPGA to the 99/4A, download my bitstream, and power up the 99/4A...
Power on... and... nothing...
Ugh! I had no idea how to troubleshoot the internals of an FPGA. Since I always try to take the easy way out, instead of learning how to set up a “test bench” (I still don’t know how), or anything else, I started using the 8 LEDs on the development board to indicate the current state of my CPU I/O FSM. After a few days of working over the VHDL, reading, changing, testing, pulling hair out, changing, repeat, I got to a point where I was getting consistently stuck in one state. Progress!
Basically, when the CPU wants to read/write data from/to the VDP, it will pull one of two VDP inputs low: CSW or CSR. The VDP performs the read or write and then waits for the CSW or CSR to go high before returning to an idle state. I was getting stuck in the state where I was waiting for the CSW or CSR to go high. But my O-scope was showing those pins were high, so how could I be stuck waiting for them to go high?
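The handshake described above can be modeled as a tiny state machine. This Python sketch is hypothetical (state names and functions are illustrative), and it assumes the CSW and CSR inputs are clean, active-low signals. That assumption is exactly what turned out not to hold.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WRITE = auto()         # perform the VRAM/register write
    READ = auto()          # perform the read
    WAIT_RELEASE = auto()  # wait for CSW and CSR to go high again


def step(state, csw_n, csr_n):
    """One clock of the FSM. Returns (next_state, action or None)."""
    if state is State.IDLE:
        if not csw_n:
            return State.WRITE, None
        if not csr_n:
            return State.READ, None
        return State.IDLE, None
    if state is State.WRITE:
        return State.WAIT_RELEASE, "write"
    if state is State.READ:
        return State.WAIT_RELEASE, "read"
    # WAIT_RELEASE: only return to idle once both strobes are high
    if csw_n and csr_n:
        return State.IDLE, None
    return State.WAIT_RELEASE, None


def run(samples):
    """Feed (csw_n, csr_n) samples through the FSM; return final state and actions."""
    state, actions = State.IDLE, []
    for csw_n, csr_n in samples:
        state, action = step(state, csw_n, csr_n)
        if action:
            actions.append(action)
    return state, actions
```

With clean inputs, one low pulse on CSW produces exactly one write, and the FSM then returns to idle.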
While trying to fix my CPU I/O problem, I was looking at an FSM in a book I had that was demonstrating a UART implementation. In this example there was a circuit that would sample the input for a certain number of clocks to make sure the signal was actually stable for a given period of time...
CLICK! A light bulb turned on in my head. You see what I mean about timing.
The FPGA internal logic is running at 100MHz. The host system, my 99/4A, is running at 3MHz. The FPGA clock period is 10ns. The 99/4A’s 3MHz clock has a period of about 333ns, but the 9900 CPU uses a 4-phase clock, so each phase lasts about 80ns, and the high-to-low transitions on the 99/4A play out on that same 80ns timescale. At 100MHz, the FPGA could actually “see” the 99/4A’s entire transition from high to low in painful detail, and my FSM was getting stuck because it was being triggered during the sporadic and noisy transition. I added some VHDL to sample the CSW and CSR inputs for 8 FPGA clocks (about 80ns) before the FSM would see them. I synthesized, loaded, powered on, and... IT WORKED!
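The fix can be sketched in Python as a filter that passes a new level to the FSM only after the input has held it for a set number of consecutive FPGA clocks. Eight clocks (80ns at 100MHz) is the figure used above, and the class name is hypothetical. Common FPGA practice is also to pass any asynchronous input through two flip-flops first, to reduce the risk of metastability. That step is not modeled here.

```python
class StableInput:
    """Pass an asynchronous strobe to the FSM only after it has held the
    same level for `required` consecutive clocks."""

    def __init__(self, required=8, initial=1):
        self.required = required
        self.output = initial     # level the FSM sees
        self.candidate = initial  # level currently being timed
        self.count = 0

    def clock(self, sample):
        if sample != self.candidate:
            # Level changed: restart the stability count
            self.candidate = sample
            self.count = 1
        elif self.count < self.required:
            self.count += 1
        if self.count >= self.required:
            self.output = self.candidate
        return self.output
```

A short burst of noise on the edge never reaches the FSM. Only a level that is held for the full eight clocks gets through.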
I was blown away. Even though I did not have video being generated yet, I knew the CPU I/O was working because in the 99/4A there is no RAM other than the 16K of VRAM (while this statement is not entirely true, it is true enough for the purposes of this discussion.) Therefore, the 99/4A makes heavy use of VRAM for things like BASIC program storage, sound data storage, etc. The computer also beeps when it powers on, and the fact that I heard beeps meant I was sitting on the master title screen and the sound data had been written to and read from VRAM! Next I plugged in a game cartridge that plays a tune when it starts up, and it played perfectly. Last, I reset into the ROM BASIC and “blindly” typed in this program:
10 FOR I=1 TO 10
20 CALL SOUND(1000, I*100, 0)
30 NEXT I
It played tones from 100Hz to 1000Hz! That means the BASIC interpreter was successfully storing my program in VRAM and then executing my code from VRAM. Mega-cool!!
Now that I had a working CPU I/O interface, I could use the 99/4A to help me test the rest of the video circuit. This was very helpful since I have game cartridges to test with, ROM BASIC, and assembly language for low level stuff.
Next I started to design the main graphics mode (mode 1), which is the 32 x 24 tile display, with each tile being an 8 x 8 pixel pattern from video RAM. This was actually pretty tricky for me since I had to sequence address generation and RAM access in the FSM. This is a rather foreign concept for a software developer. Things went pretty well though, and I got the basic display working in about a week, I think.
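The address generation for Graphics Mode 1 chains three table lookups per tile. First comes the name table (which pattern the tile uses), then the pattern table (the 8 pixels of the current row), and finally the color table (one byte of foreground/background per group of 8 patterns). The Python function below is a sketch of that lookup for a single pixel, not the author's FSM. The table base addresses are parameters, and the `backdrop` argument stands in for the backdrop color held in register 7.

```python
def graphics1_pixel(vram, x, y, name_base, pattern_base, color_base, backdrop=1):
    """Return the 4-bit color index of pixel (x, y) in Graphics Mode 1."""
    name = vram[name_base + (y // 8) * 32 + (x // 8)]     # 32 tiles per row
    pattern_row = vram[pattern_base + name * 8 + (y % 8)]  # MSB = leftmost pixel
    colors = vram[color_base + name // 8]                  # one byte per 8 patterns
    foreground, background = colors >> 4, colors & 0x0F
    bit = (pattern_row >> (7 - (x % 8))) & 1
    color = foreground if bit else background
    return backdrop if color == 0 else color               # color 0 is transparent
```

In hardware, these three reads happen in sequence for every tile on the line. That sequencing is what makes the FSM design tricky.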
I was still using the CRT on its side that I had set up for the PAC-MAN SoC I downloaded from the FPGA-Arcade site.
I began with raw VRAM dumps to the screen, and started to get the timing and addressing working, until finally the display showed something promising: a basic resemblance of the master title screen.
Then finally the actual master title screen, albeit void of color.
The following are some of the photos I took while trying to get Graphics Mode 1 working.
Things started to move more quickly now, and I added some color. At this point I was still using the VGA connector on the development board for output, and it only has 1 bit per color, so 8 colors max for now. Another Mouser order for some precision resistors to make a 3-bit-per-color DAC, and a 12-pin to 100-pin Hirose adapter (I really don’t like that Hirose connector), gave me a nice-looking display with 512 colors to choose from to match the original 9918A palette.
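A 3-bit-per-color resistor DAC is usually built from binary-weighted resistors summed into the monitor's 75-ohm input, where VGA expects about 0.7V at full brightness. The Python sketch below computes the output levels for one color channel using Millman's theorem. The resistor values (500, 1000 and 2000 ohms) and the 3.3V drive level are hypothetical illustrations, not the author's actual parts. Each FPGA pin is also idealized as a perfect 0V or 3.3V source.

```python
def dac_voltage(code, v_high=3.3, resistors=(500.0, 1000.0, 2000.0), load=75.0):
    """Output voltage of a 3-bit binary-weighted resistor DAC into a 75-ohm load.

    resistors[0] is driven by the most significant bit.
    """
    bits = [(code >> (2 - i)) & 1 for i in range(3)]  # MSB first
    current = sum(b * v_high / r for b, r in zip(bits, resistors))
    conductance = sum(1.0 / r for r in resistors) + 1.0 / load
    return current / conductance
```

With these values the eight codes are evenly spaced, and code 7 lands at about 0.69V. Three channels of 8 levels each give the 512 colors mentioned above.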
The following photos reflect my adding color, although it's not correctly mapped at this stage.
The following pictures show my resistor DAC for 9-bit color followed by the master title screen in full color, properly mapped (my 9-bit color test bars are visible along the top).
During all this mess I had ordered an MSX1 computer as well (yes, eBay). It was a U.K. version and thus had the PAL 9929A in it. This was the real acid test. The 9918A and 9929A are the same other than their output, and I was making a pin-compatible replacement, so I pulled the 9929A from the socket and plugged my 40-pin cable header down into the socket. Powered on the FPGA, powered on the MSX1... IT WORKED! I was like a kid in a candy shop. The only problem was that the MSX1 defaults to the 40-column Text mode, and I had only implemented the 32 x 24 Graphics Mode 1. No problem! A few hours later and I had a partially working Text mode. It was great.
The following pictures show my development board hooked up to my MSX1 computer, followed by screenshots of the MSX1 system running with the F18A plugged in.
The MSX1 BASIC actually has full support for all of the 9918A graphics modes and lets you switch between them quickly and easily. The 99/4A, on the other hand, only supports Graphics Mode 1 from BASIC (since the 99/4A does not have any other internal RAM, it has to use the VRAM for program storage, unlike the MSX). So, I used the MSX1 to test and implement the rest of the video modes.
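Switching modes, as MSX1 BASIC does, comes down to three mode bits spread across two registers. M3 is in register 0 (value 0x02), while M1 (0x10) and M2 (0x08) are in register 1. The Python decoder below is a sketch (the function name is hypothetical). It treats any combination with more than one bit set as undocumented rather than guessing at its behavior.

```python
def display_mode(r0, r1):
    """Decode the 9918A display mode from the M1/M2/M3 bits in R0 and R1."""
    m3 = bool(r0 & 0x02)  # R0: M3
    m1 = bool(r1 & 0x10)  # R1: M1
    m2 = bool(r1 & 0x08)  # R1: M2
    modes = {
        (False, False, False): "Graphics I",
        (True, False, False): "Text",
        (False, True, False): "Multi-Color",
        (False, False, True): "Graphics II",
    }
    return modes.get((m1, m2, m3), "undocumented combination")
```

For example, an R1 value of 0xE0 (16K, display enabled, interrupts enabled) selects Graphics I, while 0xF0 adds M1 and selects the 40-column Text mode.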
Let’s see, I started in April 2010, and it was now early July, so about three months of part-time hacking. The last part to implement was the sprites, and by this time I was feeling pretty confident. I tried. I tried again. I tried a third time. I just could not get a working sprite design, so I put it down and went back to Amazon to order more circuit and computer design books.
It was not until the end of October 2010 that I started messing with the circuit again to give the sprites another try. I had been reading and thinking about it for the past four months, and this time the circuit description seemed to just flow out of me. I guess my brain just needed some time to comprehend everything I had been learning. I managed to get the sprites working in about one day, with another two or three days to implement all 32 shift registers and get the timing down. I finished all the original 9918A functionality just in time to show it off at the “TI World Faire” in Chicago. This is the last remnant of die-hard TI users who still manage to get together once a year. It was fun and most everyone was excited.
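Each sprite shift register can be pictured as a counter that waits for the sprite's X position and then shifts out one pattern bit per pixel. Where two sprites overlap, the lowest-numbered sprite wins, and overlapping opaque bits raise the collision flag. The Python sketch below models one scan line of that behavior with 16-pixel-wide sprites. It ignores transparent sprite colors, magnification and the early-clock bit. The class and function names are hypothetical.

```python
class SpriteShifter:
    """One sprite's shift register for the current scan line."""

    def __init__(self, x, bits, color):
        self.countdown = x  # pixels to wait before shifting starts
        self.bits = bits    # 16-bit pattern row, MSB = leftmost pixel
        self.remaining = 16
        self.color = color

    def clock(self):
        """Advance one pixel; return True if this sprite has a pixel here."""
        if self.countdown > 0:
            self.countdown -= 1
            return False
        if self.remaining == 0:
            return False
        bit = (self.bits >> 15) & 1
        self.bits = (self.bits << 1) & 0xFFFF
        self.remaining -= 1
        return bool(bit)


def render_line(sprites, width=256):
    """sprites: list of (x, bits, color) in priority order. Returns (pixels, collision)."""
    shifters = [SpriteShifter(*s) for s in sprites]
    pixels, collision = [], False
    for _ in range(width):
        hits = [sh for sh in shifters if sh.clock()]  # every shifter clocks every pixel
        if len(hits) > 1:
            collision = True
        pixels.append(hits[0].color if hits else None)  # lowest-numbered sprite wins
    return pixels, collision
```

With 32 instances of `SpriteShifter` instead of 4, every sprite on the line gets drawn, which is the enhancement described above.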
The following picture shows something you will never see on a real 9918A: 28 sprites all on the same horizontal line [the 9918A supports 32 sprites, but TI XB (Extended BASIC) only gives you access to 28 of them].
At this point I needed to get my design off my big development board and onto something small and dedicated. I don’t need much of a board really, just the FPGA and support (regulators, oscillator, serial flash, decoupling caps, etc.), a level shifter (the host computers are 5V TTL), the resistor DAC, a VGA header, and a way to program the serial flash. I thought for sure I would be able to buy something for $25 or $30. After all, you can get a full-blown development board with all kinds of extra SRAM, USB chips, LCD displays, etc. for $99. Well, I looked, and looked, and looked some more. Nothing. It seems that FPGA boards are like bikinis: the smaller they get, the more they cost. For a hobbyist, going from a development board prototype to a dedicated circuit board is a serious pain. I’m finding it harder to design this seemingly simple board than it was to actually develop the VHDL. Did you know that a Xilinx FPGA requires three different voltages? I didn’t! All kinds of capacitors, minimal trace lengths, etc. I was starting to glaze over and doubt that I could design a functional board.
Luckily there are some open source projects out there (dangerousprototypes.com and gadgetfactory.net) doing similar projects with FPGAs, and they are making their designs available. It gives me confidence and also a reference for a design that is working. My goal was to try and fit everything within the confines of a 40-pin DIP package, but I don’t think that is going to happen, so I’m currently trying to get a design layout that just fits in as small an area as I can manage and still stay within the confines of the prototype board houses. I also have to be able to hand solder these boards, at least the first few prototypes, after which I may look into having them manufactured as well.
Future thoughts. Initially I just want to get this project done and make it available to the community in general. There are a lot of 99/4A and ColecoVision users who are anxiously waiting for me to get these done, and I hope some MSX1 users will find them useful as well. While I was thinking about the possible user base, an idea came to mind...
The original 9918A is a general-purpose video processor. It has a simple 8-bit interface and takes the burden of video generation off of the main CPU. I was looking at some “make your own video game console” projects out there, and they are all doing video generation, and running the games, in a single microcontroller. While that is very cool, the video resolution is low (320 x 200 or so), and the load on the processor is high. So I thought maybe I would make a more generic version of the circuit for general use with any microcontroller. Heck, I could even make an Arduino “shield” version.
Most people are hooking up 2- or 4-line LCD displays because that is all that is available for under $100 or so, but how about an inexpensive VGA board? How cool would that be!? It could have the 8-bit interface as well as a serial I/O interface for those who just want to get low-speed graphics onto the screen.
Eventually I would also like to do a CPU and a 99/4A SoC. I figure I can use my F18A project as a “host” while I develop the 9900 CPU, and at that point I could just use the internal CPU as a “GPU” to do high-speed line drawing and such. The F18A could also support things like a hardware USB mouse or dual screen on a 25-year-old computer; how cool is that! As you can see there are lots of possibilities, but I have to get v1.0 out the door first...
About the author
Matthew Hagerty lives in Marshall, Michigan, where he works as a software engineer to pay the bills. He enjoys spending time with his family, reading, programming, tinkering with electronics, and working on a pile of self-declared projects that seems to only get bigger.
Matthew began a lifelong interest in programming on a TI-99/4A home computer in 1983, and has recently entered into the world of FPGA development. Visit Matthew's Journal/Blog page at http://codehackcreate.com/archives/30 for more information on this project.