At present no detailed pinouts or pin counts have been investigated. Ditto no checks whether all the desired functions will fit the available pins, or any decisions about some functions sharing pins.
Form Factor, Case and Power
Expansion and Bus
User IO Devices
Configuring the FPGA
Despite this I had not decided on making a custom hardware board, because this would be even more work/time, in designing, manufacturing and processing orders for the board. I was hoping to find someone who would come forward and do this for me and the other users.
But in case I did decide to make a board, I had collected various ideas for this in the Hardware file. It turned out that there was actually not much PDP-10 clone specific stuff in that list, so I had already decided I wanted a general-purpose FPGA-PC board (named so after FPGA-CPUs), usable by other cloners, to save them wasting time repeating nearly identical design work.
It must definitely run ESA's Leon SPARC clone. This is Andrew's reason for designing this board, and so is his part of the spec. But it can also run anything else which has no feature-set conflicts with this. I see no collision for my PDP-10 or any other cloner.
The aim of this board is to be a "PC without a PC". That is, for it to use, apart from the special board itself, only readily available and non/low-troublesome PC components, such as cases and power supplies, memory modules, disk drives, interfaces and peripherals (keyboard, mouse, monitor, etc), but to leave out the troublesome or disliked PC stuff, such as the x86 CPU and PC architecture chips (north-/southbridge), replacing these with the FPGA, which is used to implement the user's desired CPU and system architecture. This makes it easy and cheap for users to put together an FPGA-PC system without soldering, possibly reusing existing components.
This shall be a general-purpose large-systems (= 32/36/64bit, as opposed to small-systems = 8/12/16/18/20/24bit) computer cloning board. Possible uses for this board are:
This is NOT a project by Andrew Grillet, unless he adopts it when I tell him about it. In the meantime I have asked him and it seems unlikely (not enough sales volume, better to concentrate on one board).
Top of range, according to the Xilinx data sheet, is the XC2S600E part (with 2*48x2*72 LUTs and 6x12 BRAMs), largest in an FG676 case (gives 514 user IOs), using 3'961'632 (= 4M PROM) config bits. This is a (fine-pitch) ball grid array case, but Andrew's board manufacturer can process BGAs, which is better than anything I could design, layout or get built.
Problem with this chip is that the free Webpack software does not support this size. So it would require the ISE software, making using the board very expensive, too much for many users. So possibly fall back to the XC2S300E part (with 2*32x2*48 LUTs (less than 1/2) and 2x8 BRAMs (less than 1/4)) in an FG456 case (gives 329 user IOs (is 2/3)) using 1'875'648 (= 2M PROM) config bits. The XCV300E would be supported and doubles the BRAMs, but gives only 312 user IOs and is far more expensive.
Using such a high-pinout device suggests the requirement of a many-layer board, so other high-pinout parts can also be designed in, but these should be restricted to where necessary (because of layout time, and possibly higher board cost).
Preferably no fan, as that makes noise. And even at 200MHz, with today's FPGA transistor sizes and many static configuration memory transistors, there should not be enough heat to require one (1993's 486DX2-66s still ran at 66MHz without one).
This board can be used for serious workstations and servers, so memory words should be wide enough for at least parity, and better allow ECC.
Parity in 32/64bit systems requires a 9th bit for every 8 bits (gives the same 36 bits needed for 36bit designs). Ideal would be byte-level parity with 4*9bit wide writes. But word-level writing is acceptable.
Parity in 36bit systems is most likely one bit for the entire word (or per 1/2 or 1/4 word), giving 37/38/40bit words. Here word-level writing is definitely acceptable.
ECC in 32bit or 36bit systems requires about 8 bits more, depending on data bus width and algorithm. So 32bit needs 40bit and 36bit would require 44bit.
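The width arithmetic of the last three paragraphs can be written out as a small calculation (a hedged sketch; the function name is mine, and the +8 ECC figure is this document's rough estimate, not an exact Hamming-code count):

```python
def stored_width(word_bits, scheme):
    # Memory word width needed for a given protection scheme,
    # following the estimates above.
    if scheme == "none":
        return word_bits
    if scheme == "byte_parity":    # one 9th bit per 8 bits (32/64bit systems)
        return word_bits + word_bits // 8
    if scheme == "word_parity":    # one bit for the whole word (36bit systems)
        return word_bits + 1
    if scheme == "ecc":            # roughly 8 bits more, per the text
        return word_bits + 8
    raise ValueError(scheme)

# 32bit with byte parity needs the same 36 bits as a bare 36bit design;
# 36bit with ECC would require 44 bits.
```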
This is also ideal for cache-killing large-data-set processing (servers). It can also be used to partially offset the slow clock frequency of FPGA-CPUs, at least in some applications.
And the load caused by on-board VGA will slow down the processor less on a wide data bus (shorter or less frequent stall time, and the processor needs memory less often).
There are multiple variants:
Memory bandwidth with PC133 SDRAMs is 133MHz*64/8*2 = 2128MByte/s (when ignoring dead time sending addresses). A 133MHz FPGA-CPU should not require more than 133MHz*32/8 = 532MByte/s (= 1/4) of instructions, plus data accesses (can be 1/2 of the bus, so allows vector processors), plus any video and disk DMA. So these are fast enough. If XC2S300E and only dual width bus, this may be more limiting.
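As a sanity check, the peak bandwidth arithmetic used here (ignoring address/dead time, as above) can be written as one function (a sketch; the function name is mine):

```python
def peak_mbytes_per_s(clock_mhz, bus_bits, channels=1, ddr=False):
    # clock * bus width in bytes * number of parallel buses,
    # doubled for DDR's two transfers per clock.
    rate = clock_mhz * (bus_bits // 8) * channels
    return rate * 2 if ddr else rate

assert peak_mbytes_per_s(133, 64, channels=2) == 2128   # dual-bus PC133 SDR
assert peak_mbytes_per_s(133, 32) == 532                # CPU instruction fetch
```

The same function covers the DDR figures further down, e.g. dual-bus PC2100 DDR gives 4256MByte/s.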
Signals are for 1 DIMM slot wide (dual data bus) DQ0-63 (64) + CB0-15 (16) + /WE (1) + /CAS (1) + DQMB0-7 (8) + /S0-3 (4) + /RAS (1) + A0-12 (13) + BA0-1 (2) + CK0-3 (4) + CKE0-1 (2) + /MWAIT (1) + /MIRQ (1) + SDA (1) + SCL (1) + SA0-2 (3) = 123 pins.
For 2 DIMM slots wide (quad data bus) duplicate DQ and CB and most likely S and CK also, adding 88 pins, giving 211 pins (41% of 510, but 65% of 325). So that wiring is simpler, wire D pins alternately to the 2 corresponding slots (which will most likely be arranged behind each other, to fit on the board).
As 123-88 is only 35 pins, it may be sensible to spend 35 more pins towards 2 sets of independent 80bit wide memories (and less address bus and control signal fanout). (#lookup: what dual SDRAM channel server northbridge chips do, also what Opteron does for its dual channels)
Some users may want to also use this memory half-separation for 2 independent memories, one for main memory and one for video memory. These 2 halves are slower, because each half is monopolised and unused capacity is wasted, but it is simpler than the more complicated interleaved CPU/DMA accesses to one double-wide memory, where the full bandwidth can be optimally shared.
Signals are assumed to be similar to SDRAM, just double data bus.
Memory bandwidth is identical with the above, unless these are only available at 100MHz or even 66MHz, in which case they would be slower.
But these are less flexible for parts (need quad-word 144bit DIMMs, no cheap dual-word 72bit, nor any non-parity/ECC 64bit or 128bit). They also max out at 144bit (no 160bit), so ECC/parity is lost for 36bit systems. And they are a less easy part to get (not used for PCs), so they may vanish faster from the market.
So these are simply inferior to 2*64/72/80bit; it is surprising they were ever made. Don't use these.
Signals are assumed to be similar to SDRAM, even though the DIMM has 16 pins more. Exception is the missing 8 ECC pins. (#lookup: pinout)
Memory bandwidth with PC2100 is (2*133)MHz*64/8*2 = 4256MByte/s, with PC2700 is (2*166)MHz*64/8*2 = 5312MByte/s, with PC3200 now up at (2*200)MHz*64/8*2 = 6400MByte/s. This is far more than we need for any FPGA-CPU. But it may be useful for massive vector stuff, or for massive DMA, in particular video. Or for simply wasting with an inefficient memory controller. For the XC2S300E this addition will be quite attractive.
DDR comes at the cost of slightly more complicated signalling, and only being able to load pairs of words. But cache reloads are seldom single words; otherwise waste one word :-).
DDR has the far more serious problem of only being max 72bit wide, so no (2*)80bit, so no ECC (and not even parity) on 36bit systems. OTOH the 80bit SDR SDRAMs are also very seldom seen (who uses them?). And one can take a 128bit memory word and use that as 3*36 = 108 + ECC/parity. This costs 1/4 of the throughput, but any ECC or parity test costs time anyway, and the DDR is giving double throughput, so there remains 2*3/4 = 1.5 = +50%. Alternatively put in 4 words as 4*36 and then a 5th word with all the ECC data (byte-level over 4 same-position bytes?), then waste 3 words (today's memory chips are large), as this may be easier to address. [Andrew: This is the nearest case to a conflict Leon vs PDP-10 that I have found! For 32bit, DDR is just more speed at little more work; for 36bit it makes ECC require +33% or +100% more memory instead of +11.1%, and additional work, and loses half of the gained speed. So not really bad either]
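The 3*36-in-128bit scheme can be illustrated with a small pack/unpack sketch (the names and bit layout are my assumption; the text does not fix where the 20 spare ECC/parity bits sit, so here they occupy the top of the 128bit word):

```python
MASK36 = (1 << 36) - 1

def pack_3x36(words, check_bits=0):
    # Pack three 36bit words into the low 108 bits of a 128bit memory
    # word; the remaining 20 bits carry ECC/parity data.
    assert len(words) == 3 and all(w <= MASK36 for w in words)
    assert check_bits < (1 << 20)
    value = 0
    for i, w in enumerate(words):
        value |= w << (36 * i)
    return value | (check_bits << 108)

def unpack_3x36(value):
    words = [(value >> (36 * i)) & MASK36 for i in range(3)]
    return words, value >> 108
```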
Possibly put in (banks*width) 2*2 = 4 slots. 3*2 = 6 is overkill, even modern medium-level servers do not do that any more. For XC2S300E with one bank width, put in 2 or 3, not more. So there is no need for using the slower registered SDRAMs. The above pin counts are for 1 bank of DIMM slots. For 2 banks duplicate some of the control signals and most likely also S and CLK (which so grow with both width and depth, one set for each DIMM slot) (#lookup: which signals need duplicating).
The XC2S600E has 72 BRAMs (= 36kByte), usable for internal (L1) cache, data and tags, so that will allow max 32kByte size (as long as no other use of BRAMs is intended!). This may be too small for a 100+MHz system, so the board may want to offer an external (L2) cache. The XC2S300E has got only 16 BRAMs (= 8kByte), so that tops out far lower.
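These kByte figures follow from the Spartan-IIE block RAM size of 4kbit each (a sketch; assumes all BRAMs are spent on the cache):

```python
BRAM_BITS = 4096  # one Spartan-IIE block RAM

def bram_kbytes(n_brams):
    # Total BRAM capacity in kBytes.
    return n_brams * BRAM_BITS // 8 // 1024

# XC2S600E: 72 BRAMs = 36kByte total, so a 32kByte L1 leaves 4kByte
# for tags; XC2S300E: 16 BRAMs = only 8kByte.
```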
OTOH PCs introduced L2 caches in the days of 486DX33es loading over an 33MHz*32/8 = 133MByte/s bus from FP DRAMs and with only 8kByte L1. In the days of 100-166MHz PCs these were 586-1xxes with 66MHz*64/8 = 528MByte/s bus from EDO DRAMs and 16kByte L1. Here we have 100-200MHz and memory up at 2128..6400MByte/s bus from SDRAM and can have up to 32k of L1.
And with a 4*4-word cache reload granularity, that gives a setup wait of perhaps 4 to 8 clocks every 16 clocks, so only 1/5 to 1/3 slowdown when L1 fails and we need to load direct from SDRAM.
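The 1/5 to 1/3 figures follow from wait/(wait+burst) with the assumed 16-clock burst (a sketch; the function name is mine):

```python
def miss_slowdown(setup_clocks, burst_clocks=16):
    # Fraction of time lost to the reload setup wait when every
    # burst_clocks of useful transfer costs setup_clocks extra.
    return setup_clocks / (setup_clocks + burst_clocks)

assert miss_slowdown(4) == 0.2                 # 4 clocks wait -> 1/5
assert abs(miss_slowdown(8) - 1 / 3) < 1e-9    # 8 clocks wait -> 1/3
```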
So perhaps save the cost (layout time, parts, board space and design complexity) of L2 cache. [Andrew: I don't believe that L2 cache is actually worth it. Better implement a good L1 cache. OTOH on an XC2S300E there is a lot less L1 space, and L2 on that is only half the width, so it may be worth it there]
If L2 cache does go in, there are 2 variants:
This gives 144bit, which allows byte-granularity parity on 32/64bit systems (the reason these chips were made 18bit), but 36bit systems will have to go without cache parity or take bits from the tag bits (reducing cacheable address space), which is not too bad if one uses word-granularity parity (costs only 1 bit, only halves the address space). ECC is senseless in cache, it is too slow; reload from main memory in case of parity error.
L2 cache will have to share the 128/144/160bit data pins (SDRAM is only used when the cache has failed to return data, and the cache loads what the SDRAM delivers). But so DMA from in-FPGA video or disk controllers will kill same-pin L2 accesses also.
Many PC processors use separate frontside (main SDRAM) and backside (L2 cache) buses. There are not enough pins for 144/160+144bit (see the 320bit discussion above), so this would have to go down to 72/80+80bit. That gives slower L2 to L1 reloads, and also slower SDRAM to L2. So separate frontside and backside buses are only good for PC processors, where only the 64+8bit SDRAM to L2 load goes over the few external pins of a small PGA370 chip case and the 128bit L2 to L1 path is inside the chip. With FG676 and no chip-internal L2, just one 160bit bus is faster (and simpler).
Tag bits for a simple set-associative cache need to be the address bits of main memory minus the address bits of the cache, so 30|32|36 - 17|18 = 12..19bits. Plus a few for "row valid" bits. So use a 9th 256kx18 chip, which gives 18 tag/control bits. For tag writing while still reading ECC on memory-to-L2 loading, provide extra tag data pins, do not share the ECC pins (of the 160bit memory, if used). Also separate tag address (18 or 17) and control (few) pins. (#lookup: 256kx18 pinout, in particular control signals) Tag storage for an n-way set-associative cache is n times larger, so too expensive for such a large cache.
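The 12..19bit range above is just the difference of the two address widths (a sketch; "row valid" bits are not counted here):

```python
def tag_bits(main_addr_bits, cache_addr_bits):
    # A simple (direct-indexed) cache tag must hold the main memory
    # address bits that are not implied by the position in the cache.
    return main_addr_bits - cache_addr_bits

assert tag_bits(30, 18) == 12   # smallest case in the text
assert tag_bits(36, 17) == 19   # largest case in the text
```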
This L2 cache is also enough memory space for some smaller-memory designs to be possible with no DIMMs at all. It is simpler to use for such designs (but how much simpler?). If they want 160bit width for 144bit+ECC, misuse the now unused tag bits for this (gives them 162bit).
This L2 cache can also be used as microcode store for a horizontally microcoded processor, if the BRAMs are not enough for this (but even the larger model B KL-10 had only 2kx96bit microcode, which is 48 BRAMs, so that would fit into 3/4 of the BRAMs). This can also use the full 162 cache+tag bits. On the XC2S300E again this is more likely to be useful.
An on-board voltage converter 5V to 1.8V (or 12V to 1.8V) is needed for VCCINT (FPGA core). Preferably from 5V, leaving the 12V for drives. Do not use 3.3V to 1.8V, as 5V and 12V are on most newer ATX power supplies way more plentiful (about half of the power each) and more stable (primary switching regulator outputs; 3.3V is often only auxiliary power, just a simple linear regulator from 5V).
VCCO (IO) and memory 3.3V can be taken directly from the ATX power supply, it is good enough for that.
So this design should require only 12/5/3.3V parts. 1.8V only for VCCINT, else parts may interfere with the FPGA core operations, and they would require 1.8V availability away from the FPGA also.
The board should be capable of controlling the ATX power supply.
Use LEDs and switches. To fit the 17 or 6 IO pin count, this will have to be a scanned matrix arrangement. In the FPGA grab all data lines and shift them out, 1 bit at a time. And provide, in parallel to these data bits, their column number to an external TTL or CPLD based column decoder. Or drop the number and use an external column counter and reset/clock signals to drive the decoders. Or use a row of FFs with a bit travelling along them. At the same time read the switches into a shift register, and then transfer them to the action circuits when all are in.
PCI allows both complete commercial cards and prototyping cards (both for simple random circuits or FPGA based ones) for user designed additional devices.
PCI is ideal for cloning computers that have PCI busses, as it allows using the actual PCI cards that these systems use. PCI is also decent for any 32bit computer; this just requires adding PCI to its design, and gives a lot of cards in return for including this, just write drivers for them.
Wide (64bit) PCI is most likely not important enough (only used in high-performance servers) to justify its inclusion in this board. OTOH it may be a good alternative to the CDEC below.
Fast (66MHz) PCI is not important enough either (again only used in high-performance servers) to justify its inclusion in this board. Also it requires 3.3V signalling, which may be incompatible with the PCI bus driver chips (would need switchable dual-voltage-capable parts).
Signal wise PCI has 32 multiplexed D and A signals, plus quite a few control signals. Some controls seem to be individual-slot. (#lookup: PCI signals, what needs to be individual-slot and what can be all-slots) Make at least 2 of each slot's 4 INT pins individual-slot, for decent interrupt designs, real plug and play.
For 36bit, PCI cards can be used by simply wasting 4 bits of the IO data bus, like the original KS-10, which used 16bit Unibus PDP-11 cards. Put an 8*36 to 9*32 converter into the DMA circuits in the FPGA.
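Such an 8*36 to 9*32 converter works because both sides are 288 bits; a behavioural sketch (the word ordering/endianness here is my assumption, not fixed by the text):

```python
def words36_to_words32(words36):
    # Concatenate eight 36bit words into one 288bit value, then
    # split it into nine 32bit words (8*36 == 9*32 == 288 bits).
    assert len(words36) == 8 and all(w < (1 << 36) for w in words36)
    big = 0
    for w in words36:
        big = (big << 36) | w
    return [(big >> (32 * (8 - i))) & 0xFFFFFFFF for i in range(9)]

def words32_to_words36(words32):
    # Inverse conversion, for the DMA read direction.
    assert len(words32) == 9 and all(w < (1 << 32) for w in words32)
    big = 0
    for w in words32:
        big = (big << 32) | w
    return [(big >> (36 * (7 - i))) & ((1 << 36) - 1) for i in range(8)]
```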
AGP is a single-slot PCI variant, which is only usable for video cards. There are no prototyping boards available to misuse it for any self-built stuff. [Andrew: Or do you know of such boards?] As it is such a large amount of single-purpose pins (unless shared with an ISA connector based CDEC, see below), better provide on-board VGA.
Signal wise these can share pins with the PCI bus anyway, as an IO instruction will only address one of these buses at any one time. Dual bus requires 2 sets of bus drivers, so that the ISA stuff does not slow down PCI with its capacitive load. Apart from that it is then just connector and wiring cost.
Signal wise ISA has 16 D and 24 A, and various controls. All ISA signals are all-slots, none individual-slot. So no per-slot separate wiring for decent interrupt designs. (#lookup: ISA signals)
Put ISA D0-15 on PCI AD0-15 to save FPGA-internal multiplexers on the data lines. One possibility is to (mis-)use PCI AD16-31 for ISA A0-15 and possibly some PCI controls for ISA A16-23 (if included). But better latch the ISA A0-23 lines from PCI AD0-23 into external registers in the A bus drivers (which is what happens on the PCI cards anyway), as this saves FPGA-internal different multiplexing of PCI and ISA address signals (2 times address in different places) and is so easier to use.
It would actually be nice if the board offered all the once often used PC/AT "legacy" interfaces, such as ISA, PS/2, RS232, LPT, floppy, as this makes the board (with an x86/PC FPGA configuration) usable as a legacy PC, when the industry kills off these interfaces in its zeal to cut dumb-user support costs at the cost of power/laboratory users. This creates an additional, possibly large, market for the board.
32bit ISA variants such as EISA and VESA VLB are not important enough to warrant supporting them. Anything requiring that power will have to go over a PCI slot (and exist for PCI).
PC/104 is just an ISA variant that uses exactly the same signalling/pins, and can even share the ISA bus drivers; it is just a different (IDC) connector. So adding this is low cost, and it may be sensible to include it, to make embedded systems usage easier. Definitely no PC/104-Plus, which is the upgraded PCI-over-IDC version, as next to no cards need it (less than 10% can even use it).
So one can use any own/custom bus signalling on the PCI or ISA signal pins, as long as the bus buffers don't get in the way. This should be good enough for any bus signalling design. Then simply use commercial PCI or ISA prototyping cards (which don't fix data pins either) and implement whatever device/interface one wants, possibly using a 2nd FPGA on the card. [Andrew: One of my users even suggests that this may create an aftermarket for add-in boards]
PCI has 6 "reserved" lines, of which 4 can be used for 32bit to 36bit extension. This use loses existing card compatibility, but allows an true 36bit "PCI-36" bus. So this board should pin out all 6 of these (4 all-slot as addresses, 2 individual-slot for flexibility, or all individual-slot if enough pins).
But the above kills off all the PCI and ISA slots, which is quite a high price to pay for one simpler card. And the "dangling slots" may disturb high-speed signals. Also any PCI or ISA bus drivers will interfere.
The board should have all IO that a typical PC-experienced user would want to use. It should offer all the stuff that is built into a typical ATX board, unless there are good reasons to override this. Typical is assumed to be:
PC case LEDs (3), switches (3) and speaker
Ether Mouse Printer Game USB USB Key RS232 RS232 Sound
IO with unsolderable/replaceable components in between, such as VGA (DACs), RS232 (drivers), Ethernet (chip), USB (chip), sound (DACs), can be done without these components, if they add too much work/cost. But preferred with (if no signal trouble, as there may be at VGA speed). [Andrew: This is a user request, I second it] Also protection, so that they are hot-pluggable without killing stuff. [Andrew: Another user request]
Very low cost, so no reason for putting in only 1. For these use one of the standard "vertically stacked" violet+green connector pairs.
For the actual bits to signal DACs use one of (#lookup: available DAC chips):
An alternative is to only provide a small video section (up to 4 or 5bit and <100MHz) and use a premade video card (or the CDEC) for anything more. Problem is that PCI video card supply is not safe in today's AGP-dominated video world, and AGP video cards require a dedicated AGP slot which is only good for them (unless paralleled with an ISA based CDEC). And losing the CDEC for just getting VGA is not good either.
VGA is not in the list of "typical" ATX stuff above. So this will require dropping other connectors. Game is the most likely, sound also possible. Preferably not one of the RS232 ports.
This costs a second full set of 3*(2..12)+2 pins and the stuff behind them. Because of the high transfer rate they are not sharable. RAMDAC programming pins, if needed, are at least sharable. Memory bandwidth is now also doubled, to max 48.6% of PC133 SDR or 34.3% of PC2100 DDR or 16.2% of PC3200 DDR. And there needs to be ATX space for 2 VGA connectors.
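For reference, the bandwidth a scanned-out framebuffer consumes is roughly width * height * bytes-per-pixel * refresh rate (a sketch; the mode below is an arbitrary example, not the mode behind the percentages above, and blanking intervals are ignored):

```python
def video_mbytes_per_s(x, y, bits_per_pixel, refresh_hz):
    # Raw scan-out bandwidth of one video head, ignoring blanking.
    return x * y * bits_per_pixel / 8 * refresh_hz / 1e6

# e.g. 1024x768 at 16bpp, 75Hz is about 118 MByte/s per head,
# doubled for a 2nd VGA connector.
```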
An alternative would be to use the CDEC if users want a 2nd VGA. If making such an expansion board (not in the base board price any more), it should provide 2 (2nd/3rd) VGA connectors on it. This means the CDEC connector will need at least 2 times 3*(2..12)+2 pins (or more if RAMDACs, but these 2 RAMDACs can share most programming pins) on it, which is the case if it is a "2nd PCI bus" or an "ISA connector CDEC parallelling AGP". [Andrew: This looks like the better version. Single VGA as standard, 2nd/3rd as extension for those users that want them]
Signals are simply 2 data lines and a GND pin. Strongly preferred not just 3-wire, but with full pinout, with control lines (= 3 outs, 5 ins, so this only uses 8 pins), so that modems and RS232 mice can work. Drive data and control lines by 1488+1489 or MAX232/MAX233 or similar. Using a commercial U(S)ART chip is not worth its cost, and fixes the register interface, making historical cloning difficult. For this use the standard turquoise/cyan female DE9 connectors (DB25 uses too much connector space, converters exist).
Is a 2nd RS232 useful? Today, with PS/2 mice and the switch to ADSL/CATV Internet, the RS232 interfaces are less and less used. OTOH, even the newest commercial motherboards still have 2 RS232 on them (unless they have on-board VGA displacing one), and RS232 uses only a small connector and 8 IO pins. And the historic computing people want RS232 a lot. So put a 2nd one in if space and pins allow. [Andrew: Actual user request that 2 go in. I second it. I would prefer the on-board VGA to displace the standard ATX game or sound connectors, so that 2 RS232 can stay]
This is also (mis-)usable as a general user port (see parallel port disk drives). It is actually a very universal port, because there are no special circuits (not even buffers) behind it, just 17 pure (LV)TTL signals direct from the FPGA. It is sometimes also called the geek port for this.
For a general user port it lacks power supply pins, so possibly make some of the pins switchable (FPGA-driven relay or power transistors or just DIP switches?) to become power pins. I suggest 1*12V + 2*5V + 2*3.3V. [Andrew: 2 users have commented on this feature being a great idea, both users that definitely want to buy the board, so it needs to go in]
Using 5 of the 8 GND pins saves losing IO pins, but risks shorting by plugging in a standard LPT cable, if it combines all 8 GNDs (a PC power supply should just switch off). Using some of the IO pins can not short and so can not crash the FPGA-PC via power failure, but loses them as IOs. [Andrew: I prefer using GND pins, as IOs are too valuable; that is also the reason for 1+2+2 power pins, so that 3 GND pins remain usable]
[Andrew: Such a user port may also be a good alternative for your PDQ8 and Minibus ideas. It has 17 data lines (one less than 18), without wasting another 17 pins on duplicates (more for the PDQ8's local IO). Comes on a DB25 male plug, ideal for a long 25pin RS232 cable to a DB25 female on the (first) target peripheral, with there a 2nd DB25 male for further targets (like external SCSI cabling, but smaller cables/plugs). And then just a DB25 female plug with terminators in it (if the bus needs terminating), so no SCSI-like trouble. And it can even provide power for small peripherals, so saving power wiring mess]
Signals: if it needs separate data/address pins, connect A to the ISA A bus, after its bus driver chip (and so a decoded address); if common A/D, connect it direct to the common PCI/ISA FPGA pins; data connects there anyway; separate are just the chip select and alarm interrupt pins. [Andrew: There exist quite a few I2C (only 2 pins) based clock devices. But I2C is patented, which would require organising and paying per-copy license costs, either by any person implementing the FPGA-side I2C driver in their design (impossible for open source, as no per-copy money to pay it with) or by you as some form of "cover all uses" license. As no other I2C using chips (temperature or voltage sensors) are being used, this makes I2C not worth its costs, unless they are very low]
For simplicity power the RTC from a replaceable lithium battery, no rechargeables and charging circuits.
Such a large battery-backed CMOS SRAM can also be used to pretend that it is a ROM or flash ROM, for designs that want to see such a chip for firmware, be that BIOSen (as in PCs) or operating systems in ROM (as in older Macs or Amigas or Atari STs) or comfortable boot monitors (as in newer Macs, or in Suns and VAXen).
So put in a larger battery-backed CMOS SRAM.
For simplicity power the CMOS from the same source as the RTC.
Standard PC usage today seems to be line-out DAC (light green), line-in ADC (light blue) and mic-in (pink) jacks. So in addition to DAC+ADC perhaps also put in a mic amplifier and an analog switch before the ADC.
This is quite a specialised interface, hardware only usable for networking. And not every user will use this, but many will: workstation users, X terminals, stationary MP3 players. [Andrew: Do you intend your Leons to be networked? I want it, for workstation and server usage. The above sound user wants it to fetch his music. Another user wants it to connect real historic tape drives with small Ethernet/microcontroller/controller combos in them!]
Definitely 10Mbit/s, but also 100Mbit/s, as the chips for this are not much more expensive. Gigabit Ethernet is too exotic for this board; if it is wanted for a server, use a PCI card. The old 10base2 Ethernet, or even worse AUI/10base5, is today too seldom used to bother with; if required, use a PCI or ISA card or external converters.
2 or 3 (or even more, the slot hole fits about 6) Ethernet interfaces would allow making routers or bridges or even switches. This would make a noiseless (runs 24h!) router. But this is very specialised, just like 2nd/3rd VGA connectors. So put in only 1 interface and do the others as CDEC or PCI. [Andrew: One user is thinking along these lines (router)]
But like Ethernet this is a specialised interface. It requires a specialised chip, and the connector can only be used for USB, not anything else. But also some/many users will want this. [Andrew: I presently do not use any USB. One of my users commented that I have too many devices, and that he regards sound, Ethernet and USB as candidates for removing. Another user wants USB (and suggests even dropping PS/2 and just using USB). I think that PS/2 and USB should go in, PS/2 being easier and cheap, USB doing lots]
USB 2.0 seems to have no use outside of fast flash/disk interfaces. OTOH USB 2.0 is most likely only insignificantly more expensive, so put this in, unless it is too much more expensive.
Put in 2 ports, as there are many uses for them. Together with Ethernet use one of the standard "Ethernet above 2 USB" connector sets. If the USB chip offers 4 ports, put the other 2 USBs on internal connector(s), for wiring them to the case front connectors increasingly common these days. (#lookup: what internal connectors are standard for this)
But the better alternative is to use the limited ATX connector space to provide the far more useful and flexible combination of on-board VGA and 2 RS232, while still having 2 printer/user ports. Use the "above sound" space for a 2nd "on top" printer/user port connector. [Andrew: What is the exact target audience for your Leons? Do they game? 2 printer/user ports and full 2 RS232 and on-board VGA are all way more important than joysticks or sound for me, and my users]
Digital (DB9, Atari/C64) Joystick connector(s): Users that want digital joysticks can misuse the printer port (without module), as these are only 5 switches on open/closed input pins. This is the normal way on a PC anyway (see the Linux "multisystem" joystick driver for the wiring).
S/P-DIF digital sound out/in connectors: Normal PCs don't have these either. Also I do not know what circuits/signals these require, as I have never seen one. We do not have ATX connector space for them. And there exist PCI cards, which can be used. And this also can be done as a printer/user port plug-on module. [Andrew: Are serious sound processors to be expected among your users? Mine not]
MIDI audio control connectors: Would be part of the DB15 game/joystick/MIDI connector on a PC. Signals are a few 3-wire RS232s as far as I know (never done MIDI). PC usage actually requires an external "break out" cable (as the game DB15 is shared with the joystick), as far as I know. This can also be done by misusing a user port connector.
Oscilloscope display output: These were used in computers before the 1970s. Have a program cyclically plotting the picture, no video memory (then expensive) to be scanned out. Signals are X/Y coordinates and Z or ZR/ZG/ZB intensity. Simplest is to misuse an existing VGA connector's R and G as 8bit X/Y and B as 8bit Z. But this gives only 256x256 positions and B/W 256 gray levels. If 3*10..12bit VGA the positions go up, but colours stay B/W only. For full 256^3 colours use VGA RGB for ZR/ZG/ZB and then add 2 separate 10..12bit X/Y via the CDEC or a printer/user port module.
Composite or SVHS video output: This would be for standard TV/video monitor output. Users that want any of these will have to use a PCI card, with also the needed analog stuff and video modulator on it. Same also for any video input digitiser and tuner. Or this also can be done as a user port plug-on module.
IEEE1394 (FireWire): Today only of importance for digital video in/out. Possibly also for fast external disks, like USB 2.0. If someone wants this specialised feature then use the PCI bus and put a card in there.
IEEE488/IEC625 connector or other industrial stuff: Used too seldom for standard users. Use a PCI (or is ISA needed for this old stuff?) card if one wants any of these, or use the CDEC to directly drive such a system. Alternatively use a user port plug-on module (it is 8 shielded data and 8 control signals IIRC).
Modem or ISDN or other comms equipment: Very specialised, badly documented signalling, regulatory problems. Existing devices are plenty and good enough, and can be connected by existing RS232 or Ethernet.
Also use the floppy for simulating DECtape or even paper tape reader/puncher or similar small media.
Signals are simply 17 straight IO pins (2,4,..,34) and 17 GND pins (1,3,..,33) with a little bit of analog stuff. Specialised FDC chips are not advisable, as they fix the data format, bad for GCR or non-2^n bit systems.
Signals are not investigated yet, but are less than 40. (#lookup: ATA signals, after looking at ISA) Sensibly put in 2 ATA connectors, for 2*2 drives; the additional signals are most likely just 2 separate select pins.
The original ATA was just a set of bus driver chips coming from the ISA bus, ATA just being an ISA bus subset. So it can share FPGA pins with the ISA bus slots.
Later, for higher speed, it moved to the PCI bus, with address latch and DMA with cycle combining (2*16 to 32bit). So it can share FPGA pins with the PCI bus slots, preferably still same layout as ISA, as it needs separate address lines and only 16 data lines.
ATA needs a separate set of bus drivers, so that ISA and PCI are not hindered by the long ATA cable, so PCI/ISA/ATA need 3 driver sets. Also clocking at ISA 4MHz limits bandwidth to 2*4 = 8MByte/s (original ATA), and at PCI 33MHz limits to 2*33 = 66MByte/s (ATA66) speed operation, so driving ATA at a higher clock to get up to full ATA133 also requires separate bus drivers.
Also provide control signals (if any) so that UDMA mode is possible, as ATA is really slow without this (PIO mode). (#lookup: what does UDMA use, is it just PIO with ATA adapter doing the reads, or special DMA control signals)
Serial ATA (SATA) is too new and not widespread enough to be worth including. Also SATA's announced 150MByte/s transfer rate requires 1200MBit/s, and Virtex-E/Spartan-IIE FPGA IOs top out at 625MBit/s. SATA would require Virtex-IIpro style RocketIO to implement, or a dedicated SATA chip. Better use a PCI card if this is required.
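The SATA feasibility argument is plain arithmetic (sketch in Python; the 8 bits/byte figure here ignores line coding overhead, which would only raise the required rate further):

```python
FPGA_IO_LIMIT_MBIT_S = 625           # Virtex-E/Spartan-IIE IO ceiling

def needed_mbit_s(mbyte_s, bits_per_byte=8):
    """Minimum serial bit rate for a given payload byte rate."""
    return mbyte_s * bits_per_byte

sata_rate = needed_mbit_s(150)       # SATA's announced 150 MByte/s
sata_fits = sata_rate <= FPGA_IO_LIMIT_MBIT_S   # clearly not
```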
Signals are AFAIK half of the 68, so 34, the other half being GND or inverted signals. (#lookup: SCSI signals and the various standards)
Up to SCSI-UW there were 2 incompatible signallings, single ended (SE) and differential. SCSI-U2W and further up add a 3rd, LVD signalling. The oldest 8bit narrow SCSI offers no advantage (and 16 to 8bit converters exist anyway). [Andrew: This is a mess. Best provide, switchable, the more common slow SE and the fast LVD, as most modern cards seem to do]
A SCSI card on the PCI bus would block that bus during transfers, so SCSI can also share data pins with PCI, ISA and ATA; it just needs its own control signals. The same caveats about bus speeds as with ATA apply, already biting from 2*40 (SCSI-U2W) operation onwards. It needs high power bus drivers, different from ATA's, because of driving terminators. (#lookup: SCSI driver chips) So as SCSI also needs its own bus drivers, we need one driver set for each of the PCI/ISA/ATA/SCSI buses, using common data/address bus FPGA pins.
If an external SCSI-W connector is wanted, run an internal cable to a slot. Either the second end of the cable, which allows hardwired terminator resistors on the board, or far better an extended first end, which requires removable or better automatically switchable terminator resistors. (#lookup: terminators in driver chips? Tekram DC-390U3W has 3 UCC5930AMWP chips on LVD/SE and 3 DS21T07S on SE; the latter are by Dallas and are 9bit active SE terminators, the former UCC is unknown, but Dallas has some UCC chips in its "replacement" list)
[Andrew: I know you like SCSI, like I do. But what about doing it here? Are you aiming at client or server or embedded, or multiple of these uses? What users will want the quite considerable cost of multiple drivers and support stuff? SCSI today is minority stuff. Perhaps make SCSI only available via a CDEC plug-in board (like 2nd/3rd VGA or Ethernet)? Or better go for a commercial PCI SCSI card (leaves the CDEC for 2nd/3rd VGA), which is the least work (none), and no cost for the majority of ATA users]
For ATA this gives 2*(2*2) drives, for SCSI 2*15. Costs a 2nd full set of independent pins for the 2 buses, drivers and connectors (and for SCSI also termination).
Any form of direct tape or cassette interface: Too specialised, even more so than direct HD interfaces. Use ISA, printer/user port, CDEC, etc.
PCMCIA or CompactFlash (CF) slots: Used for memory cards or flash disks or microdrives. Would allow a fully solid state system, compact and noiseless operation, good for embedded systems (board with just a CF flash disk and a small power supply, for a full IO capable X terminal or a stationary MP3 player). These are basically yet another ISA bus variant with a different plug.
Offering only PCMCIA requires an adapter card for putting in CF cards, while offering only CF disallows PCMCIA cards entirely (but CF is today far more widely used), and offering both adds cost and layout work for 2 seldom used sockets.
But it is a complex and costly connector, particularly if one uses one of the ejectable ones. And the connector plus slot space takes up quite a bit of room/boardspace. Commercial motherboards do not have them; they are mainly notebook stuff. And there exist ISA or PCI PCMCIA adapter cards, even ATA ones. Also there exist USB adapters for PCMCIA or CF. And embedded systems can use ATA flash disks or a USB "flash plug".
SSFD/SmartMedia or MemoryStick or Smartcards: No importance outside of data transfer to/from portable devices (such as cameras, MP3 players) or authentication stuff. Use RS232 or USB transfer. Or a USB based card reader, if the device has no interfaces. Or a PCI PCMCIA adapter with then an adapter for them.
FibreChannel or SSA or Firewire or any other fast serial device bus: While these are nice, they are difficult to do. And they have bit rates the FPGA IO can not do, the same case as against SATA. If Firewire (most likely the largest demand) were to go in, it would have to be a specialised chip, like for USB. But unlike USB, PCI cards are easy to get here and will stay so for a while, so there is no reason to design one in. And they are unlikely to be used, as they are high end stuff. If any of these is wanted, use PCI cards.
The programming circuit should be built in (like in QST0201, a good feature) with just a connector for the cable to the development system. Ideally the same (or expanded) printer port pinout and signalling as QST0201, to reuse the same configuration/control software.
But the config connector should not be in the ATX connector area, to save scarce space there for user IO ports, and to make the FPGA-ness invisible to non-developer users and admins. So the connector should be internal, directly on the board. Preferably use a 26pin IDC header like that used for pre-ATX LPT slot mount brackets for on-board printer ports, not a DB25.
If possible within the printer port pin usage, the programming circuit interface should offer support for debugging of the large FPGA and the complex designs it allows. See the software/debug clock 3 signal section above. Ditto, remote-forcing the M0-2 mode to "slave" and programming the clock 0 frequency would be good. [Andrew: What capabilities does the QST0201 circuit actually have in this respect, what printer port pins are used up? Same as in the Xilinx Parallel Cable III?]
For seeing configuration happen, put in green "done" and red "error" LEDs. And also parallel them with IDC connectors for misusing the PC case Power and Turbo LEDs for this (these are separate from the "Power/Turbo as output of the user design" IDC connectors; users can plug the LEDs into their preferred function).
For switching external/internal config put the FPGA M0-2 pins on DIP switches, like on QST0201, as fumbling with and losing jumpers is annoying. Make clock 0 also settable by DIP switches (or use a clock chip with internal EEPROM?). Switches should be socketed (a DIP socket is cheap), so that they can be replaced with a DIP-on-IDC plug and cable for automated external setting. [Andrew: The socket is an actual user request, I second it. Or if they can all be set from the external config IDC 26, then there is no need for this]
Users may want to have multiple configurations. Or they may even want to use the FPGA-PC as a development system to improve itself, generating experimental configs and needing fallback to working ones. Both of these require config time selection of which config to use.
Simplest is to swap the config EEPROM. This doesn't require any more support than socketing it, so use a PC44 case for it. But this requires chip pulling, and so stresses chips and sockets. And loading from the running FPGA-PC even means pulling the EEPROM used for configuring while the system is running.
Possibly include a second EEPROM socket for a 2nd EEPROM, for programming it. With 2 sockets make only the 2nd writable, or at least the 1st write protectable. This protects the main design from any accidents or sabotage.
This should be combined with a switch to select which socket to config from. Also put in IDC connectors to allow such switching from the PC case Turbo switch (or Keylock) and triggering reconfig from the Reset switch (these are also separate from the "Turbo/Reset as input of the user design" IDC connectors; let the user decide on which use). Possibly put in circuits for the FPGA to itself demand a reconfig. This is apparently possible, somehow having some IOs trigger some of the config signals.
Alternatively misuse the parallel port and an external adapter for EEPROM writing, for more access to its socket (but is this needed with EEPROMs?). Such a parallel port EEPROM writer is also sharable with the development PC or another FPGA-PC board (but only a small, insignificant cost saving). But this may require more complicated switching stuff to config from there, so that other printer/user port usage is not blocked by it. So preferably not.
Either use a 29Fxxx parallel flash chip, as used for PC BIOSes, which would require a 40pin uC to drive it. Or use an 8pin 45Fxxx serial flash, plus an 8pin uC. The latter was mentioned on c.a.f to be <20% of the cost of the same size XC18xx EEPROM, for the 1Mbit case. [Andrew: I note that the Xess Virtex boards both use standard parallel flash (with a CPLD to config), while their older XC4000 boards used OTP PROMs. They avoid the expensive EEPROMs entirely. We should also]
Either use a minimal 4MBit (= 512kByte) flash chip. Or better put in a larger 16MBit (= 2MByte) flash chip, offering space for 4 configurations with only one flash chip (and the uC).
Because the latter gives space for 4 configs, no multiple flash chips are needed, and no sockets or switches. Use the uC address initialisation software to select which config to use, with no hardware for this either. Swapping flash or socketing the flash chip is not needed either. To control selection use 2 switches. Same as above, with separate IDC connectors for the PC Turbo switch (or Keylock) and triggering reconfig from the Reset button. There is no need for FPGA driven self-reconfig either; simply ask the uC to trigger stop and reconfig. This also allows the FPGA to select any of the 4 configs, given the right signalling pins.
The FPGA can now also write the configs via the uC (this just requires some signalling pins). Use the uC firmware to write protect or allow writing any of the 4 configs, without any hardware needed for this either.
In addition to configuring the FPGA, the uC should also be able to set M0-2 and control the clock 0 programmable oscillator; being able to drive the development system drivable FPGA clock 3 would also be nice. So best offer the uC access to the same circuits that the development system's printer port would use. So use tri-stateable pins for this, or better a parallel port driven multiplexer, which can override a badly crashed uC.
[Andrew: I prefer this usage to specialised EEPROMs: more configs, less hardwired, more flexibility, for less cost. Just needs the uC software writing, once and then it is done. What could be better?]
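That address initialisation step is straightforward; a minimal sketch, assuming 4 equal 512kByte slots in the 16MBit flash and 2 selection switches read as a 2bit number (Python only as illustration of the uC firmware logic):

```python
FLASH_BYTES = 2 * 1024 * 1024        # one 16 MBit parallel flash
SLOT_BYTES  = 512 * 1024             # 4 MBit, one config bitstream
NUM_SLOTS   = FLASH_BYTES // SLOT_BYTES

def slot_base(switches):
    """Map the 2 selection switches (read as 0..3) to the flash start
    address of the chosen config."""
    assert 0 <= switches < NUM_SLOTS
    return switches * SLOT_BYTES
```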
Parallel to the interface from uC to FPGA for configuring, also have one from FPGA to uC for run-time downloading of initial or updated firmware to the uC internal flash. The FPGA will initially, or after a failed uC upgrade, need to be configured from a development system's parallel port. Alternatively have the uC only reloadable from the IDC 26 config connector, but this is inferior.
Possibly run all the Turbo/Reset/Keylock switches from only one set of IDC connectors to the uC, and have a uC controllable IDC/uC switch for the lines to the FPGA. So the FPGA can get these direct or filtered by the uC. Likewise run the Power/Turbo/HD LEDs from the FPGA to the uC, and then switch FPGA/uC to the IDC connectors, for filtered output. This allows changing uC vs FPGA usage by reloading uC software, without opening the case and replugging. [Andrew: I am not really sure this feature is worth the extra work]
This also allows using any FPGA accessible user IO device to offer the user a selection of configs, as in a boot manager.
As rewriting flash too often (every config!) wears out the chips, possibly put in a single config 4MBit (= 512kByte) SRAM chip, for such temporary configs. Such SRAMs are not available in serial, so this needs a 40pin uC, so definitely also use a parallel flash chip.
If 256kx18 chips were used for L2 cache or CMOS, then possibly use one of those here, if reducing the part list size saves cost. Definitely use this one in 9bit wide wiring to save uC pins, as only 8bit data are used anyway.
As there is battery backed CMOS SRAM on this board, possibly battery back this SRAM also. This suggests an alternative flash-less design, with only SRAM for 4 configs. But this requires 16MBit of SRAM, which will cost 4 chips, unless 16MBit (2048kx8 or 1024kx18) SRAM chips now exist.
In Virtex family chips this is done by replacing vertical stripes of CLBs and the IOs at the top/bottom of them. This requires the FPGA to give the config circuit the new CLBs and then the config circuit to stop the FPGA, write the stripe, and restart. This requires a uC+SRAM design; config EEPROMs will not suffice for this.
Replacing the CLBs also replaces the IOs above and below them. So possibly arrange the IOs so that replacing one of multiple reserved "module" stripe(s) of CLBs does not hit important IOs, such as memory, as this loses their IO FF contents. But such a replaceable stripe gives trouble with routing going through it to the IOs on the other vertical side of the FPGA.
This can be solved by sending them around the VersaRing lines in the IOs. But even this requires "saving" the IOs and adding them to the new stripe of CLBs to be loaded. So this must also save the IO FFs. So it looks like partial reconfiguration is a non-issue, as far as IO layout goes. And designers can use any arrangement of replaceable CLB stripes.
Loading IO circuits also requires routing from them to the IOs used. So the IOs can not be copied without changes. The same also happens with connecting the circuits to the rest of the design. Replacing just these routes, but not the rest, may be a problem for the uC. Pre-routing dummy routes just to the first CLB of the stripe should solve this. Or use global lines from outside into the circuit.
Up until this time it was either: EEPROM without uC (no multi-design space for "reload and boot" logic), or uC with only devices and no flash (no place for a boot manager design to be loaded from, so it needs direct device access).
Since coming up with the uC+SRAM combination, all the uC direct access to devices and user IO stuff is not needed any more. But I have left it here in the file, just for you to see what was long intended. Also perhaps there is something interesting in here.
If someone still wants an FEP style design, they could put a "small board" inside the PC case, with a normal 26pin cable from the small board's printer port to this board's config connector. They would then need to use SCSI drives with both boards on a single SCSI cable for shared disks. This would require adding SCSI to the small board (by add-on card?)]
FPGA config is actually similar in concept to loading CPU microcode (which is what floppies were invented for), so offer floppy use for this. But also allow config from ATA/SCSI drives, for faster performance and more comfort.
Share access to the same devices of these types that the FPGA has access to, so there is no need for separate devices, and the FPGA can place configs there for the uC to then use. This suggests that the uC should actually have pins shadowing the FPGA's floppy and 1st ATA/SCSI pins one-on-one. This requires a large 64pin or even 84pin uC. This also requires the first SCSI to be on board and not on the CDEC.
Only one of the FPGA and uC can use the devices at any one time (the other must tri-state its shared pins), with the exception of SCSI, where both can share a bus (as switched tri-state is in the bus definition), so long as they do not modify the same file system or use some other exclusion system. The FPGA is tri-stated after power up or reconfig, so the uC needs to switch to tri-state before letting the FPGA run.
Facultatively, if there are enough pins available on the uC (unlikely), also add control pins to access any other FPGA PCI/ISA bus device. If SCSI is only on PCI, then this is needed. In this case, if there is on-board full chip USB, also add access to it. That is good, as USB is external (unlike ATA/SCSI), and so allows the use of a pluggable USB disk or even a "flash plug".
Even more facultatively, in today's world of Ethernet OS boots, possibly also allow Ethernet FPGA configs, if on-board full chip Ethernet is included. Also allow access to that, if there are enough pins and code space in the uC to use it (assuming then a DHCP+TFTP-only single connection UDP/IP client is enough). But this is really over the top, while USB may still be useful.
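That exclusion rule can be modelled as a tiny state machine (a sketch only; names are illustrative, Python used as notation for the handover protocol):

```python
class SharedDevicePins:
    """Model of the FPGA/uC shared floppy/ATA pins: at most one side may
    drive at a time, the other must be tri-stated first (SCSI being the
    exception, since it arbitrates sharing on the bus itself)."""
    def __init__(self):
        self.driver = None            # None = both sides tri-stated

    def drive(self, side):
        if self.driver not in (None, side):
            raise RuntimeError("other side still driving")
        self.driver = side

    def tristate(self, side):
        if self.driver == side:
            self.driver = None

# power up: the FPGA comes up tri-stated, so the uC may fetch the config,
# then must tri-state before letting the configured FPGA run
pins = SharedDevicePins()
pins.drive("uC")
pins.tristate("uC")
pins.drive("FPGA")
```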
For selection, have the standard config come from fixed ATA/SCSI and the experimental configs from removable floppy or USB, with the ATA/SCSI as fallback. But this is bad for multiple standard configs. Or simply have 1 config = 1 floppy or USB, but this loses using ATA/SCSI HDs, which may already be available.
The simplest method would be a set of 8 DIP switches (again no fumbly jumpers, and also DIP socketed for remote control), readable by the uC, for selecting the config device type/number, the partition/file number on it, and the fallback policy if a removable media device is empty. Like the external config connector, having these DIP switches externally accessible would cost ATX connector area and show FPGA-ness, so make them internal.
But internal switches make selecting difficult to access on every config, which is often the case while experimenting. Better than DIP switches would be to have the uC offer the user some sort of prompt via the standard user output ports/devices and then take input, auto-configuring on some timeout. This is like a PC's boot manager for selecting BSD or Linux kernels or other operating systems.
Unlike boot managers, which run on the same processor and IO as the final OS and kernel will, the FPGA-PC has at this time only an unconfigured "dead" FPGA, not a runnable processor. [Andrew: here the above "boot manager config" hits, making the FPGA usable for fetching and of course also for selection]
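How the uC might unpack those 8 switches can be sketched as a bitfield decode (the field layout below is an assumption for illustration, not a decided pinout; Python for clarity):

```python
def decode_config_switches(sw):
    """Unpack 8 DIP switches (bit 0 = switch 1) into the selection fields.
    Hypothetical layout, sized to fit the items named in the text:
      bits 0-1  device type (0=floppy, 1=ATA, 2=SCSI, 3=USB)
      bit  2    device number
      bits 3-5  partition/file number on that device
      bits 6-7  fallback policy for empty removable media
    """
    return {
        "device_type": sw & 0x03,
        "device_num": (sw >> 2) & 0x01,
        "config_num": (sw >> 3) & 0x07,
        "fallback":   (sw >> 6) & 0x03,
    }
```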
This is an equivalent situation to a microcoded processor without microcode loaded. Historically this problem was solved with a separate hardwired front end processor (FEP) computer, like a PDP-11/40 (in the KL-10) or an 8080 microprocessor (in the KS-10). Use the microcontroller, or perhaps even a full microprocessor (uP, such as a Z80/Z180), for the FEP, depending on the functionality desired.
To do this, the FEP needs its own access to user IO. For this also share a subset of the FPGA-PC's input and output devices with it.
For merging FEP and main system output onto the same VGA display (FEP as a status line at the bottom, or even as a window), have the FEP VGA output also go to an FPGA input. This allows a digitiser (like BT848/878 type devices) to be implemented, writing into FPGA video memory. This gives both outputs at the same time, at full main system VGA quality, on one monitor, but adds more FPGA design work.
This also allows the FEP to "flip" 2/3/4 screen data sets at 60Hz to the digitiser, giving 2/3/4 FEP windows, with the display updated at 30/20/15Hz. With "flipping" the FEP becomes even more complex.
For getting the larger amount of software needed for all this into the FEP, instead of using one large flash part for firmware, have only a minimal boot ROM and RAM for software in the FEP. Have it boot its software from the same devices that it can later take FPGA configs from. This can be a full OS (such as CP/M if a Z80/Z180 is used). Also use the OS's file system for storing the config files. The FEP has become a full (micro-)computer, like the historical FEP computers actually were.
Possibly make the FEP a facultative, pluggable component. This gives users the choice of simple on-board config by EEPROM (or still uC and devices), or using a pluggable FEP for complex/expensive user selectable configuring. [Andrew: How many users do you think would want an FEP? And of those that don't want an FEP, how many do you think would still want to use uC+devices vs just go for EEPROMs?]
The easiest variant is to just provide all signals any FEP would want, run to one (or a set of) IDC connectors, and leave it at that. This though requires non-EEPROM/device users to make a special FEP computer design, hardwired or FPGA based.
Better offer something to be plugged in as the FEP. But that brings back design time cost, even if parts costs are now optional for the user.
Such an FPGA FEP allows any user-selectable FPGA-CPU in it. For Sparcs one may use an 8051 or Z80 or even PDQ8. For PDP-10s one could use a PDP-11/40 as in the real KL-10s, or even a minimal PDP-10 like the KA-10, as used for the Foonlys. So the FEP FPGA should be no smaller than an XC2S150, which can still get by with a small XC18V01 1MBit EEPROM. Sensibly use something derived from the QST0201 design for this.
The FPGA configuration includes either the FEP firmware in BRAMs (but that is size limited), or give the FEP a bit of external RAM and use its BRAMs only as boot ROM. This would result in a FEPconfig(EEPROM), then FEPboot(disk-direct), then main-config(disk-via-FEP), then main-boot(disk-direct) start up sequence. This is fairly involved and quite a bit of (too much?) programming work, but ultimately (and unnecessarily?) flexible, with little hardware layouting and fixing (definitely good). [Andrew: One of my users says he regards it as too much work for each user to first get/install the FEP FPGA config, then get/install an FEP OS, and only then actually be able to get/install the main FPGA design. I regard the "main FPGA from EEPROM" variant as minimal "start immediately" operation, and the FEP as only facultative. But perhaps deliver the FEPs with a simple FEP config and a non-disk-based OS pre-installed in their EEPROMs? This lets the users treat the FPGA FEP as if it were a hardwired uC/uP FEP, or modify it if they want to]
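That four-stage start up sequence can be written down as a checklist (stage names follow the text; Python used only as notation for the ordering):

```python
# FEPconfig -> FEPboot -> main-config -> main-boot, with where each
# stage loads from, as described in the text
BOOT_SEQUENCE = [
    ("FEPconfig",   "EEPROM"),         # FEP FPGA gets its own bitstream
    ("FEPboot",     "disk, direct"),   # FEP OS boots from disk
    ("main-config", "disk, via FEP"),  # FEP feeds the main FPGA its config
    ("main-boot",   "disk, direct"),   # main system boots its own OS
]

def next_stage(done):
    """Name of the next stage after `done` completed stages, None at end."""
    return BOOT_SEQUENCE[done][0] if done < len(BOOT_SEQUENCE) else None
```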
An FPGA FEP allows any complexity of FEP IO options, so long as the needed devices are wired in. With a QFP208 (and so 140 user IO pins) this allows floppy and PCI/ISA and ATA/SCSI and USB for config file fetching, and LEDs&switches and 2*PS/2+VGA+PCspeaker and RS232 for config file selection.
This would also allow serious VGA level output in the FEP, for a good console terminal in the FEP. It also allows the FEP, when not displaying directly, to send data custom formatted to a simpler "digitiser" in the main FPGA.
One possibility is inserting the already existing QST0201 board for FEP usage. Then simply wire the various memory, IO devices and storage devices to the QST0201, fitting its 3*40+50 IDC connector pins. This separates out the FPGA and all its development/EEPROM config stuff, and it reduces layout work (but by how much?) by reusing QST0201.
QST0201 still needs the large board to provide it with clocks (share the main FPGA ones?), memory (suggested 1 256kx18, as that part is already used for cache) and all the FEP drive and user IO analog stuff (share some?). So this only provides non-optimal (but quite good) facultative part cost.
But the QST0201 has a DB25 for configuring, not an IDC for data from the main FPGA, and needs a power brick plug, not power from some IDC. [Andrew: These are the only weaknesses I have found so far in QST0201. It may be sensible to change them in a future revision of QST0201, to make it a true pluggable component, delivered with a DB25 to IDC programming cable and a powerbrick to IDC power cable adapter]
This gives optimal facultative part cost. And a side effect of doing this variant is that the FEP IO stuff design and layout time is shared with getting a separate small FPGA-PC board (and so a small+large board series), so double use of the design and layout time.
This page is by Neil Franklin, last modification 2003.06.07