At present no detailed pinouts or pin counts have been investigated. Ditto no checks whether all the desired functions will fit the available pins, or any decisions about some functions sharing pins.
Form Factor, Case and Power
Expansion and Bus
User IO Devices
Configuring the FPGA
Despite this I had not decided on making a custom hardware board, because this would be even more work/time, in designing, manufacturing and processing orders for the board. I was hoping to find someone who would come forward and do this for me and the other users.
But in case I did decide to make a board, I had collected various ideas for this in the Hardware file. It turned out that there was actually not much PDP-10 clone specific stuff in that list, so I had already decided I wanted a general-purpose FPGA-PC board (named so after FPGA-CPUs), usable by other cloners, to save them wasting time repeating nearly identical design work.
It must definitely run ESA's Leon SPARC clone. This is Andrew's reason for designing this board, and so is his part of the spec. But it can also run anything else which has no feature-set conflicts with this. I see no collision for my PDP-10 or any other cloner.
The aim of this board is to be a "PC without a PC". That is, for it to use, apart from the special board itself, only readily available and non/low-troublesome PC components, such as cases and power supplies, memory modules, disk drives, interfaces and peripherals (keyboard, mouse, monitor, etc), but to leave out the troublesome or disliked PC stuff, such as the x86 CPU and PC architecture chips (north-/southbridge), replacing these with the FPGA, which is used to implement the user's desired CPU and system architecture. This makes it easy and cheap for users to put together an FPGA-PC system without soldering, possibly reusing existing components.
This shall be a general-purpose large-systems (= 32/36/64bit, as opposed to small-systems = 8/12/16/18/20/24bit) computer cloning board. Possible uses for this board are:
This is NOT a project by Andrew Grillet, unless he adopts it when I tell him about it. In the meantime I have asked him and it seems unlikely (not enough sales volume, better to concentrate on one board).
Top of range, according to the Xilinx data sheet, is the XC2S600E part (with 2*48x2*72 LUTs and 6x12 BRAMs), largest in an FG676 case (gives 514 user IOs), using 3'961'632 (= 4M PROM) config bits. This is a (fine-pitch) ball grid array case, but Andrew's board manufacturer can process BGAs, which is better than anything I could design, layout or get built.
Problem with this chip is that the free Webpack software does not support this size. So it would require the ISE software, making using the board very expensive, too much for many users. So possibly fall back to the XC2S300E part (with 2*32x2*48 LUTs (less than 1/2) and 2x8 BRAMs (less than 1/4)) in an FG456 case (gives 329 user IOs (is 2/3)) using 1'875'648 (= 2M PROM) config bits. The XCV300E would be supported and doubles the BRAMs, but gives only 312 user IOs and is far more expensive.
Using such a high-pinout device suggests the requirement of a many-layer board, so other high-pinout parts can also be designed in, but these should be restricted to where necessary (because of layout time, and possibly higher board cost).
Preferably no fan, as that makes noise. And even at 200MHz, with today's FPGA transistor sizes and many static configuration memory transistors, there should not be enough heat to require one (1993's 486DX2-66s still ran at 66MHz without one).
This board can be used for serious workstations and servers, so memory words should be wide enough for at least parity, and better allow ECC.
Parity in 32/64bit systems requires a 9th bit for every 8 bits (gives the same 36 bits needed for 36bit designs). Ideal would be byte-level parity with 4*9bit wide writes. But word-level writing is acceptable.
Parity in 36bit systems is most likely one bit for the entire word (or per 1/2 or 1/4 word), giving 37/38/40bit words. Here word-level writing is definitely acceptable.
ECC in 32bit or 36bit systems requires about 8 bits more, depending on data bus width and algorithm. So 32bit needs 40bit and 36bit would require 44bit.
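The width arithmetic of the last three paragraphs can be written out as a small calculation (a hedged sketch; the function name is mine, and the +8 ECC figure is this document's rough estimate, not an exact Hamming-code count):

```python
def stored_width(word_bits, scheme):
    # Memory word width needed for a given protection scheme,
    # following the estimates above.
    if scheme == "none":
        return word_bits
    if scheme == "byte_parity":    # one 9th bit per 8 bits (32/64bit systems)
        return word_bits + word_bits // 8
    if scheme == "word_parity":    # one bit for the whole word (36bit systems)
        return word_bits + 1
    if scheme == "ecc":            # roughly 8 bits more, per the text
        return word_bits + 8
    raise ValueError(scheme)

# 32bit with byte parity needs the same 36 bits as a bare 36bit design;
# 36bit with ECC would require 44 bits.
```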
This is also ideal for cache-killing large-data-set processing (servers). It can also be used to partially offset the slow clock frequency of FPGA-CPUs, at least in some applications.
And the load caused by on-board VGA will slow down the processor less on a wide data bus (shorter or less frequent stall time, and the processor needs memory less often).
There are multiple variants:
Memory bandwidth with PC133 SDRAMs is 133MHz*64/8*2 = 2128MByte/s (when ignoring dead time sending addresses). A 133MHz FPGA-CPU should not require more than 133MHz*32/8 = 532MByte/s (= 1/4) of instructions, plus data accesses (can be 1/2 of the bus, so allows vector processors), plus any video and disk DMA. So these are fast enough. If XC2S300E and only dual width bus, this may be more limiting.
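As a sanity check, the peak bandwidth arithmetic used here (ignoring address/dead time, as above) can be written as one function (a sketch; the function name is mine):

```python
def peak_mbytes_per_s(clock_mhz, bus_bits, channels=1, ddr=False):
    # clock * bus width in bytes * number of parallel buses,
    # doubled for DDR's two transfers per clock.
    rate = clock_mhz * (bus_bits // 8) * channels
    return rate * 2 if ddr else rate

assert peak_mbytes_per_s(133, 64, channels=2) == 2128   # dual-bus PC133 SDR
assert peak_mbytes_per_s(133, 32) == 532                # CPU instruction fetch
```

The same function covers the DDR figures further down, e.g. dual-bus PC2100 DDR gives 4256MByte/s.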
Signals are for 1 DIMM slot wide (dual data bus) DQ0-63 (64) + CB0-15 (16) + /WE (1) + /CAS (1) + DQMB0-7 (8) + /S0-3 (4) + /RAS (1) + A0-12 (13) + BA0-1 (2) + CK0-3 (4) + CKE0-1 (2) + /MWAIT (1) + /MIRQ (1) + SDA (1) + SCL (1) + SA0-2 (3) = 123 pins.
For 2 DIMM slots wide (quad data bus) duplicate DQ and CB and most likely S and CK also, adding 88 pins, giving 211 pins (41% of 510, but 65% of 325). So that wiring is simpler, wire D pins alternately to the 2 corresponding slots (which will most likely be arranged behind each other, to fit on the board).
As 123-88 is only 35 pins, it may be sensible to spend 35 more pins towards 2 sets of independent 80bit wide memories (and less address bus and control signal fanout). (#lookup: what dual SDRAM channel server northbridge chips do, also what Opteron does for its dual channels)
Some users may want to also use this memory half-separation for 2 independent memories, one for main memory and one for video memory. These 2 halves are slower, because each half is monopolised and unused capacity is wasted, but it is simpler than the more complicated interleaved CPU/DMA accesses to one double-wide memory, where the full bandwidth can be optimally shared.
Signals are assumed to be similar to SDRAM, just double data bus.
Memory bandwidth is identical with the above, unless these are only available at 100MHz or even 66MHz, in which case they would be slower.
But these are less flexible for parts (need quad-word 144bit DIMMs, no cheap dual-word 72bit, nor any non-parity/ECC 64bit or 128bit). They also max out at 144bit (no 160bit), so ECC/parity is lost for 36bit systems. And they are a less easy part to get (not used for PCs), so they may vanish faster from the market.
So these are simply inferior to 2*64/72/80bit; it is surprising they were ever made. Don't use these.
Signals are assumed to be similar to SDRAM, even though the DIMM has 16 pins more. Exception is the missing 8 ECC pins. (#lookup: pinout)
Memory bandwidth with PC2100 is (2*133)MHz*64/8*2 = 4256MByte/s, with PC2700 is (2*166)MHz*64/8*2 = 5312MByte/s, with PC3200 now up at (2*200)MHz*64/8*2 = 6400MByte/s. This is far more than we need for any FPGA-CPU. But it may be useful for massive vector stuff, or for massive DMA, in particular video. Or for simply wasting with an inefficient memory controller. For the XC2S300E this addition will be quite attractive.
DDR comes at the cost of slightly more complicated signalling, and only being able to load pairs of words. But cache reloads are seldom single words; otherwise waste one word :-).
DDR has the far more serious problem of only being max 72bit wide, so no (2*)80bit, so no ECC (and not even parity) on 36bit systems. OTOH the 80bit SDR SDRAMs are also very seldom seen (who uses them?). And one can take a 128bit memory word and use that as 3*36 = 108 + ECC/parity. This costs 1/4 of the throughput, but any ECC or parity test costs time anyway, and the DDR is giving double throughput, so there remains 2*3/4 = 1.5 = +50%. Alternatively put in 4 words as 4*36 and then a 5th word with all the ECC data (byte-level over 4 same-position bytes?), then waste 3 words (today's memory chips are large), as this may be easier to address. [Andrew: This is the nearest case to a conflict Leon vs PDP-10 that I have found! For 32bit, DDR is just more speed at little more work; for 36bit it makes ECC require +33% or +100% more memory instead of +11.1%, and additional work, and loses half of the gained speed. So not really bad either]
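The 3*36-in-128bit scheme can be illustrated with a small pack/unpack sketch (the names and bit layout are my assumption; the text does not fix where the 20 spare ECC/parity bits sit, so here they occupy the top of the 128bit word):

```python
MASK36 = (1 << 36) - 1

def pack_3x36(words, check_bits=0):
    # Pack three 36bit words into the low 108 bits of a 128bit memory
    # word; the remaining 20 bits carry ECC/parity data.
    assert len(words) == 3 and all(w <= MASK36 for w in words)
    assert check_bits < (1 << 20)
    value = 0
    for i, w in enumerate(words):
        value |= w << (36 * i)
    return value | (check_bits << 108)

def unpack_3x36(value):
    words = [(value >> (36 * i)) & MASK36 for i in range(3)]
    return words, value >> 108
```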
Possibly put in (banks*width) 2*2 = 4 slots. 3*2 = 6 is overkill, even modern medium-level servers do not do that any more. For XC2S300E with one bank width, put in 2 or 3, not more. So there is no need for using the slower registered SDRAMs. The above pin counts are for 1 bank of DIMM slots. For 2 banks duplicate some of the control signals and most likely also S and CLK (which so grow with both width and depth, one set for each DIMM slot) (#lookup: which signals need duplicating).
The XC2S600E has 72 BRAMs (= 36kByte), usable for internal (L1) cache, data and tags, so that will allow max 32kByte size (as long as no other use of BRAMs is intended!). This may be too small for a 100+MHz system, so the board may want to offer an external (L2) cache. The XC2S300E has got only 16 BRAMs (= 8kByte), so that tops out far lower.
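These kByte figures follow from the Spartan-IIE block RAM size of 4kbit each (a sketch; assumes all BRAMs are spent on the cache):

```python
BRAM_BITS = 4096  # one Spartan-IIE block RAM

def bram_kbytes(n_brams):
    # Total BRAM capacity in kBytes.
    return n_brams * BRAM_BITS // 8 // 1024

# XC2S600E: 72 BRAMs = 36kByte total, so a 32kByte L1 leaves 4kByte
# for tags; XC2S300E: 16 BRAMs = only 8kByte.
```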
OTOH PCs introduced L2 caches in the days of 486DX33es loading over an 33MHz*32/8 = 133MByte/s bus from FP DRAMs and with only 8kByte L1. In the days of 100-166MHz PCs these were 586-1xxes with 66MHz*64/8 = 528MByte/s bus from EDO DRAMs and 16kByte L1. Here we have 100-200MHz and memory up at 2128..6400MByte/s bus from SDRAM and can have up to 32k of L1.
And with a 4*4-word cache reload granularity, that gives a setup wait of perhaps 4 to 8 clocks every 16 clocks, so only 1/5 to 1/3 slowdown when L1 fails and we need to load direct from SDRAM.
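The 1/5 to 1/3 figures follow from wait/(wait+burst) with the assumed 16-clock burst (a sketch; the function name is mine):

```python
def miss_slowdown(setup_clocks, burst_clocks=16):
    # Fraction of time lost to the reload setup wait when every
    # burst_clocks of useful transfer costs setup_clocks extra.
    return setup_clocks / (setup_clocks + burst_clocks)

assert miss_slowdown(4) == 0.2                 # 4 clocks wait -> 1/5
assert abs(miss_slowdown(8) - 1 / 3) < 1e-9    # 8 clocks wait -> 1/3
```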
So perhaps save the cost (layout time, parts, board space and design complexity) of L2 cache. [Andrew: I don't believe that L2 cache is actually worth it. Better implement a good L1 cache. OTOH on an XC2S300E there is a lot less L1 space, and L2 on that is only half the width, so it may be worth it there]
If L2 cache does go in, there are 2 variants:
This gives 144bit, which allows byte-granularity parity on 32/64bit systems (the reason these chips were made 18bit), but 36bit systems will have to go without cache parity or take bits from the tag bits (reducing cacheable address space), which is not too bad if one uses word-granularity parity (costs only 1 bit, only halves the address space). ECC is senseless in cache, it is too slow; reload from main memory in case of parity error.
L2 cache will have to share the 128/144/160bit data pins (SDRAM is only used when the cache has failed to return data, and the cache loads what the SDRAM delivers). But so DMA from in-FPGA video or disk controllers will kill same-pin L2 accesses also.
Many PC processors use separate frontside (main SDRAM) and backside (L2 cache) buses. There are not enough pins for 144/160+144bit (see the 320bit discussion above), so this would have to go down to 72/80+80bit. That gives slower L2 to L1 reloads, and also slower SDRAM to L2. So separate frontside and backside buses are only good for PC processors, where only the 64+8bit SDRAM to L2 load goes over the few external pins of a small PGA370 chip case and the 128bit L2 to L1 path is inside the chip. With FG676 and no chip-internal L2, just one 160bit bus is faster (and simpler).
Tag bits for a simple set-associative cache need to be the address bits of main memory minus the address bits of the cache, so 30|32|36 - 17|18 = 12..19bits. Plus a few for "row valid" bits. So use a 9th 256kx18 chip, which gives 18 tag/control bits. For tag writing while still reading ECC on memory-to-L2 loading, provide extra tag data pins, do not share the ECC pins (of the 160bit memory, if used). Also separate tag address (18 or 17) and control (few) pins. (#lookup: 256kx18 pinout, in particular control signals) Tag storage for an n-way set-associative cache is n times larger, so too expensive for such a large cache.
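The 12..19bit range above is just the difference of the two address widths (a sketch; "row valid" bits are not counted here):

```python
def tag_bits(main_addr_bits, cache_addr_bits):
    # A simple (direct-indexed) cache tag must hold the main memory
    # address bits that are not implied by the position in the cache.
    return main_addr_bits - cache_addr_bits

assert tag_bits(30, 18) == 12   # smallest case in the text
assert tag_bits(36, 17) == 19   # largest case in the text
```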
This L2 cache is also enough memory space for some smaller-memory designs to be possible with no DIMMs at all. It is simpler to use for such designs (but how much simpler?). If they want 160bit width for 144bit+ECC, misuse the now unused tag bits for this (gives them 162bit).
This L2 cache can also be used as microcode store for a horizontally microcoded processor, if the BRAMs are not enough for this (but even the larger model B KL-10 had only 2kx96bit microcode, which is 48 BRAMs, so that would fit into 3/4 of the BRAMs). This can also use the full 162 cache+tag bits. On the XC2S300E again this is more likely to be useful.
An on-board voltage converter 5V to 1.8V (or 12V to 1.8V) is needed for VCCINT (FPGA core). Preferably from 5V, leaving the 12V for drives. Do not use 3.3V to 1.8V, as 5V and 12V are on most newer ATX power supplies way more plentiful (about half of the power each) and more stable (primary switching regulator outputs; 3.3V is often only auxiliary power, just a simple linear regulator from 5V).
VCCO (IO) and memory 3.3V can be taken directly from the ATX power supply, it is good enough for that.
So this design should require only 12/5/3.3V parts. 1.8V only for VCCINT, else parts may interfere with the FPGA core operations, and they would require 1.8V availability away from the FPGA also.
The board should be capable of controlling the ATX power supply.
Use LEDs and switches. To fit the 17 or 6 IO pin count, this will have to be a scanned matrix arrangement. In the FPGA grab all data lines and shift them out, 1 bit at a time. And provide, in parallel to these data bits, their column number to an external TTL or CPLD based column decoder. Or drop the number and use an external column counter and reset/clock signals to drive the decoders. Or use a row of FFs with a bit travelling along them. At the same time read the switches into a shift register, and then transfer them to the action circuits when all are in.
PCI allows both complete commercial cards and prototyping cards (both for simple random circuits or FPGA based ones) for user designed additional devices.
PCI is ideal for cloning computers that have PCI busses, as it allows using the actual PCI cards that these systems use. PCI is also decent for any 32bit computer; this just requires adding PCI to its design, and gives a lot of cards in return for including this, just write drivers for them.
Wide (64bit) PCI is most likely not important enough (only used in high-performance servers) to justify its inclusion in this board. OTOH it may be a good alternative to the CDEC below.
Fast (66MHz) PCI is not important enough either (again only used in high-performance servers) to justify its inclusion in this board. Also it requires 3.3V signalling, which may be incompatible with the PCI bus driver chips (would need switchable dual-voltage-capable parts).
Signal wise PCI has 32 multiplexed D and A signals, plus quite a few control signals. Some controls seem to be individual-slot. (#lookup: PCI signals, what needs to be individual-slot and what can be all-slots) Make at least 2 of each slot's 4 INT pins individual-slot, for decent interrupt designs, real plug and play.
For 36bit, PCI cards can be used by simply wasting 4 bits of the IO data bus, like the original KS-10, which used 16bit Unibus PDP-11 cards. Put an 8*36 to 9*32 converter into the DMA circuits in the FPGA.
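Such an 8*36 to 9*32 converter works because both sides are 288 bits; a behavioural sketch (the word ordering/endianness here is my assumption, not fixed by the text):

```python
def words36_to_words32(words36):
    # Concatenate eight 36bit words into one 288bit value, then
    # split it into nine 32bit words (8*36 == 9*32 == 288 bits).
    assert len(words36) == 8 and all(w < (1 << 36) for w in words36)
    big = 0
    for w in words36:
        big = (big << 36) | w
    return [(big >> (32 * (8 - i))) & 0xFFFFFFFF for i in range(9)]

def words32_to_words36(words32):
    # Inverse conversion, for the DMA read direction.
    assert len(words32) == 9 and all(w < (1 << 32) for w in words32)
    big = 0
    for w in words32:
        big = (big << 32) | w
    return [(big >> (36 * (7 - i))) & ((1 << 36) - 1) for i in range(8)]
```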
AGP is a single-slot PCI variant, which is only usable for video cards. There are no prototyping boards available to misuse it for any self-built stuff. [Andrew: Or do you know of such boards?] As it is such a large amount of single-purpose pins (unless shared with an ISA connector based CDEC, see below), better provide on-board VGA.
Signal wise these can share pins with the PCI bus anyway, as an IO instruction will only address one of these buses at any one time. Dual bus requires 2 sets of bus drivers, so that the ISA stuff does not slow down PCI with its capacitive load. Apart from that it is then just connector and wiring cost.
Signal wise ISA has 16 D and 24 A, and various controls. All ISA signals are all-slots, none individual-slot. So no per-slot separate wiring for decent interrupt designs. (#lookup: ISA signals)
Put ISA D0-15 on PCI AD0-15 to save FPGA-internal multiplexers on the data lines. One possibility is to (mis-)use PCI AD16-31 for ISA A0-15 and possibly some PCI controls for ISA A16-23 (if included). But better latch the ISA A0-23 lines from PCI AD0-23 into external registers in the A bus drivers (which is what happens on the PCI cards anyway), as this saves FPGA-internal different multiplexing of PCI and ISA address signals (2 times address in different places) and is so easier to use.
It would actually be nice if the board offered all the once often used PC/AT "legacy" interfaces, such as ISA, PS/2, RS232, LPT, floppy, as this makes the board (with an x86/PC FPGA configuration) usable as a legacy PC, when the industry kills off these interfaces in its zeal to cut dumb-user support costs at the cost of power/laboratory users. This creates an additional, possibly large, market for the board.
32bit ISA variants such as EISA and VESA VLB are not important enough to warrant supporting them. Anything requiring that power will have to go over a PCI slot (and exist for PCI).
PC/104 is just an ISA variant that uses exactly the same signalling/pins, and can even share the ISA bus drivers; it is just a different (IDC) connector. So adding this is low cost, and it may be sensible to include it, to make embedded systems usage easier. Definitely no PC/104-Plus, which is the upgraded PCI-over-IDC version, as next to no cards need it (less than 10% can even use it).
So one can use any own/custom bus signalling on the PCI or ISA signal pins, as long as the bus buffers don't get in the way. This should be good enough for any bus signalling design. Then simply use commercial PCI or ISA prototyping cards (which don't fix data pins either) and implement whatever device/interface one wants, possibly using a 2nd FPGA on the card. [Andrew: One of my users even suggests that this may create an aftermarket for add-in boards]
PCI has 6 "reserved" lines, of which 4 can be used for 32bit to 36bit extension. This use loses existing card compatibility, but allows an true 36bit "PCI-36" bus. So this board should pin out all 6 of these (4 all-slot as addresses, 2 individual-slot for flexibility, or all individual-slot if enough pins).
But the above kills off all the PCI and ISA slots, which is quite a high price to pay for one simpler card. And the "dangling slots" may disturb high-speed signals. Also any PCI or ISA bus drivers will interfere.
The board should have all IO that a typical PC-experienced user would want to use. It should offer all the stuff that is built into a typical ATX board, unless there are good reasons to override this. Typical is assumed to be:
PC case LEDs (3), switches (3) and speaker
Ether Mouse Printer Game USB USB Key RS232 RS232 Sound
IO with unsolderable/replaceable components in between, such as VGA (DACs), RS232 (drivers), Ethernet (chip), USB (chip), sound (DACs), can be done without these components, if they add too much work/cost. But preferred with (if no signal trouble, as there may be at VGA speed). [Andrew: This is a user request, I second it] Also protection, so that they are hot-pluggable without killing stuff. [Andrew: Another user request]
Very low cost, so no reason for putting in only 1. For these use one of the standard "vertically stacked" violet+green connector pairs.
For the actual bits to signal DACs use one of (#lookup: available DAC chips):
An alternative is to only provide a small video section (up to 4 or 5bit and <100MHz) and use a premade video card (or the CDEC) for anything more. Problem is that PCI video card supply is not safe in today's AGP-dominated video world, and AGP video cards require a dedicated AGP slot which is only good for them (unless paralleled with an ISA based CDEC). And losing the CDEC for just getting VGA is not good either.
VGA is not in the list of "typical" ATX stuff above. So this will require dropping other connectors. Game is the most likely, sound also possible. Preferably not one of the RS232 ports.
This costs a second full set of 3*(2..12)+2 pins and the stuff behind them. Because of the high transfer rate they are not sharable. RAMDAC programming pins, if needed, are at least sharable. Memory bandwidth is now also doubled, to max 48.6% of PC133 SDR or 34.3% of PC2100 DDR or 16.2% of PC3200 DDR. And there needs to be ATX space for 2 VGA connectors.
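For reference, the bandwidth a scanned-out framebuffer consumes is roughly width * height * bytes-per-pixel * refresh rate (a sketch; the mode below is an arbitrary example, not the mode behind the percentages above, and blanking intervals are ignored):

```python
def video_mbytes_per_s(x, y, bits_per_pixel, refresh_hz):
    # Raw scan-out bandwidth of one video head, ignoring blanking.
    return x * y * bits_per_pixel / 8 * refresh_hz / 1e6

# e.g. 1024x768 at 16bpp, 75Hz is about 118 MByte/s per head,
# doubled for a 2nd VGA connector.
```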
An alternative would be to use the CDEC if users want a 2nd VGA. If making such an expansion board (not in the base board price any more), it should provide 2 (2nd/3rd) VGA connectors on it. This means the CDEC connector will need at least 2 times 3*(2..12)+2 pins (or more if RAMDACs, but these 2 RAMDACs can share most programming pins) on it, which is the case if it is a "2nd PCI bus" or an "ISA connector CDEC parallelling AGP". [Andrew: This looks like the better version. Single VGA as standard, 2nd/3rd as extension for those users that want them]
Signals are simply 2 data lines and a GND pin. Strongly preferred not just 3-wire, but with full pinout, with control lines (= 3 outs, 5 ins, so this only uses 8 pins), so that modems and RS232 mice can work. Drive data and control lines by 1488+1489 or MAX232/MAX233 or similar. Using a commercial U(S)ART chip is not worth its cost, and fixes the register interface, making historical cloning difficult. For this use the standard turquoise/cyan female DE9 connectors (DB25 uses too much connector space, converters exist).
Is a 2nd RS232 useful? Today, with PS/2 mice and the switch to ADSL/CATV Internet, the RS232 interfaces are less and less used. OTOH, even the newest commercial motherboards still have 2 RS232 on them (unless they have on-board VGA displacing one), and RS232 uses only a small connector and 8 IO pins. And the historic computing people want RS232 a lot. So put a 2nd one in if space and pins allow. [Andrew: Actual user request that 2 go in. I second it. I would prefer the on-board VGA to displace the standard ATX game or sound connectors, so that 2 RS232 can stay]
This is also (mis-)usable as a general user port (see parallel port disk drives). It is actually a very universal port, because there are no special circuits (not even buffers) behind it, just 17 pure (LV)TTL signals direct from the FPGA. It is sometimes also called the geek port for this.
For a general user port it lacks power supply pins, so possibly make some of the pins switchable (FPGA-driven relay or power transistors or just DIP switches?) to become power pins. I suggest 1*12V + 2*5V + 2*3.3V. [Andrew: 2 users have commented on this feature being a great idea, both users that definitely want to buy the board, so it needs to go in]
Using 5 of the 8 GND pins saves losing IO pins, but risks shorting by plugging in a standard LPT cable, if it combines all 8 GNDs (a PC power supply should just switch off). Using some of the IO pins can not short and so can not crash the FPGA-PC via power failure, but loses them as IOs. [Andrew: I prefer using GND pins, as IOs are too valuable; that is also the reason for 1+2+2 power pins, so that 3 GND pins remain usable]
[Andrew: Such a user port may also be a good alternative for your PDQ8 and Minibus ideas. It has 17 data lines (one less than 18), without wasting another 17 pins on duplicates (more for the PDQ8's local IO). Comes on a DB25 male plug, ideal for a long 25pin RS232 cable to a DB25 female on the (first) target peripheral, with there a 2nd DB25 male for further targets (like external SCSI cabling, but smaller cables/plugs). And then just a DB25 female plug with terminators in it (if the bus needs terminating), so no SCSI-like trouble. And it can even provide power for small peripherals, so saving power wiring mess]
Signals: if it needs separate data/address pins, connect A to the ISA A bus, after its bus driver chip (and so a decoded address); if common A/D, connect it direct to the common PCI/ISA FPGA pins; data connects there anyway; separate are just the chip select and alarm interrupt pins. [Andrew: There exist quite a few I2C (only 2 pins) based clock devices. But I2C is patented, which would require organising and paying per-copy license costs, either by any person implementing the FPGA-side I2C driver in their design (impossible for open source, as no per-copy money to pay it with) or by you as some form of "cover all uses" license. As no other I2C using chips (temperature or voltage sensors) are being used, this makes I2C not worth its costs, unless they are very low]
For simplicity power the RTC from a replaceable lithium battery, no rechargeables and charging circuits.
Such a large battery-backed CMOS SRAM can also be used to pretend that it is a ROM or flash ROM, for designs that want to see such a chip for firmware, be that BIOSen (as in PCs) or operating systems in ROM (as in older Macs or Amigas or Atari STs) or comfortable boot monitors (as in newer Macs, or in Suns and VAXen).
So put in a larger battery-backed CMOS SRAM.
For simplicity power the CMOS from the same source as the RTC.
Standard PC usage today seems to be line-out DAC (light green), line-in ADC (light blue) and mic-in (pink) jacks. So in addition to DAC+ADC perhaps also put in a mic amplifier and an analog switch before the ADC.
This is quite a specialised interface, hardware only usable for networking. And not every user will use this, but many will: workstation users, X terminals, stationary MP3 players. [Andrew: Do you intend your Leons to be networked? I want it, for workstation and server usage. The above sound user wants it to fetch his music. Another user wants it to connect real historic tape drives with small Ethernet/microcontroller/controller combos in them!]
Definitely 10Mbit/s, but also 100Mbit/s, as the chips for this are not much more expensive. Gigabit Ethernet is too exotic for this board; if it is wanted for a server, use a PCI card. The old 10base2 Ethernet, or even worse AUI/10base5, is today too seldom used to bother with; if required, use a PCI or ISA card or external converters.
2 or 3 (or even more, the slot hole fits about 6) Ethernet interfaces would allow making routers or bridges or even switches. This would make a noiseless (runs 24h!) router. But this is very specialised, just like 2nd/3rd VGA connectors. So put in only 1 interface and do the others as CDEC or PCI. [Andrew: One user is thinking along these lines (router)]
But like Ethernet this is a specialised interface. It requires a specialised chip, and the connector can only be used for USB, not anything else. But also some/many users will want this. [Andrew: I presently do not use any USB. One of my users commented that I have too many devices, and that he regards sound, Ethernet and USB as candidates for removing. Another user wants USB (and suggests even dropping PS/2 and just using USB). I think that PS/2 and USB should go in, PS/2 being easier and cheap, USB doing lots]
USB 2.0 seems to have no use outside of fast flash/disk interfaces. OTOH USB 2.0 is most likely only insignificantly more expensive, so put this in, unless it is too much more expensive.
Put in 2 ports, as there are many uses for them. Together with Ethernet use one of the standard "Ethernet above 2 USB" connector sets. If the USB chip offers 4 ports, put the other 2 USBs on internal connector(s), for wiring them to the case front connectors increasingly common these days. (#lookup: what internal connectors are standard for this)
But the better alternative is to use the limited ATX connector space to provide the far more useful and flexible combination of on-board VGA and 2 RS232, while still having 2 printer/user ports. Use the "above sound" space for a 2nd "on top" printer/user port connector. [Andrew: What is the exact target audience for your Leons? Do they game? 2 printer/user ports and full 2 RS232 and on-board VGA are all way more important than joysticks or sound for me, and my users]
Digital (DB9, Atari/C64) Joystick connector(s): Users that want digital joysticks can misuse the printer port (without module), as these are only 5 switches on open/closed input pins. This is the normal way on a PC anyway (see the Linux "multisystem" joystick driver for the wiring).
S/P-DIF digital sound out/in connectors: Normal PCs don't have these either. Also I do not know what circuits/signals these require, as I have never seen one. We do not have ATX connector space for them. And there exist PCI cards, which can be used. And this also can be done as a printer/user port plug-on module. [Andrew: Are serious sound processors to be expected among your users? Mine not]
MIDI audio control connectors: Would be part of the DB15 game/joystick/MIDI connector on a PC. Signals are a few 3-wire RS232s as far as I know (never done MIDI). PC usage actually requires an external "break out" cable (as the game DB15 is shared with the joystick), as far as I know. This can also be done by misusing a user port connector.
Oscilloscope display output: These were used in computers before the 1970s. Have a program cyclically plotting the picture, no video memory (then expensive) to be scanned out. Signals are X/Y coordinates and Z or ZR/ZG/ZB intensity. Simplest is to misuse an existing VGA connector's R and G as 8bit X/Y and B as 8bit Z. But this gives only 256x256 positions and B/W 256 gray levels. If 3*10..12bit VGA the positions go up, but colours stay B/W only. For full 256^3 colours use VGA RGB for ZR/ZG/ZB and then add 2 separate 10..12bit X/Y via the CDEC or a printer/user port module.
Composite or SVHS video output: This would be for standard TV/video monitor output. Users that want any of these will have to use a PCI card, with also the needed analog stuff and video modulator on it. Same also for any video input digitiser and tuner. Or this also can be done as a user port plug-on module.
IEEE1394 (FireWire): Today only of importance for digital video in/out. Possibly also for fast external disks, like USB 2.0. If someone wants this specialised feature then use the PCI bus and put a card in there.
IEEE488/IEC625 connector or other industrial stuff: Used too seldom for standard users. Use a PCI (or is ISA needed for this old stuff?) card if one wants any of these, or use the CDEC to directly drive such a system. Alternatively use a user port plug-on module (it is 8 shielded data and 8 control signals IIRC).
Modem or ISDN or other comms equipment: Very specialised, badly documented signalling, regulatory problems. Existing devices are plenty and good enough, and can be connected by existing RS232 or Ethernet.
Also use the floppy for simulating DECtape or even paper tape reader/puncher or similar small media.
Signals are simply 17 straight IO pins (2,4,..,34) and 17 GND pins (1,3,..,33) with a little bit of analog stuff. Specialised FDC chips are not advisable, as they fix the data format, bad for GCR or non-2^n bit systems.
Signals are not investigated yet, but are less than 40. (#lookup: ATA signals, after looking at ISA) Sensibly put in 2 ATA connectors, for 2*2 drives; the additional signals are most likely just 2 separate select pins.
The original ATA was just a set of bus driver chips coming from the ISA bus, ATA just being an ISA bus subset. So it can share FPGA pins with the ISA bus slots.
Later, for higher speed, it moved to the PCI bus, with address latch and DMA with cycle combining (2*16 to 32bit). So it can share FPGA pins with the PCI bus slots, preferably still same layout as ISA, as it needs separate address lines and only 16 data lines.
ATA needs a separate set of bus drivers, so that ISA and PCI are not hindered by the long ATA cable, so PCI/ISA/ATA need 3 driver sets. Also clocking at ISA 4MHz limits bandwidth to 2*4 = 8MByte/s (original ATA), and at PCI 33MHz limits to 2*33 = 66MByte/s (ATA66) speed operation, so driving ATA at a higher clock to get up to full ATA133 also requires separate bus drivers.
Also provide control signals (if any) so that UDMA mode is possible, as ATA is really slow without this (PIO mode). (#lookup: what does UDMA use, is it just PIO with ATA adapter doing the reads, or special DMA control signals)
Serial ATA (SATA) is too new and not widespread enough to be worth including. Also SATA's announced 150MByte/s transfer rate requires 1200MBit/s, and Virtex-E/Spartan-IIE FPGA IOs top out at 625MBit/s. SATA would require Virtex-IIpro style RocketIO to implement, or a dedicated SATA chip. Better use a PCI card if this is required.
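The SATA feasibility argument is plain arithmetic (sketch in Python; the 8 bits/byte figure here ignores line coding overhead, which would only raise the required rate further):

```python
FPGA_IO_LIMIT_MBIT_S = 625           # Virtex-E/Spartan-IIE IO ceiling

def needed_mbit_s(mbyte_s, bits_per_byte=8):
    """Minimum serial bit rate for a given payload byte rate."""
    return mbyte_s * bits_per_byte

sata_rate = needed_mbit_s(150)       # SATA's announced 150 MByte/s
sata_fits = sata_rate <= FPGA_IO_LIMIT_MBIT_S   # clearly not
```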
Signals are AFAIK half of the 68, so 34, the other half being GND or inverted signals. (#lookup: SCSI signals and the various standards)
Up to SCSI-UW there were 2 incompatible signallings, single ended (SE) and differential. SCSI-U2W and further up add a 3rd, LVD signalling. The oldest 8bit narrow SCSI offers no advantage (and 16 to 8bit converters exist anyway). [Andrew: This is a mess. Best provide, switchable, the more common slow SE and the fast LVD, as most modern cards seem to do]
A SCSI card on the PCI bus would block that bus during transfers, so SCSI can also share data pins with PCI, ISA and ATA; it just needs its own control signals. The same caveats about bus speeds as with ATA apply, already biting from 2*40 (SCSI-U2W) operation onwards. It needs high power bus drivers, different from ATA's, because of driving terminators. (#lookup: SCSI driver chips) So as SCSI also needs its own bus drivers, we need one driver set for each of the PCI/ISA/ATA/SCSI buses, using common data/address bus FPGA pins.
If an external SCSI-W connector is wanted, run an internal cable to a slot. Either the second end of the cable, which allows hardwired terminator resistors on the board, or far better an extended first end, which requires removable or better automatically switchable terminator resistors. (#lookup: terminators in driver chips? Tekram DC-390U3W has 3 UCC5930AMWP chips on LVD/SE and 3 DS21T07S on SE; the latter are by Dallas and are 9bit active SE terminators, the former UCC is unknown, but Dallas has some UCC chips in its "replacement" list)
[Andrew: I know you like SCSI, like I do. But what about doing it here? Are you aiming at client or server or embedded, or multiple of these uses? What users will want the quite considerable cost of multiple drivers and support stuff? SCSI today is minority stuff. Perhaps make SCSI only available via a CDEC plug-in board (like 2nd/3rd VGA or Ethernet)? Or better go for a commercial PCI SCSI card (leaves the CDEC for 2nd/3rd VGA), which is the least work (none), and no cost for the majority of ATA users]
For ATA this gives 2*(2*2) drives, for SCSI 2*15. Costs a 2nd full set of independent pins for the 2 buses, drivers and connectors (and for SCSI also termination).
Any form of direct tape or cassette interface: Too specialised, even more so than direct HD interfaces. Use ISA, printer/user port, CDEC, etc.
PCMCIA or CompactFlash (CF) slots: Used for memory cards or flash disks or microdrives. Would allow a fully solid state system, compact and noiseless operation, good for embedded systems (board with just a CF flash disk and a small power supply, for a full IO capable X terminal or a stationary MP3 player). These are basically yet another ISA bus variant with a different plug.
Offering only PCMCIA requires an adapter card for putting in CF cards, while offering only CF disallows PCMCIA cards entirely (but CF is today far more widely used), and offering both adds cost and layout work for 2 seldom used sockets.
But it is a complex and costly connector, particularly if one uses one of the ejectable ones. And the connector plus slot space takes up quite a bit of room/boardspace. Commercial motherboards do not have them; they are mainly notebook stuff. And there exist ISA or PCI PCMCIA adapter cards, even ATA ones. Also there exist USB adapters for PCMCIA or CF. And embedded systems can use ATA flash disks or a USB "flash plug".
SSFD/SmartMedia or MemoryStick or Smartcards: No importance outside of data transfer to/from portable devices (such as cameras, MP3 players) or authentication stuff. Use RS232 or USB transfer. Or a USB based card reader, if the device has no interfaces. Or a PCI PCMCIA adapter with then an adapter for them.
FibreChannel or SSA or Firewire or any other fast serial device bus: While these are nice, they are difficult to do. And they have bit rates the FPGA IO can not do, the same case as against SATA. If Firewire (most likely the largest demand) were to go in, it would have to be a specialised chip, like for USB. But unlike USB, PCI cards are easy to get here and will stay so for a while, so there is no reason to design one in. And they are unlikely to be used, as they are high end stuff. If any of these is wanted, use PCI cards.
The programming circuit should be built in (like in QST0201, a good feature) with just a connector for the cable to the development system. Ideally the same (or expanded) printer port pinout and signalling as QST0201, to reuse the same configuration/control software.
But the config connector should not be in the ATX connector area, to save scarce space there for user IO ports, and to make the FPGA-ness invisible to non-developer users and admins. So the connector should be internal, directly on the board. Preferably use a 26pin IDC header like that used for pre-ATX LPT slot mount brackets for on-board printer ports, not a DB25.
If possible within the printer port pin usage, the programming circuit interface should offer support for debugging of the large FPGA and the complex designs it allows. See the software/debug clock 3 signal section above. Ditto, remote-forcing the M0-2 mode to "slave" and programming the clock 0 frequency would be good. [Andrew: What capabilities does the QST0201 circuit actually have in this respect, what printer port pins are used up? Same as in the Xilinx Parallel Cable III?]
For seeing configuration happen, put in green "done" and red "error" LEDs. And also parallel them with IDC connectors for misusing the PC case Power and Turbo LEDs for this (these are separate from the "Power/Turbo as output of the user design" IDC connectors; users can plug the LEDs into their preferred function).
For switching external/internal config put the FPGA M0-2 pins on DIP switches, like on QST0201, as fumbling with and losing jumpers is annoying. Make clock 0 also settable by DIP switches (or use a clock chip with internal EEPROM?). Switches should be socketed (a DIP socket is cheap), so that they can be replaced with a DIP-on-IDC plug and cable for automated external setting. [Andrew: The socket is an actual user request, I second it. Or if they can all be set from the external config IDC 26, then there is no need for this]
Users may want to have multiple configurations. Or they may even want to use the FPGA-PC as a development system to improve itself, generating experimental configs and needing fallback to working ones. Both of these require config time selection of which config to use.
Simplest is to swap the config EEPROM. This doesn't require any more support than socketing it, so use a PC44 case for it. But this requires chip pulling, and so stresses chips and sockets. And loading from the running FPGA-PC even means pulling the EEPROM used for configuring while the system is running.
Possibly include a second EEPROM socket for a 2nd EEPROM, for programming it. With 2 sockets make only the 2nd writable, or at least the 1st write protectable. This protects the main design from any accidents or sabotage.
This should be combined with a switch to select which socket to config from. Also put in IDC connectors to allow such switching from the PC case Turbo switch (or Keylock) and triggering reconfig from the Reset switch (these are also separate from the "Turbo/Reset as input of the user design" IDC connectors; let the user decide on which use). Possibly put in circuits for the FPGA to itself demand a reconfig. This is apparently possible, somehow having some IOs trigger some of the config signals.
Alternatively misuse the parallel port and an external adapter for EEPROM writing, for more access to its socket (but is this needed with EEPROMs?). Such a parallel port EEPROM writer is also sharable with the development PC or another FPGA-PC board (but only a small, insignificant cost saving). But this may require more complicated switching stuff to config from there, so that other printer/user port usage is not blocked by it. So preferably not.
Either use a 29Fxxx parallel flash chip, as used for PC BIOSes, which would require a 40pin uC to drive it. Or use an 8pin 45Fxxx serial flash, plus an 8pin uC. The latter was mentioned on c.a.f to be <20% of the cost of the same size XC18xx EEPROM, for the 1Mbit case. [Andrew: I note that the Xess Virtex boards both use standard parallel flash (with a CPLD to config), while their older XC4000 boards used OTP PROMs. They avoid the expensive EEPROMs entirely. We should also]
Either use a minimal 4MBit (= 512kByte) flash chip. Or better put in a larger 16MBit (= 2MByte) flash chip, offering space for 4 configurations with only one flash chip (and the uC).
Because the latter gives space for 4 configs, no multiple flash chips are needed, and no sockets or switches. Use the uC address initialisation software to select which config to use, with no hardware for this either. Swapping flash or socketing the flash chip is not needed either. To control selection use 2 switches. Same as above, with separate IDC connectors for the PC Turbo switch (or Keylock) and triggering reconfig from the Reset button. There is no need for FPGA driven self-reconfig either; simply ask the uC to trigger stop and reconfig. This also allows the FPGA to select any of the 4 configs, given the right signalling pins.
The FPGA can now also write the configs via the uC (this just requires some signalling pins). Use the uC firmware to write protect or allow writing any of the 4 configs, without any hardware needed for this either.
In addition to configuring the FPGA, the uC should also be able to set M0-2 and control the clock 0 programmable oscillator; being able to drive the development system drivable FPGA clock 3 would also be nice. So best offer the uC access to the same circuits that the development system's printer port would use. So use tri-stateable pins for this, or better a parallel port driven multiplexer, which can override a badly crashed uC.
[Andrew: I prefer this usage to specialised EEPROMs: more configs, less hardwired, more flexibility, for less cost. Just needs the uC software writing, once and then it is done. What could be better?]
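That address initialisation step is straightforward; a minimal sketch, assuming 4 equal 512kByte slots in the 16MBit flash and 2 selection switches read as a 2bit number (Python only as illustration of the uC firmware logic):

```python
FLASH_BYTES = 2 * 1024 * 1024        # one 16 MBit parallel flash
SLOT_BYTES  = 512 * 1024             # 4 MBit, one config bitstream
NUM_SLOTS   = FLASH_BYTES // SLOT_BYTES

def slot_base(switches):
    """Map the 2 selection switches (read as 0..3) to the flash start
    address of the chosen config."""
    assert 0 <= switches < NUM_SLOTS
    return switches * SLOT_BYTES
```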
Parallel to the interface from uC to FPGA for configuring, also have one from FPGA to uC for run-time downloading of initial or updated firmware to the uC internal flash. The FPGA will initially, or after a failed uC upgrade, need to be configured from a development system's parallel port. Alternatively have the uC only reloadable from the IDC 26 config connector, but this is inferior.
Possibly run all the Turbo/Reset/Keylock switches from only one set of IDC connectors to the uC, and have a uC controllable IDC/uC switch for the lines to the FPGA. So the FPGA can get these direct or filtered by the uC. Likewise run the Power/Turbo/HD LEDs from the FPGA to the uC, and then switch FPGA/uC to the IDC connectors, for filtered output. This allows changing uC vs FPGA usage by reloading uC software, without opening the case and replugging. [Andrew: I am not really sure this feature is worth the extra work]
This also allows using any FPGA accessible user IO device to offer the user a selection of configs, as in a boot manager.
As rewriting flash too often (every config!) wears out the chips, possibly put in a single config 4MBit (= 512kByte) SRAM chip, for such temporary configs. Such SRAMs are not available in serial, so this needs a 40pin uC, so definitely also use a parallel flash chip.
If 256kx18 chips were used for L2 cache or CMOS, then possibly use one of those here, if reducing the part list size saves cost. Definitely use this one in 9bit wide wiring to save uC pins, as only 8bit data are used anyway.
As there is battery backed CMOS SRAM on this board, possibly battery back this SRAM also. This suggests an alternative flash-less design, with only SRAM for 4 configs. But this requires 16MBit of SRAM, which will cost 4 chips, unless 16MBit (2048kx8 or 1024kx18) SRAM chips now exist.
In Virtex family chips this is done by replacing vertical stripes of CLBs and the IOs at the top/bottom of them. This requires the FPGA to give the config circuit the new CLBs and then the config circuit to stop the FPGA, write the stripe, and restart. This requires a uC+SRAM design; config EEPROMs will not suffice for this.
Replacing the CLBs also replaces the IOs above and below them. So possibly arrange the IOs so that replacing one of multiple reserved "module" stripe(s) of CLBs does not hit important IOs, such as memory, as this loses their IO FF contents. But such a replaceable stripe gives trouble with routing going through it to the IOs on the other vertical side of the FPGA.
This can be solved by sending them around the VersaRing lines in the IOs. But even this requires "saving" the IOs and adding them to the new stripe of CLBs to be loaded. So this must also save the IO FFs. So it looks like partial reconfiguration is a non-issue, as far as IO layout goes. And designers can use any arrangement of replaceable CLB stripes.
Loading IO circuits also requires routing from them to the IOs used. So the IOs can not be copied without changes. The same also happens with connecting the circuits to the rest of the design. Replacing just these routes, but not the rest, may be a problem for the uC. Pre-routing dummy routes just to the first CLB of the stripe should solve this. Or use global lines from outside into the circuit.
Up until this time it was either: EEPROM without uC (no multi-design space for "reload and boot" logic), or uC with only devices and no flash (no place for a boot manager design to be loaded from, so it needs direct device access).
Since coming up with the uC+SRAM combination, all the uC direct access to devices and user IO stuff is not needed any more. But I have left it here in the file, just for you to see what was long intended. Also perhaps there is something interesting in here.
If someone still wants an FEP style design, they could put a "small board" inside the PC case, with a normal 26pin cable from the small board's printer port to this board's config connector. They would then need to use SCSI drives with both boards on a single SCSI cable for shared disks. This would require adding SCSI to the small board (by add-on card?)]
FPGA config is actually similar in concept to loading CPU microcode (which is what floppies were invented for), so offer floppy use for this. But also allow config from ATA/SCSI drives, for faster performance and more comfort.
Share access to the same devices of these types that the FPGA has access to, so there is no need for separate devices, and the FPGA can place configs there for the uC to then use. This suggests that the uC should actually have pins shadowing the FPGA's floppy and 1st ATA/SCSI pins one-on-one. This requires a large 64pin or even 84pin uC. This also requires the first SCSI to be on board and not on the CDEC.
Only one of the FPGA and uC can use the devices at any one time (the other must tri-state its shared pins), with the exception of SCSI, where both can share a bus (as switched tri-state is in the bus definition), so long as they do not modify the same file system or use some other exclusion system. The FPGA is tri-stated after power up or reconfig, so the uC needs to switch to tri-state before letting the FPGA run.
Facultatively, if there are enough pins available on the uC (unlikely), also add control pins to access any other FPGA PCI/ISA bus device. If SCSI is only on PCI, then this is needed. In this case, if there is on-board full chip USB, also add access to it. That is good, as USB is external (unlike ATA/SCSI), and so allows the use of a pluggable USB disk or even a "flash plug".
Even more facultatively, in today's world of Ethernet OS boots, possibly also allow Ethernet FPGA configs, if on-board full chip Ethernet is included. Also allow access to that, if there are enough pins and code space in the uC to use it (assuming then a DHCP+TFTP-only single connection UDP/IP client is enough). But this is really over the top, while USB may still be useful.
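That exclusion rule can be modelled as a tiny state machine (a sketch only; names are illustrative, Python used as notation for the handover protocol):

```python
class SharedDevicePins:
    """Model of the FPGA/uC shared floppy/ATA pins: at most one side may
    drive at a time, the other must be tri-stated first (SCSI being the
    exception, since it arbitrates sharing on the bus itself)."""
    def __init__(self):
        self.driver = None            # None = both sides tri-stated

    def drive(self, side):
        if self.driver not in (None, side):
            raise RuntimeError("other side still driving")
        self.driver = side

    def tristate(self, side):
        if self.driver == side:
            self.driver = None

# power up: the FPGA comes up tri-stated, so the uC may fetch the config,
# then must tri-state before letting the configured FPGA run
pins = SharedDevicePins()
pins.drive("uC")
pins.tristate("uC")
pins.drive("FPGA")
```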
For selection, have the standard config come from fixed ATA/SCSI and the experimental configs from removable floppy or USB, with the ATA/SCSI as fallback. But this is bad for multiple standard configs. Or simply have 1 config = 1 floppy or USB, but this loses using ATA/SCSI HDs, which may already be available.
The simplest method would be a set of 8 DIP switches (again no fumbly jumpers, and also DIP socketed for remote control), readable by the uC, for selecting the config device type/number, the partition/file number on it, and the fallback policy if a removable media device is empty. Like the external config connector, having these DIP switches externally accessible would cost ATX connector area and show FPGA-ness, so make them internal.
But internal switches make selecting difficult to access on every config, which is often the case while experimenting. Better than DIP switches would be to have the uC offer the user some sort of prompt via the standard user output ports/devices and then take input, auto-configuring on some timeout. This is like a PC's boot manager for selecting BSD or Linux kernels or other operating systems.
Unlike boot managers, which run on the same processor and IO as the final OS and kernel will, the FPGA-PC has at this time only an unconfigured "dead" FPGA, not a runnable processor. [Andrew: here the above "boot manager config" hits, making the FPGA usable for fetching and of course also for selection]
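How the uC might unpack those 8 switches can be sketched as a bitfield decode (the field layout below is an assumption for illustration, not a decided pinout; Python for clarity):

```python
def decode_config_switches(sw):
    """Unpack 8 DIP switches (bit 0 = switch 1) into the selection fields.
    Hypothetical layout, sized to fit the items named in the text:
      bits 0-1  device type (0=floppy, 1=ATA, 2=SCSI, 3=USB)
      bit  2    device number
      bits 3-5  partition/file number on that device
      bits 6-7  fallback policy for empty removable media
    """
    return {
        "device_type": sw & 0x03,
        "device_num": (sw >> 2) & 0x01,
        "config_num": (sw >> 3) & 0x07,
        "fallback":   (sw >> 6) & 0x03,
    }
```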
This is an equivalent situation to a microcoded processor without microcode loaded. Historically this problem was solved with a separate hardwired front end processor (FEP) computer, like a PDP-11/40 (in the KL-10) or an 8080 microprocessor (in the KS-10). Use the microcontroller, or perhaps even a full microprocessor (uP, such as a Z80/Z180), for the FEP, depending on the functionality desired.
To do this, the FEP needs its own access to user IO. For this also share a subset of the FPGA-PC's input and output devices with it.
For merging FEP and main system output onto the same VGA display (FEP as a status line at the bottom, or even as a window), have the FEP VGA output also go to an FPGA input. This allows a digitiser (like BT848/878 type devices) to be implemented, writing into FPGA video memory. This gives both outputs at the same time, at full main system VGA quality, on one monitor, but adds more FPGA design work.
This also allows the FEP to "flip" 2/3/4 screen data sets at 60Hz to the digitiser, giving 2/3/4 FEP windows, with the display updated at 30/20/15Hz. With "flipping" the FEP becomes even more complex.
For getting the larger amount of software needed for all this into the FEP, instead of using one large flash part for firmware, have only a minimal boot ROM and RAM for software in the FEP. Have it boot its software from the same devices that it can later take FPGA configs from. This can be a full OS (such as CP/M if a Z80/Z180 is used). Also use the OS's file system for storing the config files. The FEP has become a full (micro-)computer, like the historical FEP computers actually were.
Possibly make the FEP a facultative, pluggable component. This gives users the choice of simple on-board config by EEPROM (or still uC and devices), or using a pluggable FEP for complex/expensive user selectable configuring. [Andrew: How many users do you think would want an FEP? And of those that don't want an FEP, how many do you think would still want to use uC+devices vs just go for EEPROMs?]
The easiest variant is to just provide all signals any FEP would want, run to one (or a set of) IDC connectors, and leave it at that. This though requires non-EEPROM/device users to make a special FEP computer design, hardwired or FPGA based.
Better offer something to be plugged in as the FEP. But that brings back design time cost, even if parts costs are now optional for the user.
Such an FPGA FEP allows any user-selectable FPGA-CPU in it. For Sparcs one may use an 8051 or Z80 or even PDQ8. For PDP-10s one could use a PDP-11/40 as in the real KL-10s, or even a minimal PDP-10 like the KA-10, as used for the Foonlys. So the FEP FPGA should be no smaller than an XC2S150, which can still get by with a small XC18V01 1MBit EEPROM. Sensibly use something derived from the QST0201 design for this.
The FPGA configuration includes either the FEP firmware in BRAMs (but that is size limited), or give the FEP a bit of external RAM and use its BRAMs only as boot ROM. This would result in a FEPconfig(EEPROM), then FEPboot(disk-direct), then main-config(disk-via-FEP), then main-boot(disk-direct) start up sequence. This is fairly involved and quite a bit of (too much?) programming work, but ultimately (and unnecessarily?) flexible, with little hardware layouting and fixing (definitely good). [Andrew: One of my users says he regards it as too much work for each user to first get/install the FEP FPGA config, then get/install an FEP OS, and only then actually be able to get/install the main FPGA design. I regard the "main FPGA from EEPROM" variant as minimal "start immediately" operation, and the FEP as only facultative. But perhaps deliver the FEPs with a simple FEP config and a non-disk-based OS pre-installed in their EEPROMs? This lets the users treat the FPGA FEP as if it were a hardwired uC/uP FEP, or modify it if they want to]
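That four-stage start up sequence can be written down as a checklist (stage names follow the text; Python used only as notation for the ordering):

```python
# FEPconfig -> FEPboot -> main-config -> main-boot, with where each
# stage loads from, as described in the text
BOOT_SEQUENCE = [
    ("FEPconfig",   "EEPROM"),         # FEP FPGA gets its own bitstream
    ("FEPboot",     "disk, direct"),   # FEP OS boots from disk
    ("main-config", "disk, via FEP"),  # FEP feeds the main FPGA its config
    ("main-boot",   "disk, direct"),   # main system boots its own OS
]

def next_stage(done):
    """Name of the next stage after `done` completed stages, None at end."""
    return BOOT_SEQUENCE[done][0] if done < len(BOOT_SEQUENCE) else None
```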
An FPGA FEP allows any complexity of FEP IO options, so long as the needed devices are wired in. With a QFP208 (and so 140 user IO pins) this allows floppy and PCI/ISA and ATA/SCSI and USB for config file fetching, and LEDs&switches and 2*PS/2+VGA+PCspeaker and RS232 for config file selection.
This would also allow serious VGA level output in the FEP, for a good console terminal in the FEP. It also allows the FEP, when not displaying directly, to send data custom formatted to a simpler "digitiser" in the main FPGA.
One possibility is inserting the already existing QST0201 board for FEP usage. Then simply wire the various memory, IO devices and storage devices to the QST0201, fitting its 3*40+50 IDC connector pins. This separates out the FPGA and all its development/EEPROM config stuff, and it reduces layout work (but by how much?) by reusing QST0201.
QST0201 still needs the large board to provide it with clocks (share the main FPGA ones?), memory (suggested 1 256kx18, as that part is already used for cache) and all the FEP drive and user IO analog stuff (share some?). So this only provides non-optimal (but quite good) facultative part cost.
But the QST0201 has a DB25 for configuring, not an IDC for data from the main FPGA, and needs a power brick plug, not power from some IDC. [Andrew: These are the only weaknesses I have found so far in QST0201. It may be sensible to change them in a future revision of QST0201, to make it a true pluggable component, delivered with a DB25 to IDC programming cable and a powerbrick to IDC power cable adapter]
This gives optimal facultative part cost. And a side effect of doing this variant is that the FEP IO stuff design and layout time is shared with getting a separate small FPGA-PC board (and so a small+large board series), so double use of the design and layout time.
This page is by Neil Franklin, last modification 2003.06.07