Microcode Style Computer - Sequencer + Microcode + RegALU
author Neil Franklin, last modification 2011.08.23

Derived from basic idea of AMD 2900 and TI 74S481 bitslices
  allowed using a few highly integrated chips, fast, and without CPLD/FPGA
  using only standard ROM/PROM/EPROM/EEPROM/Flash for configuration
  which are also needed for firmware, programmer device is available
  using same ROM/PROM/EPROM/EEPROM/Flash for firmware
  and standard SRAM chips for working memory, enough for small systems
  entire microcode (and bitslice) design method has disappeared from the market
  as a result of ever faster fixed function CPUs and programmable logic
  but the first are limited and the latter are more difficult and not open
  also among the latter, CPLDs are too small and FPGA compilation is too slow
  single quantity VLSI chips are now possible at low cost, using deprecated fabs
  350nm from 5mm^2, gives 5*15'000 = 75'000 gates or 300'000 transistors
  amounts to roughly an 80386 or a 68030, about twice early MIPS/SPARC/ARM
  should be enough for any decent hobby grade processor project, CPU or SoC
  amounts to roughly a 64kbit SRAM, the standard 8kx8, early/mid 1980s
  or assumed to be about 128 or 256kbit EPROM/EEPROM/Flash, 16 or 32kx8
  150nm from 4mm^2, gives about 3 times more, but also costs similarly more
  would be needed if standard EPROM/EEPROM/Flash or SRAM disappear
  these could also be duplicated with this, allowing 32kx8 SRAM or 64 or 128kx8 ROM
  can have them in DIPS64 (2mm) or DIP28 (2.54mm 600mil) or PLCC/JLCC68 cases
  but turnaround time is large, and designing is a lot of work, few can do it
  so use this to get back to the microcode (and bitslice chip) design type
  as anyone can learn to use that, and it is modifiable in microcode software
  this also revives a historic system which was successful before
  standardisation drove out all alternatives and experimentation
  and can also be used for hardware emulation of any disappeared old system
  aim is 1 or 2 special chips, plus a few standard EPROM/EEPROM/Flash
  and SRAM
  entire small system CPU+mem should be able to fit on one 160x100mm PCB
  only IO and power would then possibly need a second PCB
  or a further third PCB if large size separate memory is wanted

RegALU chip
  basic design anything that was in a set of 2903s plus internal 2902
  the result will be nearer to a 29102, not really bitslice any more
  but it will preserve and use the microcoded design bitslices were used for
  aim for a DIPS64 for flexibility, but also allow a DIP28 subset for space

ALU
  full 32 functions, 16 logic and 16 arithmetic, with carry switch
  at least 8bit wide, but with minimal chip size, width possibly 16/32/64bit
  have a 4/5/6bit width selector for exactly any 1..n bit, also odd ones
  16bit also allows 12bit, 32bit also allows 18/20/24bit
  64bit even allows a 36/38/48/60bit system to be (re-)implemented
  possibly also carry breaking into bytes, but requires many signals
  would have to be a set of internal FFs, and an on/off control input
  or just a repeated use of above bit selector, cut carry into portions
  possibly have 2 selectors, for maxbit and bytesize settings
  possibly also BCD support, requires one signal, but complicated carry
  and definitely limits carry lookahead to 4bit bytesize sections
  no P and G outputs, as only one or few chips, block ripple carry is enough
  and so no need for making an external 2902 or 74S182 chip
  inside use carry lookahead on 4 or 8bit chunks, better 8, which is sqrt(64)
  carry input generator with 0/1/C/NC modes, or off/C/0/1 modes
  or implement this in the sequencer chip, as outside of the data path
  set of Carry, Overflow, Minus status outputs, selected by bit width
  also zero detector and possibly parity generator, and status outputs
  all status outputs also with optional flag FFs, replace 2904/2914(?)
  flags should be in RegALU, to reduce delay in time critical carry out
  possibly also after the flags the ZorC and MxorO and Zor(MxorO) combinations
  or place this in sequencer chip, or even in custom logic in between chips

Shifter
  1+2+4+... bits behind each other
  second Shifter and Q register for double wide multiplication and division
  also whatever other special registers are needed for Mul/Div/Sqrt
  use loop mechanism in sequencer to run full operation
  shift carry uses same 4/5/6bit width selector as ALU, both shifters
  with "end" multiplexers for rotating modes, and using second shift extension
  or implement this in the sequencer chip, as outside of the data path
  status output capable of being separate from ALU carry for flexibility
  but also allow optional merging with carry, for less external logic

Masker
  AND for extracting stuff, and twice plus OR gives inserting
  together with shifter also used for part-word bytes
  also uses same 4/5/6bit width selector as ALU, in addition to own
  same also available on second shifter, for 2*n, and parts of both

Registers
  large main register file 32..256, possibly also usable for 2 Forth stacks
  or for 32bit implemented in 4*(16..32) 8bit chunks (unless wide internal)
  therefore no need for register expansion, no need for 3 data ports
  read 2 (or 3) ports, write 1 (or 2) ports, at least one r/w independent addr
  or split into 2 separate blocks for fewer address ports
  possibly address generators, register, stackptr, possibly FIFO
  most likely limited to max 3 external addresses, fewer not desirable
  may require reducing amount of address bits for this to fit
  second smaller aux register block, 16..64, for "special" registers, such as PC
  only with external addresses, but with separate read and write ports
  allows taking from Link, and storing incremented to PC
  or perhaps just one stackpointer, for Forth Return stack, any CPU HW stack
  in combination with 2nd read/write ports, or in 2 parts with own ports
  with own small ALU, pass/add/inc/dec, independent of main
  ALU
  allowing postinc, postdec, predec, index modes
  aims to be replacement for 2920 PC/MAR chip, but in main chip
  this is mainly intended for address registers and calculations
  can be separate PC or Link or SP, or data or address temp, or refresh
  can also be used for DMA addresses, but no counter/control stuff here
  also not data path, use small amount of external logic for that
  such as for video output, ext pixel shifter, clock/load from Sequencer

Extras
  for making IO direct in chip, save external chips
  possibly a further shifter for Video pixels, driven by own clock
  possibly further shifters for Disk or Network or even just SPI
  reduces processor speed, but DMA does that anyway, as memory is bottleneck
  this is analogous to the Xerox Alto, having the processor timeslice DMA
  uses the above second smaller register block and its small ALU
  also no counter/control stuff here, also no sync signal generators

Data pins
  set of 32bit, splittable for aux/address and main/data
  allows up to 24A+8D ("byte") or 16A+16D ("word") separate
  or up to 32AD ("long") multiplexed for external memory bus
  any 64bit (or over 32bit) support will require halfwords
  any /ALE or Clk signals come from sequencer, not from RegALU pins
  all data input and output goes through entire ALU+Shifter+Masker
  addresses cascaded from aux regs/ALU, then with main regs/ALU, then to pins
  must also allow use of addresses before ALU incrementing, for postinc
  outside direct to memory or to 373 latches or to 273/374 registers
  for DIP28 subset only 8AD multiplexed for external memory bus
  the multiple /ALE or CLK and Byte addresses needed are not from RegALU pins

Control pins
  rest of chip minus power minus data pins, about 25..30 left
  then a lot of pin space left, some for on chip register addresses
  at least 2 (or 3) for main registers (possibly reduced 6bit)
  and only 2 for aux registers (possibly reduced 3bit)
  and rest for control lines and carry/shift/status
  if not needing that many pins, add further 8 to 16bit address pins
  but preferably
  multiple/many separate register addresses, flexible
  or even go for more separate control pins, not decoded "instructions"
  more flexible, but uses wider microcode or demuxes/PALs after microcode
  all address/control inputs also with optional pipelining FFs
  this allows using standard non-registered EPROM/EEPROM/Flash for microcode
  for DIP28 subset reduce to 1 5 or 6bit main register and 1 3bit aux reg addr
  reduce carry/shift/status lines with multiplexing, only few out
  possibly allow optional decoded "instructions" with fewer signals
  unlikely: multiplex some signals, if anything then addresses, but this is slow
  also offer all this reduction in DIPS64, only for narrower microcode ROMs
  one chip, 2 cases, switch pin/pad, for DIP28 mode n.c. to save pins

Sequencer chip
  do this as second chip, as one can also use PLAs/PALs/EPLDs/CPLDs for driving
  or more likely microcode EPROM/EEPROM/Flash, with PLAs/PALs as sequencer
  basic design anything that was in a set of 2909/2911 plus 29811 plus 29xx
  the result will be nearer to a 2910, one monolithic sequencer
  but can also include all the loop count and case switch stuff
  and can also include all the status and DMA select stuff, enough pins

Address generator/bus
  16bit wide, 64kWord limit (of 28pin EPROM/EEPROM/Flash),
  decoder for internal 2*32kWord splitting (28pin E/EE/F and 28pin SRAM)
  aim to use the very common 32kx8bit chips 29F256 and 62C256
  and use the same also for main memory, here multiple wide, there multiple long
  for splitting address space offer 1:2 Mux, driving 2 /CS outputs
  but for large 64k ROM also allow 1 /CS and 16th address bit
  also if enough pins decode for external memory, use RegALU top address bit(s)
  and offer 1:2 or 1:4 or even 1:8 decoders, for multiple memory banks
  allow pins for writing signals to microcode SRAM
  also on control pins coming from microcode allow register to send data
  this same must also be present in the RegALU chip, for its section

Address count/load/branch/sub/return circuits
  or use no counter,
  direct load always, no flag/mode, branch as 2-switch
  for branch/2-switch multiple flags to select from, external and internal
  external for connecting to RegALU Carry/Overflow/Minus/Zero/Parity
  internal from multiple counters, can be used for looping or DMA control
  should be cascadable via microcode, for 6845 style usage
  for n-switch larger multiplexer, usable for opcode interpretation

Multiple threads, with priority encoder, no stack ptr, just priority level
  aim for a Xerox Alto style multiple microcodes, but in one register space
  no need to save a few microcode bits with register space switching
  switching uses same mechanism also used for DMA, all DMA at higher prio
  should do entire video loading, incl address reloads for char based
  possibly use multiple DMA threads for line/retrace/frame

Control pins
  same basics and features as in RegALU
  all address/control inputs also with optional pipelining FFs
  this allows using standard non-registered EPROM/EEPROM/Flash for microcode
  also if enough pins provide pipelining FFs for random external logic
  such as driving RegALU addresses to external latches/registers
  or for driving DMA control counter load, or video pixel shifter load

Case and Status and DMA (2940) and prio select
  possibly many, share with instruction register size reduction

Sequencer extension and/or merging

Instruction register pins
  register internal, only for picking out opcode for interpreting
  own shifter/masker for extracting address for microcode entry
  load this into microcode address, with n-switch, not by 2900 load table
  either extract register addresses here, for wiring to RegALU address pins
  or instruction also stored into RegALU, for extracting addresses there
  these address shifters/maskers, then driven from instruction register
  unlikely: send these all through microcode addresses or data bus
  slow, serial, one address at a time, defeating multiport register speed
  else would require instruction register and field extraction external
  more flexible, but
  external logic, and hardwired extraction
  if sequencer is too small, leaving unused space, also possibly add Flash ROM to chip
  this may be only second generation, extension of ROM-less
  or as sequencer is second chip anyway, right from the beginning
  both ROMs to drive self, and to drive RegALU, and external stuff
  chip with internal Flash for configuration, needs programming, use SPI
  that is standard, can use existing adapters such as AVR
  and not complicated, easy docs, easy software to write
  as last step go for merging both RegALU and sequencer, single chip
  but still a pure microcode design, not a (C)PLD or FPGA, not logic
  and as such easy to understand by normal programmers
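
Sketch of the sequencer address selection noted under "Address count/load/branch/sub/return circuits" and "Multiple threads": no counter, next address always loaded from the microword, branch as 2-switch, opcode dispatch as n-switch, thread choice by priority encoder. This is only a rough model under assumptions. The notes do not say how a switch enters the address, so replacing the low address bits is assumed here, and the names MicroWord, next_address and pick_thread, the field layout, and thread 0 being the always-ready CPU thread are all hypothetical.

```python
from dataclasses import dataclass

ADDR_BITS = 16  # the notes give a 16bit microcode address, 64kWord


@dataclass
class MicroWord:
    next_addr: int        # always loaded directly, there is no counter
    switch_bits: int = 0  # assumed: 0 = plain jump, 1 = 2-switch, n = 2**n-way switch
    flag_sel: int = 0     # assumed: which flag input feeds a 2-switch


def next_address(word, flags, opcode_field=0):
    """Next microcode address; a switch replaces the low address bits (assumption)."""
    mask = (1 << word.switch_bits) - 1
    base = word.next_addr & ~mask & ((1 << ADDR_BITS) - 1)
    if word.switch_bits == 0:
        return base                              # unconditional direct load
    if word.switch_bits == 1:
        return base | (flags[word.flag_sel] & 1)  # branch as 2-switch on one flag
    return base | (opcode_field & mask)           # n-switch for opcode interpretation


def pick_thread(requests):
    """Priority encoder: highest requesting level wins, thread 0 (CPU) is the fallback."""
    for level in range(len(requests) - 1, 0, -1):
        if requests[level]:
            return level
    return 0
```

With this model a 2-switch needs its two targets at an even/odd address pair, and a 16-way opcode switch needs a 16-word aligned entry table. Each thread would keep its own current microcode address, so a DMA request at a higher level simply takes over the next cycle without any stack.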