Fast and Compact 16bit Forth Computer
author Neil Franklin, last modification 2008.05.29

memory
  32kWord/64kbyte (or potential 32bit variant 1GLong/4GByte)
  byte addressable
  as Forth does not require large address space
  non-aligned addresses produce a byte swap/rotate as side effect

instruction bundle format
  16bit word as bundle of 1+5+5+5bit or 5+5+5+1bit, "3.2 instructions"
    1bit: 0=NOP, 1=CALL
    5bit each: 00000=NOP, 00001=JMP, 00010=BRZ, 00011=DBN, 00100=TIMES,
               00101=RET
      JMP/BRZ/DBN with each 1*5=5bit offset/relative, TIMES only next instr
      rest 32-6=26 for data processing instructions
  potential 32bit variant 2+5+5+5+5+5+5bit or 5+5+5+5+5+5+2bit,
  "6.4 instructions"
    2bit: 00=NOP, 01=RET, 10=CALL, 11=JMP (JMP = CALL with RET predropped)
      CALL/JMP with only last 3*5=15bit of 6*5=30bit used for addr,
      first 3*5bit instr
    5bit each: 00000=NOP, 00001=JMP, 00010=BRZ, 00011=DBN, 00100=TIMES
      JMP/BRZ/DBN with each 1*5=5bit offset/relative, TIMES only next instr
      rest 32-5=27 for data processing instructions

potential data processing instructions, drop some to fit into 26 or 27
  + - +carry -carry     ; arithmetic (possibly - as INV +,
                        ;   possibly carry as C->TOS)
  -compare              ; test (possibly just use - or XOR)
  AND OR XOR            ; logic
  DUP LIT               ; stack expand
  DROP NIP              ; stack reduce (unlikely NIP)
  /2 /2carry            ; stack inactive (*2 or *2carry by DUP and + or +carry)
  INV NEG BSWAP         ; stack inactive
  SWAP                  ; stack inactive except 2nd (build ROT from SWAP)
  >R R> R@              ; return stack (possibly R@ as R> DUP >R)
  >S S> S@++ S@B++      ; memory read
  >D D> D!++ D!B++      ; memory write
  IN OUT                ; I/O data transfer

program instruction bundle handling
  CALL transfers other 3*5bit into address A15..A1, A0=0 as only word fetches
    allows 32kWord/64kbyte memory, data byte addressed, code always 16b aligned
  potential 32bit variant CALL/JMP transfers 3or4*5bit to A16or21..A2, A1,0=00
    allows 32kor1MWord/128kor4MByte progmem, data byte addr, code 32b aligned
    and leaving first 3or2*5bit for 3or2 data processing instructions
  single cycle call processing, direct output addr, then load addr+1 into
  PC reg
  if 1bit=NOP, then if 3rd/last 5bit is return, and 1st/2nd not R> or >R
    direct TOR as next instruction fetch addr, then addr+1 in PC
  if 1bit=NOP, then if 1st/2nd 5bit is JMP, direct PC+rest5/10bit addr
  if 1bit=NOP, then if 3rd/last 5bit is JMP, next PC fetched 16bit word, long
    alternative for 16bit use a CALL with 16bit after it, simpler but slower
  for BRZ and DBN/TIMES loops, data processing is needed for testing
    fetching an unneeded word while this is unavoidable, will be used if no JMP
  literals follow same format as offset for JMP, 5bit (or 10bit) or next PC
  16bit
    alternative for 16bit use a CALL with 16bit after it, simpler but slower

memory data read delays instr fetch by min 1.5 word, so make that 2 words
  freeze the current instr word processing until after data read
  while this next instr word arrives, freeze that for later
  do data transfer, no instr word transferred
  then do rest of first instr word, while also no instr fetch, no room
    or re-fetch the already fetched word, if not held in processor
  then do the already fetched 2nd word while fetching next, pipelined

memory data write delays instr fetch by only 1 word, 3 instr
  freeze this instr word processing until after data write
  while this next instr word arrives, freeze that for later
  start data transfer, and then do rest of first instr word
    no instr word transferred
  then do the already fetched 2nd word while fetching next, pipelined
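
example: decoding a 16bit bundle (sketch, not part of the design)
  A minimal Python sketch of how the 1+5+5+5bit bundle could be split up.
  Assumptions not fixed by the notes above: the 1bit flag sits in the top
  bit (the 5+5+5+1bit alternative would put it at the bottom), the three
  5bit slots follow in order from high to low bits, and the name
  decode_bundle is purely illustrative. Data processing codes have no
  assigned numbers yet, so they are shown only as DATA#n. Offset slots
  following JMP/BRZ/DBN are not interpreted here.

```python
# Control codes as listed in the bundle format notes.
CONTROL_OPS = {
    0b00000: "NOP",
    0b00001: "JMP",
    0b00010: "BRZ",
    0b00011: "DBN",
    0b00100: "TIMES",
    0b00101: "RET",
}


def decode_bundle(word):
    """Split one 16bit bundle, assuming the 1+5+5+5 layout with the
    1bit flag in the most significant bit (an assumption)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("bundle must be a 16bit value")
    if word & 0x8000:
        # 1bit = CALL: the other 3*5bit become A15..A1, A0 = 0,
        # so the target is always 16bit word aligned.
        return ("CALL", (word & 0x7FFF) << 1)
    # 1bit = NOP: three independent 5bit slots, first slot highest.
    slots = [(word >> shift) & 0x1F for shift in (10, 5, 0)]
    # Codes 6..31 are the 26 data processing codes; their numbering
    # is hypothetical, so only the raw number is reported.
    return ("SLOTS", [CONTROL_OPS.get(s, "DATA#%d" % s) for s in slots])


if __name__ == "__main__":
    print(decode_bundle(0x8123))  # a CALL bundle
    print(decode_bundle(0x1805))  # data code 6, NOP, RET
```

  The CALL case shows why 15 address bits reach the whole 64kbyte memory:
  the low address bit is forced to 0, so the highest call target is the
  last word-aligned byte address.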