Fast and Compact 16bit Forth Computer
author Neil Franklin, last modification 2008.05.29


memory

32kWord/64kbyte (or potential 32bit variant 1GLong/4GByte)
  byte addressable, as Forth does not require a large address space
    non-aligned addresses produce a byte swap/rotate as side effect
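
The byte swap side effect can be sketched as below. This is only an
illustration: the name read_word and the big-endian byte order inside a word
are assumptions, the note does not state either.

```python
def read_word(mem: bytes, addr: int) -> int:
    """Read 16 bits at a byte address from word-organised memory.

    Assumption (not from the note): the high byte of each word is stored
    at the even byte address.
    """
    base = addr & ~1                     # the word that contains this byte address
    word = (mem[base] << 8) | mem[base + 1]
    if addr & 1:
        # Non-aligned address: the same word is delivered with its bytes
        # swapped (a rotate by 8 bits), no second word is touched.
        word = ((word << 8) | (word >> 8)) & 0xFFFF
    return word
```

With memory holding the bytes 0x12 0x34, address 0 gives 0x1234 and address 1
gives 0x3412, so the wanted byte always ends up in a known half of the word.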


instruction bundle format

16bit word as bundle of 1+5+5+5bit or 5+5+5+1bit, "3.2 instructions"
  1bit: 0=NOP, 1=CALL
  5bit each: 00000=NOP, 00001=JMP, 00010=BRZ, 00011=DBN, 00100=TIMES, 00101=RET
    JMP/BRZ/DBN each with a 1*5=5bit relative offset, TIMES applies only to the next instr
    rest 32-6=26 for data processing instructions
potential 32bit variant 2+5+5+5+5+5+5bit or 5+5+5+5+5+5+2bit "6.4 instructions"
  2bit: 00=NOP, 01=RET, 10=CALL, 11=JMP (JMP = CALL with RET predropped)
    CALL/JMP with only last 3*5=15bit of 6*5=30bit used for addr, first 3*5bit instr
  5bit each: 00000=NOP, 00001=JMP, 00010=BRZ, 00011=DBN, 00100=TIMES
    JMP/BRZ/DBN each with a 1*5=5bit relative offset, TIMES applies only to the next instr
    rest 32-5=27 for data processing instructions
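
A minimal sketch of splitting a 16bit bundle into its fields, for both bit
orders named above. The function names, the choice that the first slot sits in
the more significant bits, and the DPnn placeholder names for the unassigned
data processing codes are assumptions for illustration only.

```python
# Control codes of the 16bit format, as listed in the note.
CONTROL_OPS = {
    0b00000: "NOP", 0b00001: "JMP", 0b00010: "BRZ",
    0b00011: "DBN", 0b00100: "TIMES", 0b00101: "RET",
}

def decode_bundle(word, call_bit_first=True):
    """Return (call_bit, [slot1, slot2, slot3]) for a 16bit bundle."""
    word &= 0xFFFF
    if call_bit_first:                   # 1+5+5+5 layout
        call = word >> 15
        slots = [(word >> shift) & 0x1F for shift in (10, 5, 0)]
    else:                                # 5+5+5+1 layout
        call = word & 1
        slots = [(word >> shift) & 0x1F for shift in (11, 6, 1)]
    return call, slots

def slot_name(code):
    """Name a 5bit code; DPnn is a hypothetical placeholder, not from the note."""
    return CONTROL_OPS.get(code, f"DP{code:02d}")
```

The six control codes leave 32-6=26 codes that fall through to the
placeholder names.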


potential data processing instructions, drop some to fit into 26 or 27

+ - +carry -carry      ; arithmetic (maybe - as INV +, maybe carry as C->TOS)
-compare               ; test (maybe just use - or XOR)
AND OR XOR             ; logic
DUP LIT                ; stack expand
DROP NIP               ; stack reduce (unlikely NIP)
/2 /2carry             ; stack inactive (*2 or *2carry by DUP and + or +carry)
INV NEG BSWAP          ; stack inactive
SWAP                   ; stack inactive except 2nd (build ROT from SWAP)
>R R> R@               ; return stack (maybe R@ as R> DUP >R)
>S S> S@++ S@B++       ; memory read
>D D> D!++ D!B++       ; memory write
IN OUT                 ; I/O data transfer
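
The compositions hinted at in the comments above (*2 from DUP and +, ROT built
from SWAP, R@ from R> DUP >R) can be checked with a small stack model. This is
a sketch, not the processor: the run helper is hypothetical, and the ROT
sequence used is the usual Forth one via the return stack, which the note does
not spell out.

```python
def run(program, stack, rstack=None):
    """Execute primitive names on a data stack; the top is the last element."""
    rstack = [] if rstack is None else rstack
    for op in program:
        if op == "DUP":
            stack.append(stack[-1])
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op == "+":
            b = stack.pop()
            a = stack.pop()
            stack.append((a + b) & 0xFFFF)   # 16bit wrap-around
        elif op == ">R":
            rstack.append(stack.pop())
        elif op == "R>":
            stack.append(rstack.pop())
        else:
            raise ValueError(f"unknown primitive {op}")
    return stack, rstack

TIMES2 = ["DUP", "+"]                    # *2
ROT = [">R", "SWAP", "R>", "SWAP"]       # ( a b c -- b c a ), assumed sequence
R_FETCH = ["R>", "DUP", ">R"]            # R@
```

For example run(ROT, [1, 2, 3]) leaves [2, 3, 1] on the data stack, which is
why ROT and R@ are candidates for dropping from the opcode table.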


program instruction bundle handling

CALL transfers other 3*5bit into address A15..A1, A0=0 as code is only fetched as words
  allows 32kWord/64kbyte memory, data byte addressed, code always 16b aligned
  potential 32bit variant CALL/JMP transfers 3or4*5bit to A16or21..A2, A1,0=00
    allows 32kor1MWord/128kor4MByte progmem, data byte addr, code 32b aligned
    and leaving first 3or2*5bit for 3or2 data processing instructions
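
The address formation can be sketched as follows. The function names are
hypothetical, and the sketch assumes the 1+5+5+5 layout with the CALL bit as
the most significant bit, and for the 32bit variant that the address slots
are the least significant ones of the 30bit payload.

```python
def call_target_16(word):
    """Byte address called by a 16bit bundle, or None if its CALL bit is 0."""
    word &= 0xFFFF
    if not (word >> 15):
        return None
    return (word & 0x7FFF) << 1          # 15 bits to A15..A1, A0 = 0

def call_target_32(payload30, addr_slots):
    """Byte address from the last addr_slots 5bit slots of a 32bit bundle."""
    if addr_slots not in (3, 4):
        raise ValueError("the note only considers 3 or 4 address slots")
    mask = (1 << (5 * addr_slots)) - 1
    return (payload30 & mask) << 2       # bits to A16..A2 or A21..A2, A1,0 = 00
```

With all address bits set, the 16bit form reaches 0xFFFE (top of 64kbyte), 3
slots reach 0x1FFFC (top of 128kByte) and 4 slots reach 0x3FFFFC (top of
4MByte), matching the sizes given above.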

single cycle call processing, direct output addr, then load addr+1 into PC reg
  if 1bit=NOP, then if 3rd/last 5bit is RET, and 1st/2nd not R> or >R
    direct TOR as next instruction fetch addr, then addr+1 in PC
  if 1bit=NOP, then if 1st/2nd 5bit is JMP, direct PC+rest10/5bit addr
  if 1bit=NOP, then if 3rd/last 5bit is JMP, next PC is the following fetched 16bit word, long
    alternative for 16bit use a CALL with 16bit after it, simpler but slower
  for BRZ and DBN/TIMES loops, data processing is needed for testing
    fetching a possibly unneeded word meanwhile is unavoidable, it will be used if no jump is taken
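
A sketch of the short JMP case (JMP in the 1st or 2nd slot, offset in the
remaining bits). Several points here are assumptions, as the note leaves them
open: the 1+5+5+5 layout, the offset being a signed two's complement count of
words, and the offset being relative to the PC value passed in.

```python
JMP = 0b00001

def sign_extend(value, bits):
    """Interpret the low 'bits' of value as two's complement (assumed signed)."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def short_jump_target(word, pc):
    """Word address targeted by a JMP in slot 1 or 2, else None.

    pc is a word address (15 bits), an assumption of this sketch.
    """
    word &= 0xFFFF
    if word >> 15:                       # CALL bit set: not a slot-coded bundle
        return None
    slot1 = (word >> 10) & 0x1F
    slot2 = (word >> 5) & 0x1F
    if slot1 == JMP:                     # rest is 2*5 = 10 bits
        return (pc + sign_extend(word & 0x3FF, 10)) & 0x7FFF
    if slot2 == JMP:                     # rest is 1*5 = 5 bits
        return (pc + sign_extend(word & 0x1F, 5)) & 0x7FFF
    return None
```

A JMP in the 3rd slot has no bits left for an offset, which is why that case
takes its target from the following 16bit word instead.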

literals follow same format as offset for JMP, 5bit (or 10bit) or next PC 16bit
  alternative for 16bit use a CALL with 16bit after it, simpler but slower

memory data read delays instr fetch by min 1.5 word, so make that 2 words
  freeze the current instr word processing until after data read
    meanwhile the next instr word arrives, freeze that for later
  do data transfer, no instr word transferred
  then do rest of first instr word, while also no instr fetch, no room
    or re-fetch the already fetched word, if not held in processor
  then do the already fetched 2nd word while fetching next, pipelined

memory data write delays instr fetch by only 1 word, 3 instr
  freeze this instr word processing until after data write
    meanwhile the next instr word arrives, freeze that for later
  start data transfer, and then do rest of first instr word
    no instr word transferred
  then do the already fetched 2nd word while fetching next, pipelined