8bit Accumulator+8Register Computer - 8bit 1Address Instruction Set
author Neil Franklin, last modification 2011.05.29

design features

  registers+ALU
    8bit data
    1-address system with accumulator, not 2-address
      1-operand stuff both on regs, like in 8008/8080/Z80/8048/8051
      and also direct on memory, like in 6800/6802/6809/6502
      fixes one weakness of Intel/Zilog (no mem), and of Motorola/MOS (no regs)
    1 8bit accumulator + 8 8bit registers (not accu-only, not multi-accu+reg)
      accumulator separate from regs, and no reg sacrifice for (HL) mem access
      so this gives a full set of 8 8bit regs, like in 8048/8051
      OTOH lacking direct reg->reg transfers, which are only in 8008/8080/8085/Z80
    2 8bit / 1 16bit count register for decrement-and-jump loops
      like Z80 and 8048/8051 DJNZ, save separate DEC time
      in particular fast block copy, together with auto-postincrement

  memory
    16bit address
    16bit program counter
      all branches (conditional) w 8bit offset, like 6800/6802 6809 6502
      and fitting also jump (unconditional) variant w 8bit offset, like 6809
      all jumps with relative addressing for position independent code
      conditional/branches also available prefixed in full 16bit, all like 6809
    full 16bit absolute and relative data addressing
    full 16bit stack pointer, not 8bit part-of-memory stack
      stack pointer separate from general purpose registers, like 8080/8085/Z80
      gives 2 more 8bit registers of space, A+8Reg+2Stack+2Progcount
    2 16bit address registers (2*8bit register pairs)
      to prevent slow memory-indirect addressing, unlike 6800/6802 or 6502
      but can double as normal data registers, not only-16bit like 6809
      can do mult reg pair indirect memory accesses, like in 8048/8051 and 6809
      but with full 16bit addresses unlike 8048/8051, and reg pairs unlike 6809
    address registers with autoincrement, save separate INX/DEX time
      like in 6809 registers, and in Z80 LD[ID]/CP[ID], but no LD[ID]R/CP[ID]R
    set of 16bit address load/add ops, like in 8080/8085 and Z80
      but no increment/decrement, as that is in
      auto-inc/dec accesses

  IO space
    8bit address separate IO space, to not cut off small stuff from large memory

  instructions
    8bit opcodes
    hex-friendly 4+4bit format instructions, not 2+3+3bit or even worse format

  result is mixture especially of 8080/8085/Z80 with quite a bit of 6809
    shows what a good design 8080/8085 were, right at begin of the 8bit era
    but with various Z80, 8048/8051, 6809 and 6502 extensions
    this is intended to be the best features of all of these systems together
    I consider it to be about the best possible 8/16bit accumulator design

  from 8008 base
    8bit, accumulator+regs, register block (but 8, not 6), regs naming
    address register pairs (but 2, not 1, so no 8080 XCHG or LDAX/STAX needed)
    8 condition codes for branches (but no conditional CALL/JSR or RET/RTS)
  from 8080/8085 extensions
    address register 16bit load and add (but no increment or decrement)
    stack with S register separate from other 8bit registers
    register pair PUSH and POP instructions to save code
    separate 8bit constant address IO space
    1 cycle interrupt lockout after enabling
  from Z80 extensions
    DJNZ-style down-counter loops (but 8bit and 16bit count, with name DBN)
    ED-style prefix to extend instruction set
  from 6800/6802 base
    shift/rotate of memory, not just accu (but 4 ops, not all 5)
    style and partially selection of flags, overflow, not parity, set on LDA
    (but get rid of the often required 2byte opcode+addr/immed/postbyte size)
  from 6809 extensions
    auto-increment/-decrement addressing modes
    one address register pair usable as second data-only stack
    jump and branches short and long prog counter relative, with 10-style prefix
    call also long program counter relative
    data also long program counter relative
    use ANF and ORF for setting flags, interrupt enable as flag (not int mask)
    hex friendly 4+4bit instruction format
  from 6502
    8 memory addressing modes, with immediate as one of them
    entirely 3 character mnemonic names (but not extreme as on 8008)
  from 8048/8051
    full 8 register block,
    accu+mem separate, no 64 register-register copies
    4+4bit opcode+operand instruction format, with 8 registers and
    multiple memory modes (but here full 8 memory modes, not 4)
    replacing n (= [P++]) of 2-operand by A for 1-operand instructions
    I/O as special addressing mode, not as separate instructions
  from AVR
    naming of Y and Z for address register pairs
    interrupt request as flag (not just interrupt enable)
  from PDP-8
    processor halt as flag
  own stuff
    LDA included with arithmetic, but STA separate, diff direction
    only 4 (2 each) shift/rotate variants, because no space for more
    IO in random arithmetic, not just LDA/STA, direct AND/OR/XOR
    processor reset also as flag

registers

  8bit A accumulator
  8bit B C general purpose registers, also usable as DBN counters BC and C
  8bit D E general purpose registers, usual place for non-address 16bit data
  8bit F G general purpose, also usable as Y secondary address register pair
  8bit H I general purpose, also usable as Z primary address register pair
    Z most often used address, so make this the last 2 general purpose regs
    normal usage is [Z] destination and 1st operand, [Y] possible 2nd operand
  16bit S (SH+SL) stack pointer, dedicated, additional to above general purpose
  16bit P (PH+PL) program counter, dedicated
  8bit F flags
    status flags are: 7:Minus 6:Overflow 1:Zero 0:Carry
      change at: Arith/Shift all MOZC, Logic/Load only MZ, Store/Double none
    control flags are: 5:ProcReset 4:ProcHalt 3:IntRequest 2:IntEnable
      setting IntEnable has an automatic 1-cycle delay, allows using
        POP AF or ORF #$04; *locked*; RTS; *unlocked* for return from interrupt
      setting ProcHalt halts the processor until interrupt or reset is applied

memory

  64k*8bit general purpose program and data memory
  256*8bit separate IO space, only immediate addressing

instructions

  8x..Fx: A=A arithmetic instructions: AND/IOR/XOR/LDA/ADD/ADC/SUB/SBC
    8x: AND, 9x: IOR, Ax: XOR, Bx: LDA, are logic without carry chain active
      LDA is special case of ALU, only data pass through, no A
        relevance
      this saves the need for ALU/load Mux before A
    Cx: ADD, Dx: ADC, Ex: SUB, Fx: SBC, are add/subtract with carry chain active
      ADD/SUB no carry-in (fixed 0/1), ADC/SBC with carry-in (from Carry flag)
    x0..x7: registers: B C D E F G H I
    x8: constant: n (= immediate)
      n (= [P++]) is next instr byte, faster+compacter than [nn] to a constant
    x9: in/out: IO[n] (= IN)
      IO[n] has an automatic 1-cycle interrupt lock-out, allows using
        LDA n; IO[n]; *locked*; STA IO[n]; *unlocked* for interrupt-safe
    xA..xF: memory: [nn] [P+nn] [Y] [Y++] [Z] [Z++]
      [nn] (= direct) is compacter and faster than first loading address reg pair
        and saves using addr reg pair, by using dedicated T (TH+TL) temp reg pair
      [P+nn] relative to P (= position independent globals)
        altern [n] zero page, or [S+n] stack offset, or [Y+n] frame/index offset
      [Z++]/[Y++] use same 16bit addressbus+1 increment method as P++ uses
        is compacter and faster code than additional INX/DEX Y/Z instr
        and no flags modified by the ++, just like if special INX/DEX Y/Z
        and no opcode space used up for separate INX/DEX Y/Z
        alternatively replace [Z++] with [--Z], as norm LDA [Z]; OP [Y++]; STA [Z++]
          allowing LDA [--Z]; STA [--Y] for overlapping upwards copying
        alternatively replace [nn] and [P+nn] with [--Y] and [--Z] for both dirs
    flags: all set M(minus) and Z(zero)
      ADD/ADC/SUB/SBC also set C(carry) and O(verflow)
  4x..7x: = shift instructions: SHL/ROL/SHR/ROR
    are shift and rotate to left and right
    these on 4x..7x parallel the ADD/ADC/SUB/SBC on Cx..Fx
      SHL/ROL by using ADD/ADC of twice same data, so are Cx/Dx reduced by 8x
      SHR/ROR by different mechanism, mux in ALU carry XOR, fill rest
    SHR is arithmetic shift, with MSB sign retained in bit7
      faster to delete bit7 for logical, than duplicate bit6 for arithmetical
    ROL/ROR are 9bit, including carry into bit0 or bit7, bit7 or bit0 to carry
      possible to merge carry for 8bit, but imposs to recreate loss for 9bit
    4x: SHL, 5x: ROL, 6x: SHR, 7x: ROR, all with carry chain active
      SHL/SHR no carry-in
        (fix 0), ROL/ROR with carry-in (from Carry flag)
    x0..x7: registers: same as above
    x8: accumulator: A
      n (= [P++]) immediate is nearly senseless, and fails when code in ROM
      so use x8 for A, replace doing STA r; shift r; LDA r or STA r; ADD/ADC r
    x9: in/out: IO[n]
      shifting IO registers is little use, possibly recycle for other stuff
    xA..xF: memory: similar to above, except:
      [Y++] is seldom used here, replace by [Y--] or [--Y] for SHR/ROR long shift
        [--Y] is preferable, same circuit as [--S], set to begin+length (no -1)
      for all non-register use a TD temp data register, not TH+TL as addr in-use
    flags: all set M(minus) and Z(zero) and C(carry) and O(verflow), ADD/ADC-like
  3x: =A store instructions: STA
    are store, placing these on 3x parallels the LDA on Bx reduced by 8x
      but store separate from load, even though usually logically grouped with it
      as it is totally different operation, no changing of A or flags
      rather changing of other registers or memory
    3x: STA, just tristate gate to get A out to data bus and store
    x0..x7: registers: same as above
    x8: accumulator: A (= NOP)
      n (= [P++]) immediate is nearly senseless, and fails when code in ROM
      so use x8 for A, like shift, gives STA A, changes nothing, no mem or flags
    x9: in/out: IO[n] (= OUT)
    xA..xF: memory: similar to above, except:
      [Y++] seldom used here, replace by [Y--] or [--Y] for STA string reversal
        [--Y] is preferable, same circuit as [--S], set to begin+length (no -1)
        with LDA [Y++] allows using Y as second/separ data-only stack pointer
        if w LDA [--Z] allows overlapping upwards copying instead of reversal
    flags: does not set any, not generating anything, not changing registers
      this is a further reason to not have STA in with LDA, behave very different
  2x: =[+] double byte (address) instructions: DAZ/DLS/DLY/DLZ
    are double register (=16bit) add to Z and load to S/Y/Z
    are a small set of 4*4 2-byte address instructions, x split into 2*2 bits
    20: DAZ, double adds to Z
      this is good for adding offsets into data
        structures or arrays
      only space for one register pair, so for most often used Z
    24: DLS, 28: DLY, 2C: DLZ, double loads to S or Y or Z
      DLS is the only way to load something into S, setting the stack pointer
    +00: from 16bit constant: nn (= immediate 2 bytes)
      this is used to set addresses for loop begins, and to set up stack pointer
    +01: S, +02: Y, +03: Z, from address register pairs
      20+03=23 is DAZ Z, is SHL Z, 16bit shift left without flags
      all 24+01=25 DLS S, 28+02=2A DLY Y, 2C+03=2F DLZ Z, change nothing (= NOPs)
      DL? S is the only way to read content of S, save the stack pointer
      to move S to (de)allocate local vars do DLZ S; DAZ offset; DLS Z
    flags: do not set any, as working on addresses, do not disturb data flags
  1x (10..19): [--S]= and =[S++] stack instructions: PSH/POP
    are push to stack and pop from stack register pairs
      do register pairs to reduce opcode count, code size, and exec time
      when space is needed, handling 2 regs a time is sensible, not a problem
    10: PSH, push register pair to stack
      [--S] needs a special decrementer, before address being output
      can be done with ALU in invert operand and carry clear mode
    11: POP, pop register pair from stack
      [S++] uses same 16bit addressbus+1 increment method as P++ uses, twice
    +00: B+C, +02: D+E, +04: F+G(Y), +06: H+I(Z), general registers
    +08: A+F, special registers
    flags: does not set any, not generating anything, not changing A (except POP AF)
  1x (1A..1B): [--S]=P; P=P+nn and P=[S++] subroutine instructions: JSR/RTS
    are long relative call to subroutine and return from subroutine
    1A: JSR, push P to stack and add to P full 16bit offset (relative!)
      so position independent code is possible
      for few cases requiring absolute JSR use a JSR to an absolute JMP
      for an absolute JMP use code DLZ nn; PSH Z; RTS
    1B: RTS, pop P from stack, analog to reg pair pops
    flags: does not set any, not generating anything, not changing registers
  1x (1C..1D): flags instructions: ANF/ORF
    are AND/clear and OR/set flags
    1C: ANF, Flags=Flags AND n, clear immediate bit pattern of flags
      n (= [P++]) is next instr byte
    1D: ORF, Flags=Flags OR n, set immediate bit pattern of flags
      n (= [P++]) is next instr byte
    flags: set/clear specified ones, directly, not from calculation
  1x (1E..1F): if (=-1)!=0 then P=P+n loop instructions: DBN
    are decrement register C or BC and branch if nonzero (= remaining rounds)
      no D E F G H or I possible, 2 registers already give one 16bit counter
      is compacter and faster code than separate DEC/DEX C/BC and BZC instr
      and no flags modified, just like if special DEC/DEX C/BC
      and no more opcode space used up than for separate DEC/DEX C/BC
    1E: DBN, while nonzero add to P sign extended 8bit offset, +127/-128
    +00: BC, 16bit count register pair
    +01: C, 8bit count register
    or alternative 2 separate 8bit, use both for 16bit:
    +00: B, (high) 8bit count register
    +01: C, (low) 8bit count register
    flags: does not set any, as working on counters, do not disturb data flags
  0x: if = then P=P+n jump/conditional instr: NOP/JMP/B[PRSCZOM][CS]
    are branch if flag 1/ZorC/MxorO/Zor(MxorO)/Carry/Zero/Overflow/Minus
      has value of Clear,False,0 or Set,True,1
    fff=000(1) and val=0 does nothing/never, is NOP
    fff=000(1) and val=1 does unconditional/always, is JMP (is not a branch!)
    fff<>000 and val=any gives 7*2=14 BRA variants
    00: Bfv, Branch if the flag f has the value v
      add to P sign extended 8bit offset, +127/-128
      for few cases requiring absolute JMP use code DLZ nn; PSH Z; RTS
      for 16bit relative use code JSR +0; POP Z; DAZ nn; PSH Z; RTS
    +00: f, flag = 1 (=fixed1),     +02: ZorC (=Positive/unsigned)
    +04: MxorO (=Range/signed),     +06: Zor(MxorO) (=Sign/signed)
    +08: C (=Carry/unsigned),       +0A: Z (=Zero)
    +0C: O (=Overflow),             +0E: M (=Minus)
    +00: v, value = C(lear)(False/Zero/0), +01: S(et)(True/One/1)
    gives Bfv = NOP/BPC/BRC/BSC/BCC/BZC/BOC/BMC
                JMP/BPS/BRS/BSS/BCS/BZS/BOS/BMS
    altern name .../BHI/BGE/BGT/BHS/BNE/.../...
                .../BLS/BLT/BLE/BLO/BEQ/.../...
    flags: does not set any, not generating anything, not changing registers
  xx (reuse NOPs at 00 or 25/2A/2F or 38): processor control instructions: PFX
    are prefixes for 256 more instructions, possibly multiple times
    xx: PFX, prefix to switch to further instructions, extended instruction set
      replace JMP and B[*][CS] and DBN with long 16bit offset
        add P=P+nn full 16bit offset (relative!), same as in JSR
        so position independent code remains possible
      replace long relative JSR with absolute P=nn 16bit JSA
      replace D[*] with destination/source Z(=HI) with Y(=FG) or X(=DE) versions
      replace SUB and SBC with nondestructive CMP and CPC
      replace SHL and SHR with INC and DEC
      replace ROL and ROR with INV and NEG
      replace STA with XCG (for semaphores?)
      add complex stuff such as MUL or DIV/MOD or MAC or scale (MUL+DIV)
        or units like FPU, string unit or bignum support
        or MMU control or system control or on-CPU timer(s) or full threading
      until any of this is of interest, leave them as reserved/NOP opcodes

instruction code table

  (only opcodes, in order: ver: bit7..4/3, hor: bit2..0)

+   00        01        02        03        04        05        06        07
    08        09        0A        0B        0C        0D        0E        0F

00  NOP(B1C)  JMP(B1S)  BPC       BPS       BRC       BRS       BSC       BSS
    BCC       BCS       BZC       BZS       BOC       BOS       BMC       BMS
10  PSH BC    POP BC    PSH DE    POP DE    PSH Y     POP Y     PSH Z     POP Z
    PSH AF    POP AF    JSR       RTS       ANF #     ORF #     DBN BC    DBN C
20  DAZ #     DAZ S     DAZ Y     DAZ Z     DLS #     NOP(DLS)  DLS Y     DLS Z
    DLY #     DLY S     NOP(DLY)  DLY Z     DLZ #     DLZ S     DLZ Y     NOP(DLZ)
30  STA B     STA C     STA D     STA E     STA F     STA G     STA H     STA I
    NOP(STA)  STA out   STA [nn]  STA [Pn]  STA [Y]   STA [-Y]  STA [Z]   STA [Z+]
40  SHL B     SHL C     SHL D     SHL E     SHL F     SHL G     SHL H     SHL I
    SHL A     SHL i/o   SHL [nn]  SHL [Pn]  SHL [Y]   SHL [-Y]  SHL [Z]   SHL [Z+]
50  ROL B     ROL C     ROL D     ROL E     ROL F     ROL G     ROL H     ROL I
    ROL A     ROL i/o   ROL [nn]  ROL [Pn]  ROL [Y]   ROL [-Y]  ROL [Z]   ROL [Z+]
60  SHR B     SHR C     SHR D     SHR E     SHR F     SHR G     SHR H     SHR I
    SHR A     SHR i/o   SHR [nn]  SHR [Pn]  SHR [Y]   SHR [-Y]  SHR [Z]   SHR [Z+]
70  ROR B     ROR C     ROR D     ROR E     ROR F     ROR G     ROR H     ROR I
    ROR A     ROR i/o   ROR [nn]  ROR [Pn]  ROR [Y]   ROR [-Y]  ROR [Z]   ROR [Z+]
80  AND B     AND C     AND D     AND E     AND F     AND G     AND H     AND I
    AND #     AND in    AND [nn]  AND [Pn]  AND [Y]   AND [Y+]  AND [Z]   AND [Z+]
90  IOR B     IOR C     IOR D     IOR E     IOR F     IOR G     IOR H     IOR I
    IOR #     IOR in    IOR [nn]  IOR [Pn]  IOR [Y]   IOR [Y+]  IOR [Z]   IOR [Z+]
A0  XOR B     XOR C     XOR D     XOR E     XOR F     XOR G     XOR H     XOR I
    XOR #     XOR in    XOR [nn]  XOR [Pn]  XOR [Y]   XOR [Y+]  XOR [Z]   XOR [Z+]
B0  LDA B     LDA C     LDA D     LDA E     LDA F     LDA G     LDA H     LDA I
    LDA #     LDA in    LDA [nn]  LDA [Pn]  LDA [Y]   LDA [Y+]  LDA [Z]   LDA [Z+]
C0  ADD B     ADD C     ADD D     ADD E     ADD F     ADD G     ADD H     ADD I
    ADD #     ADD in    ADD [nn]  ADD [Pn]  ADD [Y]   ADD [Y+]  ADD [Z]   ADD [Z+]
D0  ADC B     ADC C     ADC D     ADC E     ADC F     ADC G     ADC H     ADC I
    ADC #     ADC in    ADC [nn]  ADC [Pn]  ADC [Y]   ADC [Y+]  ADC [Z]   ADC [Z+]
E0  SUB B     SUB C     SUB D     SUB E     SUB F     SUB G     SUB H     SUB I
    SUB #     SUB in    SUB [nn]  SUB [Pn]  SUB [Y]   SUB [Y+]  SUB [Z]   SUB [Z+]
F0  SBC B     SBC C     SBC D     SBC E     SBC F     SBC G     SBC H     SBC I
    SBC #     SBC in    SBC [nn]  SBC [Pn]  SBC [Y]   SBC [Y+]  SBC [Z]   SBC [Z+]
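
decoder sketch

  the 4+4bit format means an opcode decodes by looking up its two nibbles
  below is a minimal sketch of that in Python, not part of the design itself
  all names (decode, REGS, ALU_MODES, ...) are made up for illustration
  register pair +06 is taken as Z, following the PSH/POP text (+06: H+I(Z))
  PFX prefixed opcodes are not modelled, as they have no assigned code yet

```python
# Illustrative decoder for the unprefixed opcode table; all names are assumptions.
REGS = ["B", "C", "D", "E", "F", "G", "H", "I"]                       # x0..x7
ALU_OPS = ["AND", "IOR", "XOR", "LDA", "ADD", "ADC", "SUB", "SBC"]   # 8x..Fx
SHIFT_OPS = ["SHL", "ROL", "SHR", "ROR"]                             # 4x..7x
ALU_MODES = ["#", "in", "[nn]", "[Pn]", "[Y]", "[Y+]", "[Z]", "[Z+]"]      # x8..xF
SHIFT_MODES = ["A", "i/o", "[nn]", "[Pn]", "[Y]", "[-Y]", "[Z]", "[Z+]"]
STORE_MODES = ["A", "out", "[nn]", "[Pn]", "[Y]", "[-Y]", "[Z]", "[Z+]"]
DOUBLE_OPS = ["DAZ", "DLS", "DLY", "DLZ"]    # 2x, bits 3..2
DOUBLE_SRC = ["#", "S", "Y", "Z"]            # 2x, bits 1..0
STACK_PAIRS = ["BC", "DE", "Y", "Z", "AF"]   # 10..19, bits 3..1
MISC_1X = {0xA: "JSR", 0xB: "RTS", 0xC: "ANF #", 0xD: "ORF #",
           0xE: "DBN BC", 0xF: "DBN C"}
BRANCH_FLAGS = "PRSCZOM"                     # 0x, bits 3..1 = 1..7


def decode(opcode):
    """Return the mnemonic text for one unprefixed opcode byte (0..255)."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must be one byte")
    hi, lo = opcode >> 4, opcode & 0x0F
    if hi >= 0x8:                            # arithmetic/logic/load into A
        operand = REGS[lo] if lo < 8 else ALU_MODES[lo - 8]
        return f"{ALU_OPS[hi - 8]} {operand}"
    if hi >= 0x4:                            # shift/rotate in place
        operand = REGS[lo] if lo < 8 else SHIFT_MODES[lo - 8]
        return f"{SHIFT_OPS[hi - 4]} {operand}"
    if hi == 0x3:                            # store A
        if lo == 8:
            return "NOP(STA)"                # STA A changes nothing
        operand = REGS[lo] if lo < 8 else STORE_MODES[lo - 8]
        return f"STA {operand}"
    if hi == 0x2:                            # 16bit add/load on S/Y/Z
        op, src = DOUBLE_OPS[lo >> 2], DOUBLE_SRC[lo & 3]
        if op != "DAZ" and op[2] == src:
            return f"NOP({op})"              # DLS S, DLY Y, DLZ Z change nothing
        return f"{op} {src}"
    if hi == 0x1:                            # stack, subroutine, flags, loop
        if lo < 0xA:
            return f"{('PSH', 'POP')[lo & 1]} {STACK_PAIRS[lo >> 1]}"
        return MISC_1X[lo]
    flag, value = lo >> 1, lo & 1            # 0x: branch on flag f with value v
    if flag == 0:
        return ("NOP", "JMP")[value]
    return "B" + BRANCH_FLAGS[flag - 1] + "CS"[value]
```

  usage: decode(0xB9) gives "LDA in", decode(0x25) gives "NOP(DLS)"
  one table per opcode group is enough here because the low nibble always
  means the same thing within a group, which is the point of the 4+4bit format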