High Level Assembler - Algorithmic Syntax - Design Ideas

author Neil Franklin, last modification 2008.11.22

programming languages
  like many I have had experiences with multiple languages
    resulting in liking some and disliking others
    worst are the "nearly right" ones, as very frustrating
  started thinking a while ago about ideal language
    surprised that from abilities and design that is assembler
    but when using it, lots of typing and lines, little overview
    all HLLs improve on this, but at the cost of losing all the details, opaque
    C came the nearest, but lack of usable strings, ANSI not fixing that
  want the best of both worlds, asm transparency/flexibility, HLL compactness

language experiences (only relevant ones)
  pascal (UCSD, later Turbo)
    + was my first, has structure (proc/if/while)
    - lots of unnecessary work, bad println
  assembler (initially Z80, then 6809, 6502, 8051, 68000, 80x86, AVR)
    + lots of power, details, can do everything, multiple independent registers
    - worst typing conventions, needs labels, processor dependent, 80x86 awful
  basic (MS and CBM)
    + little work, easy, interactive, good for quick jobs, mixed basic+asm
    - lack of power, line numbers
  forth (FIG)
    + is assembler of forth virtual machine, dictionary, interactive, structure
    - one data stack inferior to asm multiple registers, else best basic+asm
  rpn (HP calcs)
    + simple, typing efficient, forth with float and registers, scripts
    - limited to small calculators
  occam (just language, ignoring the CSP parallel stuff)
    + can process multiple calculations in one line
    - else very basic features like pascal, but no strings
  c (ANSI)
    + near to ideal, compact, can do most stuff, beat pascal, long my standard
    - frustratingly close, lack of strings, can not do all of asm
  postscript (laserjet module adobe and next display postscript)
    + like forth but more consistent rpn (if/while)
    - /var before code and then def, too consistent rpn
  lisp (any random dialect)
    + very flexible, can formulate anything, scripts
    - provides no structure,
      lots of insane silly parentheses, visual chaos
  rpl (HP 28/48 calcs)
    + is rpn on steroids, mix of rpn+forth+lisp, result somewhat postscript-ish
    - provides little structure, lots of "noise" characters and typing work
  shell (sh/bash)
    + grew from cmd interp, interactive, extensible, scripts, long my standard
    - awful syntax, lack of arithmetic, quoting trouble beats c string trouble
  result is: liked asm, forth, c, shell, but all have defects
  want mixture:
    asm ability without mnemonics typing
    + c compactness and multiple-per-line
    + forth interactive structured

buildup
  first just multi-format ascii2bin constants convert and write
    simple parser/generator which doesn't know of instructions parameter bytes
    either batch, direct to file (better under unix, but no backpatching)
    or interactive, to memory array, then dump to file (better on target micro)
  then symbols for constants, those are output as separate constants
    instructions are just named constants, 256 of these
    for 16bit instructions generate per "regexp" symbol names
    identify sub-symbols and eval them, then insert
  then symbols for labels, those are used as pair of constants
    define them using a syntax for "current position"
  later more complex macro system, that can generate multiple codes at once
    can take mixed instruction/parameter, split/process/merge and output

syntax basics
  arbitrary mix of constants and symbolic stuff
  syntax word for each instruction, separated by blank space
    possibly with blanks inside expressions, blank insensitive parser
    but requires ; to end/separate instructions, else ; just syntactic sugar
  comments not with ; any more, use C++ style //, or better unix style #
    no algol/pascal/C style /* */ inserted comments, seldom, recursion problem
    no preprocessor, so # can be used, like in most script languages

single pass assembler, only knows what has already been defined
  top-down is a design method, processing source is building and done bottom-up
  user places "overjump" at begin of source, fill when at start
    of toproutine
  or for every subroutine "overjump" to what follows after, slow but OK
  handling of libraries by either include other source or precompiled binary
  entire instruction set is defined by a processor specific include file
  no compiling with "extern" and table with "public" and then linking
  for unavoidable forward refs define label, insert empty value, then backpatch

macros
  assemble time code execution to analyse parameter types/sizes
    and generate output fitting for the combination
  what to do when assembling on different processor than target system

input file mix of monitor style commands and constants/code and macro defs
  monitor style memory change and line assemble and file read/write commands

AVR
  what about 16bit codes (only few 2-word/32bit split, LDS/STS/JMP/CALL)
    gives 64k instructions and codes, and so needs defining 64k names
    simply naming all instr works on 8bit code with 256 names, fails here
  many of these are n 32 Rd or even n 32*32 Rd*Rr variants
    making 5 or 2*5 register select bits, together with few 11 or 6 bit instr
    irregular select bit patterns d vs r vs K vs A vs q, or even dd
  or n 256 (or similar) constant or address or address+bit or offset
  the n 256 case can be handled same as immediate constants in 8bit instr
  the R 32 many registers by also being a numeric/alpha constant
  scan for constant pattern part, extract variable parts as n params
    for each named pattern a base constant + or | extracted parameters
    transform the parameters and merge in their bits
  as side effect params can be variable names instead of register R
    but then can not use parameters to determine type, how to handle double ops
    Rd=Rr and RPd=RPr are both = with 2 params, same Rd+- K and RPd+- K
    same on 8080/Z80 for RP+-1 or HL+RP, or on 6502 for ZP vs abs addr
    have type info in parameters, from where and how their symbol was defined
  could also be used on other processors with register select bits in instr
    or even any other variation selections, such as jump conditions
    and also
    for multibyte, constant without blank or address/base/offset in []
    and on 8086/8088/etc to generate MOD-R-R/M (and SIB) bytes from instr name
  for 2-address machines use a separate "accumulator base select" with Rd
    this introduces one "stanza" of instr that process that one register
    then simple 1-address instructions with Rs as only parameter
    and for real 1-address instructions then a 0-address instruction
    possibly have one base each for 8bit and 16bit instructions

words
  whitespace
    ASCII 0..31, Space/32, DEL/127, extended ASCII 128..159, file end
    any single or series of whitespace chars is one whitespace token
  non-whitespace
    any single or series of token chars is one token
    every token is interpreted and acted on

constants
  numeric/integer
    system dependent max 8/16/32/64bit, with 8/16/32/64bit length mask
    length mask defaults to smallest that will fit value
    can be forced with bwlx after the constant
    decimal [-][1..9][0..9]*
      type: none or b for 8bit, w for 16bit, l for 32bit, x for 64bit
      or have default length set to fit processor type
    binary 0b[0..1]*
    octal 0[o][0..7]*
    hexadecimal 0[hx][0..9A..F]*
    setbitpattern 0s[0..9A..Za..z@&]* (sets series of numbered bits, rest 0)
    resetbitpattern 0r[0..9A..Za..z@&]* (resets bits, rest 1, for AND)
  text
    character '' or ''ascii-code-any-number-type
    string "*"[.ascii-code-any-number-type[.""]]
    counted string 0[bwlx]
    bufferlength+counted string 00[bwlx]
  block @@ ...@

symbols
  anything that is not a valid constant, in particular not starting with 0..9-'"@*
  suggested for constants or addresses/labels are [a..z][0..9_a..z]*[0..9a..z]
  anything else is suggested for commands/instructions/macros
  symbol table
    separate, or merged in code (dictionary), possibly with "overjumps"
    table as blocks or as linked list, scan by links or by hash+links
  using a symbol's value
    full value
    byte subsets of 16bit value .[lh]
      .l low bit7..0, .h high bit15..8
    shortened 8/16/32bit subsets of value .[bwl]
    arbitrary bit range of value .-
    arbitrary bit section of value .#
    what
      about relative distance of address to current address?
    what about shifting and merging subsets of bits for complex codes?
    what about arbitrary expressions?
  defining a symbol's value
    anything of the form :[,...]
      where something can be any constant or other symbol
    current address (for labels) :
      to set current address :=|
      to jump current address over unwritten space :+
      to align current address to 16/32/64bit with unwritten/zero :|[wlx]
      to set and label :=| :+ :|[wlx]
      what about labels in different memories, code/data/IO, code/const/var/zero?
    by positional parameter %
      positional parameters are defined by % in symbol name (before the ":")
      mov(%,%):... or %=%:... matches symbols mov(blah,bleh) or blah=bleh
      take anything up to next given char and (sub-)evaluates it first
      above symbols blah and bleh may be previously def blah:R5 bleh:R6

registers
  name as or R , flags as F or F
  accessing registers bitwise as .

memory
  name as single nameless "array", accesses/addresses/modes named:
    absolute as [] followed by address
      or [] after positional parameters are supported
    via register as [] (direct access to register would be )
    reg indexed as [+] followed by and [+] followed by
      or [+] and [+] after positional parameters
    reg autoinc-/-decrement and stacks [--] [++] [++] [--]
      8086 flag selected direction [++] or [--] write as [+-]
      CPUs without auto-inc/dec make pseudoinstr from [] and +or-1
  main data memory (and memmap IO) space named as [] (or explicit as M[])
    other memory spaces (program, extern, IO) as [] (P[], E[], IO[])
    same also other segments (code, stack), if not implicit, as C[], S[]
  define symbols by labels in other spaces (current address) ::
    to switch current space for assembling into ::
    to switch and set both together ::=|=|+
    to switch and set and define ::=|=|+
    to define label in other, without altering current ::
  IO only constant addresses, given by IO address decoders, no alloc
    no value from current, no current, no setting current :IO:
  how alloc multiple sections in one
    space, naming, system layout dependent
    RAM-from-disk prog/const/init-var vs zeroed-var (only disk space save)
    but split I and D system prog/const vs init-var vs zeroed-var
    ROM-and-RAM system prog/const/possibly-pre-init vs zeroed-var/possibly-pre-inited
  also register space as R[], regpairs|regbytes as RP|RB[], etc
    def alternative names for registers : or even :R[]
    same possibly also :M[] to use direct as "variables"
  6502
    zero page as [], as most often used, absolute as ["] (the " for 2 bytes)
    or 16bit absolute as [], and auto-use of zeropage where possible
    or 16bit absolute as [], zeropage only when "named register" syntax, no []
    or take 8/16bit address size of [] from label definition
  accessing memory bitwise as []. followed by . or [.]
    if parameter extraction is supported

operations
  1 or 2 char codes for ops
  2-operand, D = Destination, S = Source (S left away if constant)
    add                      D+S
    sub                      D-S
    adc                      D+.S
    sbc                      D-.S   (. = "and a (carry flag) bit")
    and                      D&S
    ior                      D|S
    xor                      D^S
    cmp(sub-no-store)        D?-S
    test/bit(and-no-store)   D?&S
    lda                      D=S
    sto                      S=D
    xchg                     D<=>S
  reduced 2-operand
    @D +S -S +.S -.S &S |S ^S ?-S ?&S =S ->S <=>S
  1-operand, D = Destination
    inc       D+1
    dec       D-1
    cpl       ~D
    neg       -D
    daa       D0..9
    shl       D<1
    shr       D>1
    rol       D<.1
    ror       D.>1   (. = "and a (carry flag) bit")
    shra      D->1   (- = "with sign")
    rol8bit   D<@1
    ror8bit   D@>1   (@ = 8bit round and forked off carry)
    swap      D>4|<4 lower nibble stuff 3..0
    clr       D=0
    set       D=1
    clf       FD=0
    stf       FD=1
  reduced 1-operand
    @D +1 -1 ~ - 0..9 <1 >1 <.1 .>1 ->1 <@1 @>1 =0 =1

program control
  nop, brk, halt, rep, lock, cs|ds|es|ss
  jump absolute goto label
  jump relative goto+ label
  8051 AJMP goto| (the | for "split", 5bit PC + 11bit instr)
  branches ifgoto[+] label
    to save space no F, possibly also no "go" in if*goto[+] variants
  comfort "functionally" named branches if<|<=|=|<>|>=|>goto[+] label
  decrement and branch ifB-1goto+
  AVR conditional skips use ifgoto+1
  Z80 *IR and *DR repeated block ops use xxx;ifBC-1repeat
  subroutines gosub[+|] sub-name (or just subname?)
    and return
    reg and stack param load and/or push before, in separate instr
    embedded constant parameters separate after

structured coding and auto labels+jumps
  loops do|loop..while|until while|until|times..do|loop|repeat for..next
  cond if..then..else..end|done|all, elseif?, case?, break for in loops?
  subroutines proc..return..end|done

syntax variant comparison
  for 8080/8085/Z80, 6502, 8051, AVR, later 8086/8088/80186/80188, 80386/etc
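
A minimal sketch, in Python, of how the structured if..then..else..end with auto labels+jumps could work in a single pass, using the "define label, insert empty value, then backpatch" approach for forward refs. Everything named here is an assumption for illustration and not part of the design: the class and method names, the two opcodes JUMP_IF_ZERO and JUMP, and the toy target with 1-byte absolute jump addresses.

```python
class Assembler:
    """Toy single-pass emitter: forward jumps get an empty address byte
    that is backpatched once the target position is known."""

    JUMP_IF_ZERO = 0xC0  # hypothetical opcode: jump if condition flag is zero
    JUMP = 0xC1          # hypothetical opcode: unconditional jump

    def __init__(self):
        self.code = bytearray()  # output memory array, dumped to file later
        self.pending = []        # stack of address slots still to be patched

    def emit(self, *values):
        """Append already converted constants (opcodes, parameter bytes)."""
        self.code.extend(values)

    def _forward_jump(self, opcode):
        """Emit a jump with an empty target, return the slot to patch."""
        self.emit(opcode, 0x00)
        return len(self.code) - 1

    def _patch(self, slot):
        """Fill an empty slot with the current position."""
        self.code[slot] = len(self.code)

    def if_(self):
        # Condition false: skip the "then" part, target unknown so far.
        self.pending.append(self._forward_jump(self.JUMP_IF_ZERO))

    def else_(self):
        # End of "then" part: jump over the "else" part, then let the
        # pending "if" jump land at the start of the "else" part.
        skip_else = self._forward_jump(self.JUMP)
        self._patch(self.pending.pop())
        self.pending.append(skip_else)

    def end(self):
        # Whatever jump is still open lands right after the construct.
        self._patch(self.pending.pop())


asm = Assembler()
asm.if_()
asm.emit(0x01)   # "then" body, one dummy instruction
asm.else_()
asm.emit(0x02)   # "else" body, one dummy instruction
asm.end()
print(asm.code.hex())  # c00501c10602
```

The stack of pending slots is what makes nesting work without user-visible labels: each end (or else) patches only the innermost open jump. Loops would need no backpatching, as their jumps go backwards to an already known position.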