Generating VGA Text

author Neil Franklin, last modification 2008.11.27

Connecting to Device
  connector "HD-15" (correctly named DE15, D-Shell E-size 15pin)
  pinout
    1/2/3 RGB, 4 ID2, 5 GND (for IDs?)
    6/7/8 GND for R/G/B coax shields, 9 n.c./key, 10 GND for H+V coax shields
    11/12 ID0/ID1, 13 H-Sync, 14 V-Sync, 15 n.c.
    (the 9 n.c./key can also be wired as +5V for DDC)
    (the 13 H-Sync can also be wired as C-Sync)
    (the 15 n.c. can also be wired as ID3)
    (the 12/15 ID1/ID3 can also be wired as SDA/SCL for DDC I2C bus)

Signals to Generate
  colours RRGGBB, 64colour (3 * 2bit DACs)
    2bit 0.7V*1/3 = 0.233V and 0.7V*2/3 = 0.466V from TTL 5V
    bit0 75ohm/0.233V*5V-75ohm = 1534ohm -> resistor 1k5ohm
    bit1 75ohm/0.466V*5V-75ohm = 730ohm -> resistor 680ohm (not 820ohm)
    needs per DAC 1k5ohm+680ohm @75ohm monitor, then wire to HD-15 pins 1/2/3
    max current bit1, 0.466V/75ohm = 6.21mA, fits in ATmega32 PIO pins 40mA
  or RRRGGBBL (L = G+B common LSB), 256of512colour (3 * 3bit DACs)
    3bit 0.7V*1/7 = 0.1V and 0.7V*2/7 = 0.2V and 0.7V*4/7 = 0.4V from TTL 5V
    bit0 75ohm/0.1V*5V-75ohm = 3675ohm -> resistor 3k9ohm (not 3k3ohm)
    bit1 75ohm/0.2V*5V-75ohm = 1800ohm -> resistor 1k8ohm
    bit2 75ohm/0.4V*5V-75ohm = 862.5ohm -> resistor 820ohm
    needs per DAC 3k9ohm+1k8ohm+820ohm @75ohm, then wire to HD-15 pins 1/2/3
    max current bit2, 0.4V/75ohm = 5.33mA, fits in ATmega32 PIO pins 40mA
  sync H and V, simple 2bit TTL, then wire to HD-15 pins 13/14
  treat all ID pins as n.c. or GND, entire middle row 6-10 put to GND, rest n.c.

Wiring and Controller Pins
  RRGGBB 6 bits of an 8bit PIO port, VH 2 further (same or other) PIO bits
  pixel switching by PIO OUT instructions, 1 OUT per pixel, requires many, fast
    no time for colour pattern reload, max 1 clock per OUT, so AVR is needed
  not on PB2..7 (Timer+INT+AnaCompar+SPI) or PD0..3 (UART+INT+Timer)
    as PB5..7 are SPI ISP interface, and PD0..1 UART debug interface
  preferably leave PA* free because of ADCs for other use (paddles?)
  so PC* (T-Osc+JTAG+I2C/TWI) can be used, least important peripherals
  H and V either use the other 2 PC* bits, V bit7, H bit6, RRGGBB bits5..0
    or possibly the other PC* bits sync signals: pixel/2, char begin, draw field
    or better use them for 8bit/256of512colour in RRRGGBBL (L = G+B common LSB)
    if they are used for this, other PIOs as sync, as these do not change while drawing
      either PB0..n or PDn..7, as these are already partly used, use sbi/cbi instr

Timing of Signals
  31.5kHz lines and 70|60Hz (400|480line VGA) frames
  line @31.5kHz = 31.75us, of this drawn=usable 4/5 = 25.4us
  frame 400+(12+13+25)=450 or 480+(11+11+23)=525 (=2*NTSC) lines+scanback

Display Format
  desirable 40columnsx25rows (AppleII Atari800 C64 CGA40)
    as 32columns (TI99/4A MSX) or even 16rows (Dragon32) is a bit too little
  works with 8line/row font, with: a 4line, A +2line, g +1line, blank 1line
    [see also http://neil.franklin.ch/Info_Texts/Font_3x6u7_Matrix]
  allows 25rows @70Hz (VGA 200of400lines) or 30rows @60Hz (VGA 240of480lines)
    drop every second line, like in CGA, or double scan line as in CGA-on-VGA

Character Set
  8bit per character position, so 256 character * colourpair combinations
    colourpairs mainly vary foreground, if 4 possible 1 for other background
    also mixed patterns and colourpairs possible
      such as 1 or 2 times text or cell graphics together
      with 4 or 8 times 16 2x2 block graphics
  with each 4 colourpairs, only 64 character patterns
    64 capital-ASCII (32..95 -> 00..63)
  or with each 2 colourpairs, 128 character patterns
    95 ASCII (32..126) + 32+1 cell graphics (0..31+127)
    or 64 capital-ASCII (32..95 -> 00..63) + 64 2x3 block graphics (64..127)
    or 95 ASCII (32..126) + 16+1 cell graphics (0..15+127)
      + 16 2x2 block graphics (16..31)
  or with only 1 colourpair, 256 character patterns
    95+96 ISO-8859-1 (32..126,160..255)
      + 32+1+32 cell graphics (0..31,127+128..159)
    or 95 ASCII (32..126) + 32+1+64 cell graphics (0..31+127+128..191)
      + 64 2x3 block graphics (192..255)

Software Basics
  AVR program is "PIO OUT-series" draw routines, threaded code, fixed font in Flash
  OUT only registers as source, colours as RRGGBB codes in set of regs, "LUT"
    LUT colourset per line (part-)reloadable, with a bit more frame buffer
    OUT every pixel independent R0..31, more than 2 LUT registers, = multicolour
    example underline by bottom line own background LUT reg, switch that per line
  usable time 25.4us, for 40 char columns gives 25.4/40=0.635us/char
    16|20MHz @3clk/Pixel allows 5.333|6.667MPixel/s, 10.16|12.70clk/char
    this allows the minimal 12clk/char and 4pixel/char at 18.9MHz
      present 18.432MHz quartz gives 30.72kHz (31.5kHz -2.48%)
      this requires overclocking on 16MHz, by +15.2%, is acceptable
        note that 10MHz AT90S have been overclocked to 17.7MHz, +77%, for hours
        ATmega32 16MHz is for entire range 4.5 to 5.5V and -40 to 85deg
        2.7V only 8MHz on selected "L", 4.5V 16MHz, present 5.3V ca 20MHz
        85deg = 358K, 30deg (room) = 303K, another *1.18, ca 24MHz
    for 5 or 6 Pixel/char needs 5/4 or 6/4 this clock, 23.625MHz or 28.35MHz
      round 23MHz quartz (20MHz +15%) gives 30.666kHz (31.5kHz -2.64%)
      or present 24MHz quartz (20MHz +20%) gives 32kHz (31.5kHz +1.58%)
      present 27.1MHz quartz (20MHz +35.5%) gives 30.111kHz (31.5kHz -4.41%)
      ATmega644 should be able to stand 30MHz at 5.3V and max 30deg
    AVR ext clock CKOPT=1 (full XTAL1) CKSEL3..1=111 (high freq) CKSEL0=1 (quartz)
      SUT1..0=11 (65ms wait until power+clock stable), gives lfuse=FF
  with only 3clk/pixel it is impossible to use direct simulation of hardware shift
    no time for every pixel shift+branch/skip+time-even+OUT
    and parallel to this also next character fetch char code,
      and then pixel data from font
  must be for each pixel segment a directly coded routine, series of OUTs
    and time between them converting next-char codes to computed JMP addresses
    this gives basically a form of threaded code, "execute" character codes
  needs per char, apart from 4 OUT, LD X+ for load and IJMP, min 2 2clock instr
    so at least 3clk/pixel, with at least 4 pixel/column gives min 12clk/column
    and so 4*1clock for 4 OUTs and 4*2clocks for untokenising
  no time for "end of line" count/loop/exit, requires an "abort special code"
    for 128 or 256 char, such as ASCII or 2*ASCII or ISO-8859 or ASCII+128Graph
      use ASCII 0/NUL or better 127/DEL for abort, do 1 OUT blank and leave
    for 2*64 or 4*64 Upper-ASCII-only must misuse 0/SPACE of last combination
    for 1*64 reduce them to 1*63 and use the dropped character's place
  blanks only need first OUT, 3 OUT cycles free for side effects, special codes
    example in-line "colour switch spaces" instead of using colours*chars codes
    example "reverse video spaces" instead of new colour switch background colour
    example "underline space" bottom line switch background to foreground colour
      instead of using a separate colour for bottom line background
    example cursor display in every second frame, switch back- and foreground
      at cost of every 2nd frame not displaying the char (and before and after)
      or draw cursor in unwritten eighth line, but that is a bit small
      better block/edge replacement, or other colour, or reverse, or blinking
  triggering line draw by timer, allows free time for buffer updates
    obvious way, draw H-pulse+HL-blank+line+RETI, gives only HR-blank time
      but not enough to do reti+work+int+synchronise, and partially used
    alternative way, draw HL-blank+line+HR-blank, gives H-pulse
      but also not enough to do reti+work+int+synchronise
    only large time is while drawing blank lines, HL-blank+"draw"/wait+HR-blank
      in vertical retrace blanks or if every 2nd line blank
    as HR-blank timing varies, use timer to return to beginning of that
      is the smallest of all 3 parts, 1/2 of HL-blank and 1/16 of "draw"/wait
    timer must be set to only interrupt if not in drawing stuff
      before exiting clear waiting requests, else immediate interrupt
    must be every line*5/4*(3*160)pixels=600clocks, direct with 16bit Timer1
      or 8bit Timer0|2 with prescaler, min 600/256=2.3437, can use 600clks/8=75
      spare Timer1 for analog mux+comparator + input capture
      spare Timer2 for async or own clock stuff, so use Timer0

Character Threading
  token threading, in SRAM frame buffer 1 8bit token per character
    OUT, LD X+ ; OUT, MOV, ANDI F0 ; OUT, ANDI 0F, ADD line-high-addr ; OUT, IJMP
    unthreading just fits into 4 pixels time, allows 12clocks/column, 40char
    requires 4+6 instructions, so use 16instr/charline blocks, gives F0 and 0F
      note that this splits char code into non-linear mapping char->address
      for 5 or 6 pixel/char add 3 or 6 instr, just fits in 16instr blocks
    charlines/row * max char codes * 16instr = 7 * 256 * 16 * 2 = 56kByte
      ATmega644 64k allows full 256 char codes, ATmega32 32k only up to max 128
      ATmega168 16k allows only minimal Upper-ASCII, 6 * 64 * 16 * 2 = 12kByte
      ATmega8 8k only fits 32 chars, enough for Hex display and a few more
    video RAM 25|30lines * (40 chars + 1 abort + n colour-load) = 1025+n|1230+n
      ATmega32 2k has enough space for nearly 50 rows, scrollback is possible
      ATmega168 1k can only fit 24 * (40 + 1 + 1), 24 lines, like ADM 3a
      possibly eliminate blanks at line end, SRAM only until last not-blank
        after abort code 1byte time-to-wait, allows even more (part-)lines
        with SRAM only for video RAM, as ring buffer for last so many lines
        but this introduces "worst case" artifacts, so avoid it if possible
        on every char added to last line possibly drop oldest line for space
  alternatively go for 8instr/charline, allows in ATmega32 full 256 char codes
    use direct threading, no token->address conversion, but load addr in 2 parts
    OUT, LD X+ ; OUT, LD X+ ; OUT, 1-word-2-cycle-NOP (RJMP) ; OUT, IJMP
      or OUT, 1-word-2-cycle-NOP (RJMP) ; OUT, LD X+ ; OUT, LD X+ ; OUT, IJMP
    needs 4+3 instr, theoretic 3pixel/char font, 160/3=53char, but unreadable
    but must update line-high-addr in all SRAM from line to line, no double scan
    but costs 2byte/char video RAM, so even ATmega32 max 24row as above ATmega168
    OTOH doesn't need abort token, as direct address of exit routine(s)
      and can also employ techniques of chopping off trailing blank space
    ATmega644 offers 4k SRAM, for 40x30 despite 2byte, with still space free
      but there is also enough ROM, so only trade-off for which remains more free
      OTOH because no tokens also no limit to max 256 char codes, Flash for 512
  alternatively 9instr/charline w/o fixed 16instr fields, ATmega32 near 256 char
    use offset threading, with address added while drawing to line-high-addr
    or OUT, LD X+ ; OUT, LD X+ ; OUT, ADD line-high-addr, NOP ; OUT, IJMP
    needs 4+5 instructions, but anyway needed for 4 pixel/char minimal font
    uses 9instr, but no 16instr blocks, as that was fast untokenising artefact
      now not needed any more, as addresses are computed when writing SRAM
    uses 7lines * 256char * 9 * 2 = 31.5kByte for the full 256 chars
      but gives only 0.5k/256inst rest code, possibly drop chars 0..15|31
        but they appear as 7 blocks of [16|32] * 9 * 2 = 288|576byte, 144|288inst
      even with 8lines * 224char * 9 * 2 = 31.5kByte a lot more than 128
        with a bit more space for rest code still 2*96 for ISO-8859 or 2*ASCII
    allows a large 8|9lines * 256char * 9 * 2 = 36|40.5kByte in ATmega644
      with still lots of space available
    also costs 2bytes/char video RAM, with the same disadvantages
  alternatively 9instr/charline with code in char-major order
      instead of the line-major order of all above variants
    each char is compact, with 7*9=63instr, fits 256char*64inst in fully used 32k
      flexible char number vs 64instr/128byte non-scattered rest splitting
      in 64k full 256 chars and still full 50% of Flash free
      or possibly do a 16line font, instead of drop lines or double scan
    with char address in all top 8 bits, need to now add line-low-addr
    also costs 2bytes/char video RAM, with the same disadvantages
  alternatively if token threaded with top ANDI missing is possible (ROM ignores it?)
    token threaded fits in 10 instr, which with only 6 lines fits in 60inst/char
    does not allow 3x6 font with 3x7 for underlines, only 3x5 font w 3x6 underl
    this way also not-quite-256char and full 1byte 30 lines in ATmega32 2k SRAM
  alternatively even 16inst token threaded with char-major ordering
    keeps font data together, easier to generate, and fits 5 or 6 pixel/char
    this is what in the end got implemented

Block Graphics
  use 64 (or 16) character codes for "grid of blocks" graphic characters
    with 2x3 or 2x2 blocks/char, and so 2^(2*3)=64 or 2^(2*2)=16 patterns
  use these to then simulate a very low resolution bitmap/block display
    graphs only require fairly low resolution, text requires fine pixels
    this allows low res graphics and proper high res text to be mixed
  2x2blocks/16patterns * 8colourpairs (8-on-1, or preferably 4-on-2)
    gives a 2*40x2*[25|30]=80x[50|60] bitmap/block graphic
    but requires drawing all 8 lines/row, no 7 line tricks, no fully blank line
  2x3blocks/64patterns * 2colourpairs (only 2-on-1 is possible)
    gives a 2*40x3*[25|30]=80x[75|90] bitmap/block graphic
    but really needs n*3 lines/row, possible with pseudo-VGA 25*9*2=450 lines
      text then with 7 line font + 2 blank lines between
    and requires drawing all 9 lines/row, no 7 line tricks, no fully blank line
  3x2blocks/64patterns * 2colourpairs (only 2-on-1 is possible)
    gives a 3*40x2*[25|30]=120x[50|60] bitmap/block graphic
    more horizontal resolution, and works with only 8 lines/row
    requires character line segments with 1 OUT instr per 4 clocks, no problem

Bitmap Graphics Mode
  separate mode which treats the entire SRAM as a 2k bitmap, of 2k tokens
    no actual text mode, no codes for text, all 8 bits available for pixels
    no need to fit with 25 or 30 row text format, no scrollback, use full SRAM
  use SRAM as 100 rows (2lines/row @200lines) of each 20 tokens, "half"
    with all pixels per token in one line, double long relative to characters
  or as 50 rows (4lines/row @200lines) of each 40 tokens, "full"
    with half/half pixels in 2 subrows, same length relative to character
    this is basically the same as 3x2blocks, but all 8bits used and 50 rows
    sacrificing text display for more resolution after mode switch
  each token draws 8pixel/256patterns with 1bpp/2colours, every 3clk one pixel
    or 6pixel/64patterns+2attribute with 1bpp/4*2colours, 1*2 per pixel group
      pixel groups can be 6x1 or 3x2 depending on above token layouts
      6/8=3/4 pixels, gives 4/3 time, one pixel only every 4clk
    or 4pixel/16patterns with 2bpp/4colours screenwide
      only 1 pixel per 6clk, because of intensive bit usage
  2k in ATmega32 are 16kbit, allows 160x100 with 1bpp/2colours
    or 120x100 with 2index+6pixel 4*2colours
    or 80x100 or 120x68 or 160x50 with 2bpp/4colours with independent pixels
  4k in ATmega644 allows even 160x200 2colours, or 120x200 4*2colours
    or 160x100 in 4colours with independent pixels
  8k in ATmega1281 (TQFP64) can even do full 160x200 2bpp/4colours independent
    this is the full Atari 800 or C64 multicolour bitmap mode
  16k in ATmega1284P (DIP40!) allows (@30MHz!) 256x256 2bpp/4colours independent
    or 160x400 2bpp/4colours independent, with full VGA text 400 lines
    or use as 2*8k for double-buffering 160x200 2bpp/4colours independent
    but 320x200 2bpp/4colours fails due to the needed 37.8MHz
    same also 160x200 4bpp/16colours fails due to unthreading bandwidth
  Flash use second half as 256 tokens, as all tokens need to be usable
    up to 24instr/token needs 32instr, 256*64=16kByte, fits top half of ATmega32
  possibly problem with how to terminate line drawing, no code for abort token
    will require some form of loop, so possibly limited to fewer tokens/line

Sections Graphics Mode
  apart from low resolution block or bitmap graphics this allows full res
    up to full [160|320|480]x[100|120|200|240|400|480] variants
    and that with all 64 colours available anywhere on screen
  every line must be independent, no character line groups
    every line needs its own piece of SRAM, so only little SRAM/line is available
    what gets sacrificed is that there can not be many threaded sections per line
    in particular the background must offer one large section, full 160Pixel
      into which one can jump at any time into its 160 pixels
  entire Flash is a store of line sections, instead of character line groups
    the main problem will be to define useful ones in a program independent way
    but for direct-in-AVR individual games private section sets can be used
  movements by editing section lists, horizontal by changing section lengths
    vertical by inserting into other lists and modifying lengths
  direct threaded any amount of sections, but 2byte address per used section
    no line-high-addr addition needed, analogous to 8instr/char technique
    no chars, so no min-4pixel limit, no 2-cycle-NOP needed for 4th pixel
    only 3+3 instructions remaining, allows 8instr min-3pixel sections
      for sections below 3pixel distance prepend to every sect 2pixel background
    SRAM 2byte/section, ATmega644 4kByte=2kSect, ATmega32 2kByte=1kSect
      divide by [200|240]lines gives average 10|8 or 5|4 sections/line
      for the latter better 100|120line mode, for average 10|8 sections
      also fits better to the 160pixel width, 4:3 ratio, square pixels
  token threaded max 256 sections, but only 1byte token per section needed
    ATmega32 already 10|8 sections/line @200line, ATmega168 10|8 sections/line @100line
    if 256 tokens spread over entire Flash (0..n go lost)
      no line-high-addr adding, so analogous to 6instr/line technique
      only 1 LD X+, with MOV and only if only 1 ANDI, 3+4 instr, 3pixel/sect
      ATmega644 128instr/section, ATmega32 64i/s, even ATmega168 32i/s
    if 256 tokens spread over second half of Flash, some of rest for extensions
      add high bit, requires full 10instr/section technique, min 4pixel/sect
      ATmega644 64instr/section, ATmega32 32i/s, ATmega168 still 16i/s, + extend

Sprite Graphics
  160pixel/2bpp/4colour bitmaps, but of small size, only partial picture
    but arbitrary pixel position (not row bound), independent timing and drawing
    with 4*2bit pixels, 3 foreground colours and background transparent
    arbitrary width or height, arbitrary number per frame, arbitrary positions
    only limited by internal SRAM for coordinates and bitmap data
  either draw them by main AVR, on every 2nd frame, instead of text
    possibly only instead of a small section of text, rest of lines drawn properly
  or draw them by separate hardware, second small AVR as generator
    per colour LUT register 3bit RGB and 1bit to blank main AVR by tristate
    from main AVR interface SPI or 9bit UART, and pixel freq, and H and V sync
    only draw one sprite at a time, overlap handled by software blanking
    or overlapping sprites shown in alternating frames, but flicker?
  sprite usage in system
    sprite as hardware text cursor, instead of manipulating frame buffer
    sprite as hardware mouse cursor, pattern only set pixels, no invert backgr
    or sprite as crosshair, replace every 2nd pixel with fitting X= or Y=
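
Rechecking the numbers
  the DAC resistor, line timing and Flash size figures used in the sections
  above can be recomputed with a few lines of script. This is only a sketch,
  in Python and not in AVR code. The function names (dac_resistor,
  clocks_per_line, line_rate_khz, font_flash_bytes) are made up for this
  illustration, and the formulas are the ones written out in the notes:
  75ohm/Vbit*5V-75ohm, visible*5/4 clocks per line, lines*chars*instr*2 bytes.

```python
# Sketch only: recomputes figures quoted in the notes. All names are hypothetical.

MONITOR_OHM = 75.0    # VGA input termination
TTL_V = 5.0           # AVR output high level assumed in the notes
FULL_SCALE_V = 0.7    # VGA full brightness level


def dac_resistor(weight, levels):
    """Series resistor for one DAC bit, formula 75ohm/Vbit*5V - 75ohm."""
    v_bit = FULL_SCALE_V * weight / levels
    return MONITOR_OHM / v_bit * TTL_V - MONITOR_OHM


def clocks_per_line(pixels_per_char, columns=40, clk_per_pixel=3):
    """Drawn part is 4/5 of a line, so the full line is drawn clocks * 5/4."""
    drawn = columns * pixels_per_char * clk_per_pixel
    return drawn * 5 // 4


def line_rate_khz(crystal_mhz, pixels_per_char=4):
    """Horizontal frequency reached with a given crystal and font width."""
    return crystal_mhz * 1000.0 / clocks_per_line(pixels_per_char)


def font_flash_bytes(charlines, chars, instr_per_charline):
    """Flash used by the draw routines, 2 bytes per AVR instruction word."""
    return charlines * chars * instr_per_charline * 2


if __name__ == "__main__":
    for weight in (1, 2, 4):
        print("3bit DAC weight", weight, "->", round(dac_resistor(weight, 7), 1), "ohm")
    print("clocks/line at 4 pixel/char:", clocks_per_line(4))
    print("18.432MHz ->", round(line_rate_khz(18.432), 3), "kHz")
    print("16instr blocks, 7 lines, 256 chars:",
          font_flash_bytes(7, 256, 16) / 1024, "kByte")
```

  with exact bit voltages the 2bit DAC values come out a few ohm below the
  1534ohm and 730ohm above, as those were computed from the rounded 0.233V
  and 0.466V. The chosen 1k5ohm and 680ohm resistors stay the same.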