6809 assembly
The Motorola 6809 is arguably the most elegant 8-bit CPU ever designed. Introduced in 1978, it has two 8-bit accumulators, two index registers, two stack pointers, program-counter-relative addressing, and one of the most orthogonal instruction sets of any 8-bit chip. Many programmers consider it a joy to work with: a CPU with high-level features in an era of limited hardware.
You'll find it in the Dragon 32, TRS-80 Color Computer (CoCo), and the Vectrex console.
Assemblers in the IDE
The IDE uses LWASM (from the LWTOOLS suite) for 6809 assembly, often alongside CMOC for C compilation. The presets for Dragon 32, CoCo 2, and Vectrex wire these up automatically.
The 6809 in a nutshell
Compared with the 6502's three 8-bit registers, the 6809 has a generous register set:
| Register | Size | Purpose |
|---|---|---|
| A | 8-bit | Accumulator A |
| B | 8-bit | Accumulator B |
| D | 16-bit | A:B combined (A is high byte, B is low byte) |
| X | 16-bit | Index register |
| Y | 16-bit | Index register |
| U | 16-bit | User stack pointer |
| S | 16-bit | Hardware stack pointer |
| PC | 16-bit | Program counter |
| DP | 8-bit | Direct Page register (like 6502 zero page, but moveable) |
| CC | 8-bit | Condition Code register (flags) |
Having two 8-bit accumulators (A and B) and a 16-bit D (the pair combined) is incredibly handy — you can do 16-bit arithmetic directly, or use A and B independently. Two index registers (X and Y) mean you can walk two data structures simultaneously without juggling values.
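As a minimal sketch of both ideas, the routines below copy a block using X and Y together and update a 16-bit value through D. The labels `copy`, `src`, `dst`, `add_score`, and `score`, and the constant `COUNT`, are hypothetical names chosen for illustration; the syntax assumes LWASM.

```asm
COUNT   EQU     16              ; hypothetical block size

; Copy COUNT bytes from src to dst, walking both with index registers
copy:
        LDX     #src            ; X walks the source
        LDY     #dst            ; Y walks the destination
        LDB     #COUNT          ; B counts the remaining bytes
copy_loop:
        LDA     ,X+             ; fetch from the source, advance X
        STA     ,Y+             ; store to the destination, advance Y
        DECB
        BNE     copy_loop
        RTS

; Add 100 to a 16-bit value with a single 16-bit add
add_score:
        LDD     score           ; A = high byte, B = low byte
        ADDD    #100
        STD     score
        RTS

src:    FCC     "0123456789ABCDEF"
dst:    RMB     COUNT           ; reserve COUNT bytes
score:  FDB     0               ; 16-bit variable
```

On a 6502 the copy loop would need both pointers in zero page, and the 16-bit add would take two 8-bit adds with a carry between them.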
Your first program
; Dragon 32: print a message via the BASIC ROM character-output routine
; Assemble with LWASM
        ORG     $7E00           ; Load address in Dragon RAM
start:
        LDX     #message        ; X points to message
loop:
        LDA     ,X+             ; Load byte at X, then X++
        BEQ     done            ; Stop at null terminator
        JSR     [$A002]         ; Call through the CHROUT vector: print character in A ($A282 is the CoCo's direct entry point)
        BRA     loop            ; Loop
done:
        RTS                     ; Return to BASIC
message:
        FCC     "HELLO, DRAGON!"
        FCB     $0D,$00         ; Carriage return, null terminator
        END     start
Core instructions
Loading and storing
LDA #42 ; Load immediate value 42 into A
LDA $40 ; Load byte at direct page address $40 (like 6502 zero page)
LDA $1234 ; Load byte at absolute address $1234
LDB ,X ; Load byte at address in X into B
LDD #$1234 ; Load 16-bit value $1234 into D (A=$12, B=$34)
STA $40 ; Store A at address $40
STX $1000 ; Store X (16-bit) at $1000
TFR A,B ; Transfer A to B
TFR D,X ; Transfer D (16-bit) to X
EXG A,B ; Exchange A and B
Arithmetic
ADDA #5 ; A = A + 5
ADDB ,X ; B = B + byte at X
ADDD #100 ; D = D + 100 (16-bit add)
SUBA #3 ; A = A - 3
SUBD #50 ; D = D - 50 (16-bit subtract)
INCA ; A = A + 1
INCB ; B = B + 1
INC $40 ; Increment byte at address $40
DECA ; A = A - 1
DECB ; B = B - 1
MUL ; D = A × B (unsigned 8×8 = 16-bit result — a 6502/Z80 programmer's dream)
NEGA ; A = 0 - A (negate)
The MUL instruction is remarkable — a single opcode to multiply two 8-bit values and get a 16-bit result. This alone makes fixed-point game maths dramatically easier than on the 6502 or Z80.
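One common fixed-point use is scaling a value by a fraction of 256, since the high byte of the product is already the scaled result. A small sketch, assuming LWASM syntax and the hypothetical labels `demo` and `scale`:

```asm
demo:
        LDA     #200            ; value
        LDB     #192            ; fraction: 192/256 = 0.75
        BSR     scale           ; A = 150 on return (200 * 192 = 38400 = $9600)
        RTS

; Entry: A = value, B = fraction numerator (n/256)
; Exit:  A = value * n / 256 (rounded down), B = low byte of the product
scale:
        MUL                     ; D = A * B, unsigned
        RTS                     ; A already holds the high byte of D
```

Because MUL is unsigned, signed values need their sign handled separately before and after the multiply.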
Logic
ANDA #$0F ; A = A AND $0F (keep the low nibble, clear the high nibble)
ORA #$80 ; A = A OR $80 (set bit 7)
EORA #$FF ; A = A XOR $FF (invert all bits)
COMA ; A = ~A (complement — same as XOR $FF)
LSRA ; Logical shift A right (bit 7 = 0)
ASRA ; Arithmetic shift A right (bit 7 preserved — sign extend)
LSLA ; Logical shift A left (same as multiply by 2)
ROLA ; Rotate A left through carry
RORA ; Rotate A right through carry
Comparisons and branches
CMPA #10 ; Compare A with 10 (sets flags, no result stored)
CMPX #$1000 ; Compare X with $1000 (16-bit compare)
BEQ label ; Branch if equal (Z flag set)
BNE label ; Branch if not equal
BLT label ; Branch if less than (signed)
BGT label ; Branch if greater than (signed)
BLO label ; Branch if lower (unsigned less-than; same opcode as BCS, whereas the 6502 uses BCC for this test)
BHI label ; Branch if higher (unsigned greater-than)
BMI label ; Branch if minus (N flag set)
BPL label ; Branch if plus (N flag clear)
BRA label ; Branch always (short: -128 to +127 bytes)
LBRA label ; Long branch always (full 16-bit offset — any distance)
LBEQ label ; Long branch if equal
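The L prefix gives you long branches with a 16-bit offset, so any target in the 64 KB address space is reachable. Short branches reach only -128 to +127 bytes from the following instruction, but they are smaller: 2 bytes, versus 3 for LBRA and 4 for the long conditional branches. Use BRA/BEQ etc. for nearby targets, LBRA/LBEQ etc. for far ones.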
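The signed and unsigned branches listed above differ only in which flags they test after the same compare. A short sketch with hypothetical labels, assuming LWASM syntax, shows one value taking different paths:

```asm
        LDA     #$F0            ; 240 as unsigned, -16 as signed
        CMPA    #$10            ; compare with 16
        BGT     signed_gt       ; not taken: -16 is not greater than 16
        BHI     unsigned_hi     ; taken: 240 is higher than 16
        RTS                     ; not reached with these values

signed_gt:
        RTS
unsigned_hi:
        RTS
```

Use BLO/BLS/BHI/BHS for addresses, counters, and byte values, and BLT/BLE/BGT/BGE for two's-complement quantities such as velocities.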
Loops
A counted loop using DECB and BNE:
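A minimal sketch of such a loop, assuming a count of 10 and the hypothetical labels `count_loop` and `wide_loop`. The second form uses X for counts above 255 and relies on LEAX updating the Z flag:

```asm
        LDB     #10             ; loop counter
count_loop:
        ; ... loop body (must preserve B) ...
        DECB                    ; B = B - 1, sets Z when B reaches 0
        BNE     count_loop      ; repeat until B is 0

        LDX     #1000           ; 16-bit counter for longer loops
wide_loop:
        ; ... loop body (must preserve X) ...
        LEAX    -1,X            ; X = X - 1; LEAX updates the Z flag
        BNE     wide_loop
```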
The 6809 has no single decrement-and-branch instruction like the Z80's DJNZ, but DECB followed by BNE is nearly as compact.
Subroutines and the stack
JSR my_sub ; Call subroutine (pushes PC on S stack)
; returns here
my_sub: ; ... do stuff ...
RTS ; Return (pulls PC from S stack)
The 6809 has two stacks: the hardware stack (S) used by JSR/RTS/interrupts, and the user stack (U) you can use freely. This is extremely useful — you can use U as a parameter stack without worrying about conflicts with JSR.
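A sketch of U used as a parameter stack, assuming LWASM syntax. The labels `caller`, `add_params`, and `param_top` and the 16-byte stack area are hypothetical choices for illustration:

```asm
caller:
        LDU     #param_top      ; U stack grows downward from param_top
        LDA     #3
        LDB     #4
        PSHU    A,B             ; pass two parameters on the U stack
        JSR     add_params      ; return address goes on S, not U
        RTS                     ; A = 7 here

; Entry: two bytes on the U stack
; Exit:  A = their sum, both parameters removed from U
add_params:
        PULU    A               ; pop the first parameter (3)
        ADDA    ,U+             ; add the second (4) and pop it
        RTS                     ; S was never touched by the parameters

        RMB     16              ; parameter stack space
param_top:                      ; U starts just past the reserved area
```

The callee can pop its arguments without first stepping over a return address, which is the awkward part of passing parameters on a single stack.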
Pushing and pulling
PSHS A,B,X,Y ; Push A, B, X, Y onto S stack (all in one instruction!)
PULS A,B,X,Y ; Pull them back in reverse order
PSHU D,X ; Push D and X onto U stack
PULU D,X ; Pull from U stack
One PSHS instruction can push any combination of CC, A, B, DP, X, Y, U, and PC at once (D counts as A plus B). This makes subroutine prologues and epilogues much cleaner than on the 6502.
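A typical prologue and epilogue might look like the sketch below (`my_sub` is a placeholder label). Including PC in the PULS list pulls the return address that JSR pushed, so the restore and the return happen in one instruction:

```asm
my_sub:
        PSHS    A,B,X           ; save the registers this routine changes
        ; ... body may freely use A, B, and X ...
        PULS    A,B,X,PC        ; restore them and return in one instruction
```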
Addressing modes
The 6809's addressing modes are among the most flexible of any 8-bit CPU:
| Mode | Example | Meaning |
|---|---|---|
| Immediate | LDA #42 | The value 42 |
| Direct | LDA $40 | Byte at DP:$40 (fast, 2 bytes) |
| Extended | LDA $1234 | Byte at $1234 |
| Indexed | LDA ,X | Byte at X |
| Indexed+offset | LDA 5,X | Byte at X+5 |
| Indexed post-inc | LDA ,X+ | Byte at X, then X++ |
| Indexed pre-dec | LDA ,-X | X--, then byte at X |
| Indexed, D offset | LDA D,X | Byte at X+D (variable index!) |
| PC-relative | LDA label,PCR | Byte at label (position-independent) |
| Indirect | LDA [,X] | Byte at the 16-bit address stored in memory at X |
PC-relative addressing (PCR) is rare among 8-bit CPUs. It lets you write position-independent code that can be loaded anywhere in memory without relocation. Great for ROMs and relocatable routines.
D,X indexed is powerful — you can index into a table with a 16-bit offset computed at runtime, which makes sprite tables and lookup tables very natural.
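The two modes combine naturally in a table lookup. The sketch below assumes LWASM syntax; `lookup` and `table`, and the table's contents, are hypothetical:

```asm
; Entry: B = entry index (0-based)
; Exit:  D = 16-bit table entry
lookup:
        LEAX    table,PCR       ; X = address of table, position-independent
        CLRA                    ; D = index (A = 0, B = index)
        LSLB                    ; D = index * 2: shift the low byte ...
        ROLA                    ; ... and rotate its carry into the high byte
        LDD     D,X             ; load the word at X + D
        RTS

table:  FDB     100,200,400,800
```

Because the table address comes from `LEAX table,PCR` rather than `LDX #table`, the routine and its table still work if the whole block is loaded at a different address.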
The Direct Page register
The DP register works like the 6502's zero page, but it's moveable. A direct-mode instruction holds only the low byte of the address and the CPU takes the high byte from DP, so if DP=$20, a direct-mode LDA with operand $40 accesses address $2040.
LDA #$1F ; Set DP to page $1F
TFR A,DP
SETDP $1F ; Tell the assembler about it
LDA $1F00 ; Address lies in the direct page, so the assembler emits a 2-byte direct-mode instruction
LDA <$FF ; "<" forces direct mode: accesses $1FFF
On power-up DP=$00, so direct page = $0000–$00FF (like the 6502's zero page). The Dragon 32 and CoCo keep DP=$00 for system variables.
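Since BASIC expects DP=$00, a routine that borrows the direct page should restore it before returning. A sketch, assuming LWASM syntax, a hypothetical label `use_page`, and an arbitrary page $1F:

```asm
use_page:
        PSHS    A,DP            ; save A and the caller's DP
        LDA     #$1F
        TFR     A,DP            ; DP = $1F
        SETDP   $1F             ; assembler: direct page is now $1F00-$1FFF
        INC     <$10            ; forced direct mode: increments the byte at $1F10
        INC     >$1F10          ; forced extended mode: same byte, 3-byte instruction
        PULS    A,DP            ; restore A and DP
        SETDP   $00             ; assembler: direct page is $0000-$00FF again
        RTS
```

The `<` and `>` prefixes make the addressing mode explicit, which is useful whenever the assembler's SETDP assumption and the real DP value might disagree.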
Platform I/O overview
Dragon 32 / CoCo — memory-mapped I/O
; Dragon 32 / CoCo: select the VDG colour set via PIA1
; The VDG (MC6847) has no registers of its own; PIA1 port B at $FF22 drives its mode pins (bits 3-7)
LDA #$08 ; Bit 3 = CSS (colour set select)
STA $FF22 ; PIA1 port B data register: video mode bits
; CoCo: read joystick via PIA
LDA $FF00 ; PIA0 port A: bit 7 = joystick comparator output
Vectrex — vector drawing
The Vectrex is unique — it draws vector graphics, not raster pixels. The 6809 talks to an AY-3-8912 sound chip for audio and a DAC + XY deflection system for drawing.
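; Vectrex: draw a dot at the centre of the screen (0, 0) via BIOS
JSR $F192 ; BIOS Wait_Recal: wait for the next frame and recalibrate; the beam returns to (0, 0)
JSR $F2A5 ; BIOS Intensity_5F: set the beam brightness
JSR $F2C5 ; BIOS Dot_here: draw a dot at the current beam position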
The Vectrex BIOS provides high-level drawing routines — most Vectrex programs call these rather than programming the DAC directly.
Full minimal example — Dragon 32 colour bars
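; Dragon 32: cycle colours by filling the text screen with semigraphics blocks
; The VDG has no colour register; in text mode a byte with bit 7 set is a
; semigraphics block whose colour (0-7) is held in bits 4-6
; LWASM syntax
        ORG     $7E00
start:
        LDA     #$8F            ; Solid block, colour 0
colour_loop:
        LDX     #$0400          ; Start of the default text screen
fill:   STA     ,X+
        CMPX    #$0600          ; End of the 512-byte screen
        BNE     fill
        LDX     #$FFFF          ; Delay loop
delay:  LEAX    -1,X            ; LEAX sets Z, so BNE works
        BNE     delay
        ADDA    #$10            ; Next colour
        ORA     #$80            ; Keep the semigraphics bit set after colour 7 wraps
        BRA     colour_loop
        END     start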
Common mistakes
Confusing A/B order in D — in the D register, A is the high byte and B is the low byte. So LDD #$1234 loads A=$12, B=$34. This trips people up when extracting bytes from a 16-bit value.
Direct vs extended addressing: direct mode (2-byte instruction, fast) and extended mode (3-byte instruction, slower) can reach the same byte. The assembler usually picks the right one based on your SETDP hint, but be explicit if in doubt: LDA <$40 forces direct addressing and LDA >$0040 forces extended.
PSHS/PULS order — PSHS A,B always pushes in a fixed order (B first, then A, regardless of the order you write them). PULS A,B always pulls in the reverse fixed order. The registers are specified as a set, not an ordered list.
Forgetting SETDP — if you change DP with TFR, tell the assembler with SETDP or it'll generate wrong direct-page instructions.
Long branches: the short BEQ etc. only reach -128 to +127 bytes. If you get a range error, prefix the mnemonic with L: LBEQ.
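The byte-order and push-order points above can be checked with a few instructions. This is a sketch assuming LWASM syntax, with arbitrary values:

```asm
        LDD     #$1234          ; A = $12 (high byte), B = $34 (low byte)
        PSHS    B,A             ; same encoding and stack layout as PSHS A,B
        LDA     ,S              ; A = $12: A was pushed last, so it sits at the lower address
        LDB     1,S             ; B = $34
        LDX     ,S              ; X = $1234: the pair reads back as a big-endian word
        LEAS    2,S             ; drop both bytes from the stack
```

Because B is always pushed before A, a pushed D occupies memory in the same big-endian order that STD would write.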