
6809 assembly

The Motorola 6809 is arguably the most elegant 8-bit CPU ever designed. Introduced in 1978, it has two accumulators, two index registers, two stack pointers, program-counter-relative addressing, and one of the most orthogonal instruction sets of any 8-bit chip. Many programmers consider it a joy to work with: it offers features usually associated with larger CPUs, in an era of limited hardware.

You'll find it in the Dragon 32, TRS-80 Color Computer (CoCo), and the Vectrex console.

Assemblers in the IDE

The IDE uses LWASM (from the LWTOOLS suite) for 6809 assembly, often alongside CMOC for C compilation. The presets for Dragon 32, CoCo 2, and Vectrex wire these up automatically.

The 6809 in a nutshell

The 6809 has a rich register set compared to the 6502 or Z80:

Register  Size    Purpose
A         8-bit   Accumulator A
B         8-bit   Accumulator B
D         16-bit  A:B combined (A is high byte, B is low byte)
X         16-bit  Index register
Y         16-bit  Index register
U         16-bit  User stack pointer
S         16-bit  Hardware stack pointer
PC        16-bit  Program counter
DP        8-bit   Direct Page register (like 6502 zero page, but moveable)
CC        8-bit   Condition Code register (flags)

Having two 8-bit accumulators (A and B) and a 16-bit D (the pair combined) is incredibly handy — you can do 16-bit arithmetic directly, or use A and B independently. Two index registers (X and Y) mean you can walk two data structures simultaneously without juggling values.
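As a minimal sketch of both points, the routine below copies a buffer with X and Y as the two pointers, then updates a 16-bit variable through D. The labels copy16, src, dst and total, and the length of 16 bytes, are hypothetical and chosen only for illustration.

```asm
; Hypothetical labels: copy16, src, dst, total. Not from any ROM or library.
copy16: LDX #src        ; X walks the source buffer
        LDY #dst        ; Y walks the destination buffer
        LDB #16         ; B counts the bytes left to copy
cloop:  LDA ,X+         ; Fetch a byte, advance X
        STA ,Y+         ; Store it, advance Y
        DECB
        BNE cloop       ; Repeat until B reaches 0

        LDD total       ; Load a 16-bit variable into D (A = high, B = low)
        ADDD #300       ; 16-bit add in one instruction
        STD total       ; Store both bytes back
        RTS

src:    FCB 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
dst:    RMB 16          ; Reserve 16 bytes for the copy
total:  FDB 1000        ; 16-bit variable, initial value 1000
```

On a CPU with a single index register, the copy loop would need to reload or swap the pointer on every pass.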

Your first program

; Dragon 32 — print a message via BASIC ROM routine
; Assemble with LWASM

        ORG $7E00       ; Load address in Dragon RAM

start:
        LDX #message    ; X points to message
loop:
        LDA ,X+         ; Load byte at X, then X++
        BEQ done        ; Stop at null terminator
        JSR [$A002]     ; BASIC ROM CHROUT, called through its indirect vector: print character in A
        BRA loop        ; Loop
done:
        RTS             ; Return to BASIC

message:
        FCC "HELLO, DRAGON!"
        FCB $0D,$00     ; Carriage return, null terminator (no space after the comma)

        END start

Core instructions

Loading and storing

LDA #42         ; Load immediate value 42 into A
LDA $40         ; Load byte at direct page address $40 (like 6502 zero page)
LDA $1234       ; Load byte at absolute address $1234
LDB ,X          ; Load byte at address in X into B
LDD #$1234      ; Load 16-bit value $1234 into D (A=$12, B=$34)
STA $40         ; Store A at address $40
STX $1000       ; Store X (16-bit) at $1000
TFR A,B         ; Transfer A to B
TFR D,X         ; Transfer D (16-bit) to X
EXG A,B         ; Exchange A and B

Arithmetic

ADDA #5         ; A = A + 5
ADDB ,X         ; B = B + byte at X
ADDD #100       ; D = D + 100  (16-bit add)
SUBA #3         ; A = A - 3
SUBD #50        ; D = D - 50   (16-bit subtract)
INCA            ; A = A + 1
INCB            ; B = B + 1
INC $40         ; Increment byte at address $40
DECA            ; A = A - 1
DECB            ; B = B - 1
MUL             ; D = A × B  (unsigned 8×8 = 16-bit result — a 6502/Z80 programmer's dream)
NEGA            ; A = 0 - A  (negate)

The MUL instruction is remarkable — a single opcode to multiply two 8-bit values and get a 16-bit result. This alone makes fixed-point game maths dramatically easier than on the 6502 or Z80.
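One way to see the fixed-point benefit is to treat B as a fraction in 1/256ths. The high byte of the product, which MUL leaves in A, is then value × fraction / 256. The sketch below assumes that convention; the names demo_mul and scale8 are hypothetical.

```asm
; Hypothetical names: demo_mul, scale8.
demo_mul:
        LDA #200        ; Value to scale
        LDB #$80        ; Fraction: 128/256 = 0.5
        BSR scale8      ; 200 * 128 = 25600 = $6400, so A = $64 (100)
        RTS

; scale8: A = A * B / 256
; Entry: A = value (0-255), B = fraction in 1/256ths
; Exit:  A = scaled value, B = low byte of the product (usually discarded)
scale8: MUL             ; D = A * B, unsigned
        RTS             ; The high byte of D is already in A
```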

Logic

ANDA #$0F       ; A = A AND $0F  (mask lower nibble)
ORA  #$80       ; A = A OR $80   (set bit 7)
EORA #$FF       ; A = A XOR $FF  (invert all bits)
COMA            ; A = ~A (complement — same as XOR $FF)
LSRA            ; Logical shift A right (bit 7 = 0)
ASRA            ; Arithmetic shift A right (bit 7 preserved — sign extend)
LSLA            ; Logical shift A left (same as multiply by 2)
ROLA            ; Rotate A left through carry
RORA            ; Rotate A right through carry
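The rotate-through-carry instructions exist mainly to chain shifts across bytes. The 6809 has no single instruction that shifts D, so a sketch of the usual two-instruction idiom looks like this:

```asm
; D = D * 2: shift the 16-bit pair left by one bit
        LSLB            ; Low byte left; old bit 7 of B goes to carry
        ROLA            ; High byte left; carry comes in as bit 0 of A

; D = D / 2 (unsigned): shift the 16-bit pair right by one bit
        LSRA            ; High byte right; old bit 0 of A goes to carry
        RORB            ; Low byte right; carry comes in as bit 7 of B
```

For a signed divide by two, ASRA in place of LSRA keeps the sign bit.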

Comparisons and branches

CMPA #10        ; Compare A with 10 (sets flags, no result stored)
CMPX #$1000     ; Compare X with $1000 (16-bit compare)
BEQ  label      ; Branch if equal (Z flag set)
BNE  label      ; Branch if not equal
BLT  label      ; Branch if less than (signed)
BGT  label      ; Branch if greater than (signed)
BLO  label      ; Branch if lower (unsigned less-than; same opcode as BCS, where the 6502 would use BCC)
BHI  label      ; Branch if higher (unsigned greater-than)
BMI  label      ; Branch if minus (N flag set)
BPL  label      ; Branch if plus
BRA  label      ; Branch always (short: -128 to +127 bytes)
LBRA label      ; Long branch always (full 16-bit offset — any distance)
LBEQ label      ; Long branch if equal

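The L prefix gives you long branches, so you no longer have to keep every target within the short range of -128 to +127 bytes. Use BRA/BEQ etc. for nearby targets and LBRA/LBEQ etc. for far ones; the long forms are larger and slower, so they are not a blanket replacement.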
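The signed/unsigned split matters as soon as a value has bit 7 set. A small sketch with hypothetical labels, using a value that is large when read as unsigned but negative when read as signed:

```asm
; Hypothetical labels: cmpdemo, is_lower, is_less.
cmpdemo:
        LDA #$F0        ; 240 unsigned, -16 signed
        CMPA #$10       ; Compare with 16
        BLO is_lower    ; Unsigned: 240 < 16 is false, so not taken
        BLT is_less     ; Signed: -16 < 16 is true, so taken
        RTS             ; Not reached for these values
is_lower:
        RTS             ; Would run if A were below 16 as an unsigned value
is_less:
        RTS             ; Runs here: A is below 16 as a signed value
```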

Loops

A counted loop using DECB and BNE:

        LDB #10         ; Loop 10 times
loop:   ; ... your code ...
        DECB
        BNE loop        ; Loop back while B != 0

The 6809 has no single decrement-and-branch instruction like the Z80's DJNZ, but DECB followed by BNE is just as clean.
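For counts above 255, one option is a 16-bit counter in X or Y. LEAX and LEAY update the Z flag, so the same decrement-and-branch shape still works. A sketch with a hypothetical buffer label and an arbitrary size of 1024 bytes:

```asm
; Clear 1024 bytes starting at buffer (hypothetical label)
        LDX #buffer     ; X = write pointer
        LDY #1024       ; Y = 16-bit loop counter
        CLRA            ; A = 0, the fill value
fill:   STA ,X+         ; Write one byte, advance X
        LEAY -1,Y       ; Y = Y - 1; LEAY sets Z when the result is 0
        BNE fill        ; Loop until Y reaches 0
        RTS

buffer: RMB 1024        ; Reserve the buffer
```

Note that LEAS and LEAU do not affect the flags, so this pattern only works with X and Y.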

Subroutines and the stack

        JSR  my_sub     ; Call subroutine (pushes PC on S stack)
        ; returns here

my_sub: ; ... do stuff ...
        RTS             ; Return (pulls PC from S stack)

The 6809 has two stacks: the hardware stack (S) used by JSR/RTS/interrupts, and the user stack (U) you can use freely. This is extremely useful — you can use U as a parameter stack without worrying about conflicts with JSR.
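A sketch of that idea follows. It assumes a 64-byte region reserved for the U stack and a routine that takes two 8-bit arguments from it; the names demo_u, add2 and ustack_top are hypothetical.

```asm
; Hypothetical names: demo_u, add2, ustack_top. The 64-byte size is arbitrary.
demo_u: LDU #ustack_top ; Give U its own stack area, separate from S
        LDA #7
        PSHU A          ; Push first argument onto the U stack
        LDA #5
        PSHU A          ; Push second argument
        JSR add2        ; The return address goes on S, so U is undisturbed
        PULU A          ; A = 12, the result
        RTS

add2:   PULU A          ; A = second argument (5)
        ADDA ,U+        ; Add the first argument (7) and pop it
        PSHU A          ; Leave the sum on the U stack for the caller
        RTS

        RMB 64          ; Room for the U stack, which grows downward
ustack_top:
```

Because JSR and RTS only touch S, the routine can pop its arguments without stepping around a return address.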

Pushing and pulling

PSHS A,B,X,Y    ; Push A, B, X, Y onto S stack (all in one instruction!)
PULS A,B,X,Y    ; Pull them back in reverse order

PSHU D,X        ; Push D and X onto U stack
PULU D,X        ; Pull from U stack

One PSHS instruction can push any combination of registers at once. This makes subroutine prologue/epilogue much cleaner than the 6502.
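A common idiom that follows from this is to restore the saved registers and return in the same instruction. Adding PC to the PULS list pulls the return address last, so it doubles as RTS. The routine name and the 32-byte length below are hypothetical.

```asm
; clear_row (hypothetical): zero 32 bytes starting at X, preserving A, B and X
clear_row:
        PSHS A,B,X      ; Prologue: save the registers this routine changes
        LDB #32         ; B = byte count
        CLRA            ; A = 0, the fill value
crloop: STA ,X+         ; Clear one byte, advance X
        DECB
        BNE crloop
        PULS A,B,X,PC   ; Epilogue: restore registers and return in one step
```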

Addressing modes

The 6809's addressing modes are among the most flexible of any 8-bit CPU:

Mode               Example         Meaning
Immediate          LDA #42         The value 42
Direct             LDA $40         Byte at DP:$40 (fast, 2 bytes)
Extended           LDA $1234       Byte at $1234
Indexed            LDA ,X          Byte at X
Indexed+offset     LDA 5,X         Byte at X+5
Indexed post-inc   LDA ,X+         Byte at X, then X++
Indexed pre-dec    LDA ,-X         X--, then byte at X
Indexed, D offset  LDA D,X         Byte at X+D (variable index!)
PC-relative        LDA label,PCR   Byte at label (position-independent)
Indirect           LDA [,X]        Byte at the 16-bit address stored at X

PC-relative addressing (PCR) is unusual among 8-bit CPUs: it lets you write position-independent code that can be loaded anywhere in memory without relocation. Great for ROMs and relocatable routines.

D,X indexed is powerful — you can index into a table with a 16-bit offset computed at runtime, which makes sprite tables and lookup tables very natural.
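The two modes combine naturally in a table lookup. In the sketch below (square and sqtab are hypothetical names), LEAX with PCR finds the table wherever the code was loaded, and D,X indexes into it. A is cleared first because the shorter B,X form treats B as a signed offset, which would go wrong for indices above 127.

```asm
; square (hypothetical): A = B * B for B in 0-7, via a lookup table
square: LEAX sqtab,PCR  ; X = address of the table, computed relative to PC
        CLRA            ; D = 0:B, an unsigned 16-bit index
        LDA D,X         ; A = byte at X + D
        RTS

sqtab:  FCB 0,1,4,9,16,25,36,49
```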

The Direct Page register

The DP register works like the 6502's zero page concept, but it's moveable. Direct addressing (LDA $40) uses DP as the high byte of the address — so if DP=$20, LDA $40 accesses address $2040.

        LDA #$1F        ; Set DP to page $1F
        TFR A,DP
        SETDP $1F       ; Tell the assembler about it

        LDA $1F00       ; High byte matches SETDP, so this assembles as a 2-byte direct access
        LDA <$FF        ; Or force direct mode with <: accesses $1FFF

On power-up DP=$00, so direct page = $0000–$00FF (like the 6502's zero page). The Dragon 32 and CoCo keep DP=$00 for system variables.

Platform I/O overview

Dragon 32 / CoCo — memory-mapped I/O

; Dragon 32 / CoCo: select the VDG's alternate colour set
; The VDG has no registers of its own; its mode pins are driven by PIA 1 port B ($FF22), bits 3-7
        LDA #$08        ; Bit 3 = CSS (colour set select)
        STA $FF22       ; PIA 1-B data register: video mode bits

; CoCo: read the joystick comparator via PIA 0
        LDA $FF00       ; PIA 0-A: bit 7 = joystick comparator output

Vectrex — vector drawing

The Vectrex is unique — it draws vector graphics, not raster pixels. The 6809 talks to an AY-3-8912 sound chip for audio and a DAC + XY deflection system for drawing.

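; Vectrex: draw a dot at the screen centre (0, 0) via BIOS
        JSR $F192       ; BIOS Wait_Recal: wait for the next frame and recentre the beam
        JSR $F2A5       ; BIOS Intensity_5F: set a medium beam brightness
        JSR $F2C5       ; BIOS Dot_here: draw a dot at the current beam position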

The Vectrex BIOS provides high-level drawing routines — most Vectrex programs call these rather than programming the DAC directly.

Full minimal example: Dragon 32 colour set toggle

; Dragon 32: flip between the VDG's two colour sets via PIA 1
; LWASM syntax

        ORG $7E00

start:
colour_loop:
        LDA $FF22       ; PIA 1-B: bits 3-7 drive the VDG mode pins
        EORA #$08       ; Toggle bit 3 (CSS, colour set select)
        STA $FF22
        LDX #$FFFF      ; Delay loop
delay:  LEAX -1,X       ; LEAX updates the Z flag, so BNE works here
        BNE delay
        BRA colour_loop

        END start

Common mistakes

Confusing A/B order in D: in the D register, A is the high byte and B is the low byte. So LDD #$1234 loads A=$12, B=$34. This trips people up when extracting bytes from a 16-bit value.

Direct vs extended addressing: LDA $40 assembles as direct (2-byte instruction, fast) only when the high byte of the address matches the assembler's SETDP value; otherwise it becomes extended (3-byte instruction, slower). The assembler usually picks the right one based on your SETDP hint, but be explicit if in doubt: LDA <$40 forces direct and LDA >$40 forces extended.

PSHS/PULS order: PSHS A,B always pushes in a fixed order (B first, then A, regardless of the order you write them). PULS A,B always pulls in the reverse fixed order. The registers are specified as a set, not an ordered list.

Forgetting SETDP: if you change DP with TFR, tell the assembler with SETDP or it will choose direct or extended addressing based on the wrong page.

Long branches: the short BEQ etc. only reach -128 to +127 bytes. If you get a range error, prefix with L: LBEQ.

See also