An HIRES tile engine for the Apple II

As I wrote in a previous post, programming a game engine in HIRES (”High Resolution”) is no small feat:

its memory layout implies that contiguous lines on display are not contiguous in memory,
there are some ”holes” in the memory section,
individual pixels are written in reverse order inside a byte
a byte describes 3,5 pixels

Thus, programming a proper scrolling is more complicated than just copying and masking blocks of memory!

But what about a tile display engine? The screen would then be a grid filled with tiles. The screen and the game elements would not smoothly scroll and move, but would jump from tile to tile.

Two fine examples are Boulder Dash and Ultima V.

And here is the result of the engine described in this post (Work in Progress).

This post will only discuss the displaying aspects of a tile engine. Collision detections, how to update and interact with the world will be topics for further posts in the future.

Also the source code is available on this public repository, and will be updated for each blog post about this subject.

Fitting in the HIRES constraints

The requirements for such an engine are:

Be fast to refresh the screen
Be able to display a tile from a library in any location of the grid
Be fast!!

In order to achieve speed and simplicity, we need to design the tile engine around the HIRES peculiarities and not against them.

Thus,

The tiles width shall be a multiple of 7 (two bytes).Indeed, one byte in HIRES memory stores 3.5 pixels.
Displaying tiles will then be a matter of copying bytes, whatever their content may be. Plus, this is a convenient way to avoid ”color clashing”, which may occur inside these blocks of 7 pixels.
The tiles height shall be a multiple or a divider of 8.
Indeed, the address of a byte (a group of 3,5 pixels) in a line is usually the address of the same byte in the previous line plus 0x400 ; making a routine copying a tile line by line rather simple. Except that, I used the word usually! Lines are layed-out in memory by group of 8. In such a group, going from a line to the next means adding 0x400. But going from the last line of a group to the first line of the next implies subtracting 0x1F80. It is not infeasible, but if you want your engine to be fast, you don’t want to test every line for how to compute the address of the next one.
In order to bring predictability, you will have to chose the tiles’ height in relation with 8.

If your tiles respect those constraints, displaying them will only be a matter of (mostly) linear copy. And linear copies are the fastest!

High level view of the engine

The main purpose of our engine is to represent and display a tiled ”world”. As we want to move around this world we will display on the screen a portion of the world that we call a ”view”. This view will be updated by keyboard inputs.

Each time we move, we will read the world map and extract the portion of tiles to be displayed in order to construct our view. Then we will iterate over each tile and display it on the screen.

We thus have three important functions to write:

set_view_coords(x, y)
will compute the new location of the view and build an array of the tiles to be displayed.
refresh_display()
will iterate over the array of tiles forming the view and display them. It is also a good occasion to only refresh the tiles that were modified since the last call to refresh_display. An easy way to spare many cycles 😉.
display_tile(tile, x, y)
will display the given tile to the given coordinates. Those are not in pixels, but in tiles: if you have 14x16 pixel tiles, x=2 and y=1 means displaying at (28,16) in pixel coordinates. It will be called, en masse, from refresh_display.

And don’t forget the game_loop, which will wait for a key press and call set_view_coords then refresh_display accordingly.

game_loop:

    ldx Player_XY
    ldy Player_XY+1

    jsr world_set_player    ; setting player's location
    jsr set_view_coords     ; centering the view on the player
    jsr view_refresh        ; refreshing the screen. Calls display_tile for each updated tile

    ; waiting for a key to be pressed
    kbd_loop:
    lda KEYBD
    cmp #$80 ; bit #8 is set when a character is present
    bcc kbd_loop
    sta KEYBD_STROBE

    jsr key_action  ; action associated with the key pressed
    bvc kbd_loop

rts

On this code, we also have a routine named world_set_player. For now, we just have to know that it is used to place the player’s tile somewhere in the world and that we center the view using the very same coordinates.

If we want our world to become “dynamic”, we will have to also update the world itself from the game loop. But it is a story to be told in another post!

Low level implementation and optimizations

The critical routine is display_tile. Indeed, if the view sports a grid of 10×10 (using 14×16 pixel tiles), it will be called a hundred of times. In fine, it is up to it to copy enough bytes to fill the HIRES display.

Tip #1: Redraw only what is necessary

The first optimization is obvious: redraw only what is necessary ; i.e. redraw only the tiles that were modified. As we are programming for an Apple II, a clever but somewhat primitive machine even to 8-bit standards, we know that the screen will be composed of many tiles of the same type. If they are layed-out contiguously, a one tile-movement won’t change many of them.

A call to set_view_coords will construct a view, the collection of tiles to be displayed.
Inside refresh_display, we will compare this view to one actually on screen and will only call display_tile for those which have to change. And of course, we will update the actual view for the next time.

Here is the code snippet to build the view to display by copying the relevant tile numbers from the array describing the world to VIEW_FUTURE, an array collecting the tile numbers to be displayed:

; 1 - build the view to be displayed
    lda #0
    tax
loop_build_view_y:
        ldy #0
        loop_build_view_x:
            lda (VIEW_WORLD),Y
            sta VIEW_FUTURE, X
            inx
            iny
            tya
            cmp #GRID_WIDTH
            bne loop_build_view_x
        
        ; incrementing source address (otherwise offset Y would overflow)
        clc
        lda VIEW_WORLD  
        adc #WIDTH_WORLD
        sta VIEW_WORLD
        lda VIEW_WORLD+1
        adc #0
        sta VIEW_WORLD+1
        
        txa
        cmp #(GRID_HEIGHT*GRID_WIDTH)
        bne loop_build_view_y

And here is the comparison between the tiles from VIEW_FUTURE and those from VIEW_CURRENT (the tiles actually being on display). If the tile at a particular location has to be updated, we call display_tile. Otherwise, we branch afterwards and prepare for the next tile.

loop_display_tiles_x:
            lda VIEW_FUTURE, X
            cmp VIEW_CURRENT, X
            beq no_display      ; do not display an unchanged tile
            sta VIEW_CURRENT, X ; update the list of diplayed tiles
            sta TILE_NR
            stx SAVE_X
            jsr set_tile        ; this routines does not alter its parameters
            ldx SAVE_X
            inc DBG_NB_REDRAW   ; DEBUG
            no_display:            
            inx     
            inc TILE_COORD_X
            ldy TILE_COORD_X
            tya
            cmp #GRID_WIDTH
            bne loop_display_tiles_x

Measures on the actual engine show that we can expect to spare 50-75% of the redrawing, most of the time. The numbers printed on the video embedded above are the number of tiles that were redrawn after each movement, on a total of 100.

Tip#2 : Make good use of indexed addressing

Concerning indexed addressing

Although cruder than its contemporaries, the 6502 allows a practical yet efficient access to memory. Indeed, its X and Y register where designed to facilitate it: they aren’t call Index Registers for nothing 🙂! We will need to make good use of them if we hope to be a bit efficient.

Indexed addressing allows to read/write the byte from/to an address to which we add the content of an address register. Two variants exist:

Indexed Addressing

  lda $800, X ; X is added to 0x800 before loading the value

Indirect Indexed Addressing

  lda ($80), Y  ; Y is added to the 16 bit address stored at 0x80 and 0x81
                ; This give the address which content will be loaded to A.

The strength of these two addressing modes is that they allow to easily iterate over an array, starting from a base address. The fact that the 6502 sports single byte instructions to quickly increment and decrement X and Y also helps a lot!

There is a big limitation, though. As we can see, Indirect Indexed Addressing is the most flexible variant: we can indeed compute or read the base address and store it. On the contrary, the address for Indexed Addressing is drawn from the code itself and fixed. Unfortunately, X and Y are not interchangeable here: only Y can be used for Indirect Indexed Addressing!

How to code it

Displaying a tile is only a matter of copying bytes into the HIRES address range. We need to read the bytes from the tile to be displayed, then write them to the addresses corresponding to the tile location on the screen.

Due to the exotic layout of the HIRES memory, writing is not trivial. We will read the 64 bytes composing our tile linearly, but use a more complicated pattern to write them.

First, we must find the address corresponding to the tile on screen ADDR_DST_BLOCK. All those addresses are stored in a table called HGR_GRID. The first displayed line of the tile will be laid at ADDR_DST_BLOCK, the second at ADDR_DST_BLOCK+0x400, the third at ADDR_DST_BLOCK+0x800, and so on until we reach the 9^th line. This one begins at ADDR_DST_BLOCK+0x80, the folowing at ADDR_DST_BLOCK+0x480 and so on.

A line being four byte long, the fastest way to address the destination is Indirect Indexed Addressing, because it is evident that ADDR_DST_BLOCK will be modified many times.
Thus we could do:

lda ADDR_SRC_BLOCK, X
sta (ADDR_DST_BLOCK), Y
inx
iny

and repeat four times. Then we would add 0x400 and redo the same seven more times, which would complete the first height lines. For the remaining height, we would repeat the exact same code, but starting at ADDR_SRC, BLOCK+0x80.

But there is a big catch! Indexed Addressing is used to perform the loads. But the base address has to be a constant! How then can we hope to display different tiles? Modifying the starting value of X cannot be a solution, as the memory amount required to store all the tiles is much bigger than 256 bytes (X is an 8 bit register).

The solution will seem exotic nowadays, but it was quite common in the early 80’s: self-modifying code! The source address is a constant, but every bit of memory is modifiable during run-time. All we have to do is to modify the code itself to “patch” the source address as soon as it is known.

Et voilà!

In the code of the set_tile routine, the source address has to be patched in many locations. We do it as soon as possible.

Retrieve the tile’s address from the TILE array
Patch the code (ADDR_TO_PATCH)
Retrieve the destination address (a.k.a tile’s base address in HIRES memory)
Copy the bytes

set_tile:
    .define ADDR_TO_PATCH  $666 ; 2 byte address to be patched by tile's address
    ; A tile being 16 line tall, it will vertically spawn on two 8 line "blocks"
    .define ADDR_DST_BLOCK_1  ZERO_2_1   ; first block
    .define ADDR_DST_BLOCK_2  ZERO_2_3   ; second bloc

    ; 1 - patching the code with the adress
    ; of the tile to be displayed
    lda TILE_NR
    asl    ; tiles' address are 2 byte wide
    tax

    lda TILES, X
    sta PATCH_1+1
    sta PATCH_2+1
    sta PATCH_3+1
    sta PATCH_4+1
    sta PATCH_5+1
    sta PATCH_6+1
    sta PATCH_7+1
    sta PATCH_8+1

    lda TILES+1, X
    sta PATCH_1+2
    sta PATCH_2+2
    sta PATCH_3+2
    sta PATCH_4+2
    sta PATCH_5+2
    sta PATCH_6+2
    sta PATCH_7+2
    sta PATCH_8+2

    ; destination address (HGR)
    ; 2 - get the offset from HGR_GRID
    lda #GRID_WIDTH
    sta FAC1
    lda TILE_COORD_Y
    sta FAC2
    jsr mul8                ; X = GRID_WITH * Y (always &lt; 0xFF)
    txa
    clc
    adc TILE_COORD_X        ; Won't set the carry
    asl                     ; 16 bit elements: doubling the offset
    tay                     ; Y: offset to get the address
    ; 3 - retrieve the destination address
    lda HGR_GRID, Y
    sta ADDR_DST_BLOCK_1
    adc #$80
    sta ADDR_DST_BLOCK_2
    lda HGR_GRID+1, Y
    sta ADDR_DST_BLOCK_1+1    
    sta ADDR_DST_BLOCK_2+1
   
    ; 4 - Draw
    ldx #0              ; loop counter & index source
    .define NB_ITER_1 #$20
    ; First loop: draw lines 1 to 8
loop_lines_1to8:
    ldy #0              ; index destination
    ; copy lines (4 blocks)
PATCH_1:    
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_1), Y
    iny
    inx
PATCH_2:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_1), Y
    iny
    inx
PATCH_3:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_1), Y
    iny
    inx
PATCH_4:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_1), Y
    iny
    inx
    ; next line
    lda ADDR_DST_BLOCK_1+1
    ADC #$4             ; addr += 0x400 
    sta ADDR_DST_BLOCK_1+1
    cpx NB_ITER_1
    bne loop_lines_1to8

    clc                 ; cpx affects carry

    .define NB_ITER_2 #$40
    ; Second loop: draw lines 9 to 16
loop_lines_9to16:
    ldy #0              ; index destination
    ; copy lines (4 blocks)
_DISP_TILE_2:
PATCH_5:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_2), Y
    iny
    inx
PATCH_6:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_2), Y
    iny
    inx
PATCH_7:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_2), Y
    iny
    inx
PATCH_8:
    lda ADDR_TO_PATCH, X
    sta (ADDR_DST_BLOCK_2), Y
    iny
    inx
    ; next line
    lda ADDR_DST_BLOCK_2+1
    ADC #$4             ; addr += 0x400 
    sta ADDR_DST_BLOCK_2+1
    cpx NB_ITER_2
    bne loop_lines_9to16

    rts

This code could be much more compact. But we are not memory constrained (yet), so it is sufficient for now.

Stay tuned for additions and enhancement in the weeks to come!