As I wrote in a previous post, programming a game engine in HIRES (”High Resolution”) is no small feat:

  • its memory layout implies that contiguous lines on display are not contiguous in memory,
  • there are some ”holes” in the memory section,
  • individual pixels are written in reverse order inside a byte
  • a byte describes 3,5 pixels

Thus, programming a proper scrolling is more complicated than just copying and masking blocks of memory!

But what about a tile display engine? The screen would then be a grid filled with tiles. The screen and the game elements would not smoothly scroll and move, but would jump from tile to tile. 

Two fine examples are Boulder Dash and Ultima V.

 And here is the result of the engine described in this post (Work in Progress).

This post will only discuss the displaying aspects of a tile engine. Collision detections, how to update and interact with the world will be topics for further posts in the future.

Also the source code will be posted on my Github at some point.

Fitting in the HIRES constraints

 The requirements for such an engine are:

  • Be fast to refresh the screen
  • Be able to display a tile from a library in any location of the grid
  • Be fast!!

In order to achieve speed and simplicity, we need to design the tile engine around the HIRES peculiarities and not against them.

Thus,

  • The tiles width shall be a multiple of 7 (two bytes). Indeed, one byte in HIRES memory stores 3.5 pixels.
    Displaying tiles will then be a matter of copying bytes, whatever their content may be. Plus, this is a convenient way to avoid ”color clashing”, which may occur inside these blocks of 7 pixels.
  • The tiles height shall be a multiple or a divider of 8.
    Indeed, the address of a byte (a group of 3,5 pixels) in a line is usually the address of the same byte in the previous line plus 0x400 ; making a routine copying a tile line by line rather simple. Except that, I used the word usually! Lines are layed-out in memory by group of 8. In such a group, going from a line to the next means adding 0x400. But going from the last line of a group to the first line of the next implies subtracting 0x1F80. It is not infeasible, but if you want your engine to be fast, you don’t want to test every line for how to compute the address of the next one.
    In order to bring predictability, you will have to chose the tiles’ height in relation with 8.

If your tiles respect those constraints, displaying them will only be a matter of (mostly) linear copy. And linear copies are the fastest!

High level view of the engine

The main purpose of our engine is to represent and display a tiled ”world”. As we want to move around this world we will display on the screen a portion of the world that we call a ”view”. This view will be updated by keyboard inputs.

Each time we move, we will read the world map and extract the portion of tiles to be displayed in order to construct our view. Then we will iterate over each tile and display it on the screen.

We thus have three important functions to write:

  • set_view_coords(x, y)
    will compute the new location of the view and build an array of the tiles to be displayed
  • refresh_display()
    will iterate over the array of tiles forming the view and display them. It is also a good occasion to only refresh the tiles that were modified since the last call to refresh_display. An easy way to spare many cycles 😉
  • display_tile(tile, x, y)
    will display the given tile to the given coordinates. Those are not in pixels, but in tiles: if you have 14*16 pixel tiles, x=2 and y=1 means displaying at (28,16) in pixel coordinates. It will be called, en masse, from refresh_display.

 And don’t forget the game_loop, which will wait for a key press and call set_view_coords then refresh_display accordingly. 

On this code, we also have a routine named world_set_player. For now, we just have to know that it is used to place the player’s tile somewhere in the world and that we center the view using the very same coordinates.

If we want our world to become “dynamic”, we will have to also update the world itself from the game loop. But it is a story to be told in another post!

 Low level implementation and optimizations

The critical routine is display_tile. Indeed, if the view sports a grid of 10×10 (using 14×16 pixel tiles), it will be called a hundred of times. In fine, it is up to it to copy enough bytes to fill the HIRES display.

Tip #1: Redraw only what is necessary

The first optimization is obvious: redraw only what is necessary ; i.e. redraw only the tiles that were modified. As we are programming for an Apple II, a clever but somewhat primitive machine even to 8-bit standards, we know that the screen will be composed of many tiles of the same type. If they are layed-out contiguously, a one tile-movement won’t change many of them.

A call to set_view_coords will construct a view, the collection of tiles to be displayed.
Inside refresh_display, we will compare this view to one actually on screen and will only call display_tile for those which have to change. And of course, we will update the actual view for the next time.

Here is the code snippet to build the view to display by copying the relevant tile numbers from the array describing the world to VIEW_FUTURE, an array collecting the tile numbers to be displayed:

And here is the comparison between the tiles from VIEW_FUTURE and those from VIEW_CURRENT (the tiles actually being on display). If the tile at a particular location has to be updated, we call display_tile. Otherwise, we branch afterwards and prepare for the next tile.

Measures on the actual engine show that we can expect to spare 50-75% of the redrawing, most of the time. The numbers printed on the video embedded above are the number of tiles that were redrawn after each movement, on a total of 100. 

Tip#2 : Make good use of indexed addressing

Concerning indexed addressing

Although cruder than its contemporaries, the 6502 allows a practical yet efficient access to memory. Indeed, its X and Y register where designed to facilitate it: they aren’t call Index Registers for nothing 🙂 ! We will need to make good use of them if we hope to be a bit efficient.

Indexed addressing allows to read/write the byte from/to an address to which we add the content of an address register. Two variants exist:

  1. Indexed Addressing

  2. Indirect Indexed Addressing

The strength of these two addressing modes is that they allow to easily iterate over an array, starting  from a base address. The fact that the 6502 sports single byte instructions to quickly increment and decrement X and Y also helps a lot!

There is a big limitation, though. As we can see, Indirect Indexed Addressing is the most flexible variant: we can indeed compute or read the base address and store it. On the contrary, the address for Indexed Addressing is drawn from the code itself and fixed. Unfortunately, X and Y are not interchangeable here: only Y can be used for Indirect Indexed Addressing!

How to code it

Displaying a tile is only a matter of copying bytes into the HIRES address range. We need to read the bytes from the tile to be displayed, then write them to the addresses corresponding to the tile location on the screen.

Due to the exotic layout of the HIRES memory, writing is not trivial. We will read the 64 bytes composing our tile linearly, but use a more complicated pattern to write them.

First, we must find the address corresponding to the tile on screen ADDR_DST_BLOCK. All those addresses are stored in a table called HGR_GRID. The first displayed line of the tile will be laid at  ADDR_DST_BLOCK, the second at  ADDR_DST_BLOCK+0x400, the third at  ADDR_DST_BLOCK+0x800, and so on until we reach the 9th line. This one begins at  ADDR_DST_BLOCK+0x80, the folowing at  ADDR_DST_BLOCK+0x480 and so on.

A line being four byte long, the fastest way to address the destination is Indirect Indexed Addressing, because it is evident that ADDR_DST_BLOCK will be modified many times.
Thus we could do:
 

and repeat four times. Then we would add 0x400 and redo the same seven more times, which would complete the first height lines. For the remaining height, we would repeat the exact same code, but starting at ADDR_SRC, BLOCK+0x80.

But there is a big catch! Indexed Addressing is used to perform the loads. But the base address has to be a constant! How then can we hope to display different tiles? Modifying the starting value of X cannot be a solution, as the memory amount required to store all the tiles is much bigger than 256 bytes (X is an 8 bit register).

The solution will seem exotic nowadays, but it was quite common in the early 80’s: self-modifying code! The source address is a constant, but every bit of memory is modifiable during run-time. All we have to do is to modify the code itself to “patch” the source address as soon as it is known.

 Et voilà!

In the code of the set_tile routine, the source address has to be patched in many locations. We do it as soon as possible.

  1. Retrieve the tile’s address from the TILE array
  2. Patch the code (ADDR_TO_PATCH)
  3. Retrieve the destination address (a.k.a tile’s base address in HIRES memory)
  4. Copy the bytes

This code could be much more compact. But we are not memory constrained (yet), so it is sufficient for now.

Stay tuned for additions and enhancement in the weeks to come!


2 Comments

Antoine · November 2, 2018 at 13:18

Nice article!
Save cycles with:
kbd_loop
lda kbd
bpl kbd_loop
sta kbdstrobe

    XtoF · November 2, 2018 at 16:34

    Thanks for this advice! I must admit that I was not very sure on how to use BMI and BPL branch instructions. Now I know!

Leave a Reply

Your email address will not be published. Required fields are marked *