Category Archives: Retrocomputing

A Game of Life

My port of Conway’s Game of Life to the Apple II is now complete!

The whole code is available on GitHub.
Plus, there are four posts on this documenting my progression that could interest those who want to learn more about programming on the Apple II,

  1. Coding in C for an 8 bit 6502 CPU
  2. Coding in Assembly for an Apple II
  3. HIRES Graphics on the Apple II
  4. Making the Apple II sing

You’re welcome to try it in the emulator embedded below!

This Javascript emulator presents a few caveats though.

  • It’s very, very slow, and won’t run real time on all but the most powerful smartphones
  • In order to « play » it past the Title Screen, a keyboard is required anyway
  • The loading time can last more than 30 seconds :/
  • The display is way cleaner and crisper than it would be on an actual machine

Try it

You can also download this disk image and load it on your favorite emulator for a more faithful result.

A few screenshots if you are on a smartphone and cannot run the emulator:

Lessons learnt

Now that this small project is completed, I can say that I have a good grasp on how the Apple II actually works and how to make use of its main capabilities. Of course, no breakthrough here. But my « A Game of Life » makes use of:

  • Keyboard inputs
  • Access to the file system via the PRODOS‘ API
  • Color graphics in both HIRES and LORES
  • Sounds, and even music

Here I will sum up some lessons I learnt, how things could be improved in an eventual future project and what surprised me the most.

Coding for an Apple II in 2017

As I explained in my first article on the subject, it’s quite possible to code in plain C, thanks to the amazing work of the guys behind CC65. The binary produced is far from being optimized but usable enough to form a skeleton or hosts a state machine. CC65 also comes with an assembler if some routines require to be optimized.

On the plus side

  • CC65 has a good compliance with the ANSI standard
  • CC65 comes with a good support library with some standard functions such as printf and others inspired from conio.h
  • There is a good integration with the assembler and the calling convention is well documented,

On the minus side

  • Most of the zero page addresses are reserved for the « runtime library ». Meaning that, even in assembly, you will be restricted to using only 8 addresses!
  • A lot of symbols are loaded by default, reducing the usable memory.

Testing and debugging

Using CC65, it is rather easy to develop and to cross-compile for the Apple II.

Fortunately, using an adequate emulator, it is also rather easy to debug code. Being mainly on Windows, I chose AppleWin. It is faithful to the hardware and, more important in this occasion, provides a debugger. If its debugging capabilities seem rather limited compared to a modern debugger, they are more than enough. And way more advanced that any debugger that can be found on the actual machine.

I was quite satisfied by AppleWin’s debugger and emulation!

The Apple II hardware

I learnt much about the hardware of the machine. I can sum up my feelings with two words: clever and primitive.

Indeed, everything about the Apple II was very cleverly engineered. But as the grandfather of the personal computers, it’s also way more primitive that what I expected.

The MOS 6502

The 6502 is a great microprocessor! Its instruction set is small and very lean. Quite easy to learn. For its time, it was quit an engineering achievement: as powerful as its main competitors of the time (the Intel 8080 and the Motorola 6800) for a fifth of the price! It was powerful enough to remain competitive during the first half of the eighties against its nemesis, the Zilog Z80.

But there is a catch. It comes with nothing superfluous.

For instance, there is, of course a LSR instruction (Logical Shift Right). But you can only shift by one bit. If you want to shift more? Use a loop! Furthermore, there is no LSL instruction! You can straight that out with the  ROR instruction anyway. Same with subtraction: if you want to subtract by a number, add its negative!

And it is a true 8-bit CPU. Meaning that only 8-bit operations are permitted!

On other 8-bit CPU such as the Z80, you can manipulate 16 bit numbers and use in them in some arithmetic operations. Of course, it is suboptimal. As the ALU is not 16-bit wide, it will rely on some slow micro-coded operations to perform a 16-bit task.

But micro-codes cost transistors. Thus the 6502 does not provide any!

When programming the 6502, you are constantly reminded how limiting 8-bit arithmetic is. The result of your additions cannot exceed 256. Your loops are limited to 256 iterations. You cannot event jump to an address more than 256 byte away in memory!

The Graphics

If I had no trouble using the text mode and the low resolution graphical mode, using the “HIRES” high resolution mode was a lot more difficult that what I expected. Of course, as I knew, there is no dedicated hardware to assist you. But the most disturbing is the peculiar way the data is organized in memory. Saying nothing about the artifacts appearing if you’re not careful.

This color display was astonishing for the time. But it was also very difficult to program for. I now measure the troubles that programmers of the time had to overcome in order to offer us some of the best arcade games ever!

The Sound

Bare-bone is what comes to my mind. I knew that the speaker had to be driven by the software but I did not expect the lack of an hardware counter or clock that could cadence it.

Conclusion

Programming a Game of Life on the Apple II was worth every bit of it. Using modern tools, it was not that difficult and was mainly a matter of reading (and understanding!) the documentation.

Now I wonder what will be my next move.
I still have to complete the restoration of my Apple II by fixing its Disk drive.
I could capitalize on my new knowledge of the 6502 to program a small game for the PC Engine (also known as the TurboGrafx-16 by our American friends). An impressive arcade machine!
But I’m also tempted to program something for the Amiga.

Stay tuned…

Making the Apple II sing

Now that my port of Conway’s Game of Life on Apple II possess a proper HIRES Title Screen, some kind of music would be great!

In this post, I will present the sound capabilities of the Apple II then explain how I coded a (simple) music library.

This is the fourth post about coding the Game of Life on the Apple II. Here are the previous ones:

  1. Coding in C for an 8 bit 6502 CPU
  2. Coding in Assembly for an Apple II
  3. HIRES Graphics on Apple II

And as usual, the code is available on GitHub!

Where in the world is the dedicated sound hardware?

Do you know the musical abilities of the Apple II? Well… there is none!

OK I am exaggerating a tiny bit. The CPU has direct control of a speaker, called the Beeper!

Most of the 8 bit home computers from the early eighties provided some hardware dedicated to producing pleasant music. Atari 8bits featured 4 voices on which you could play sines and mix some noise. The Commodore C64 came along the legendary SID, which the chiptuners still compose for!

But the Apple II precedes those machines by several years. It was not designed primarily as a game machine (although games were among its very first batch of programs). And, as we know, Woz liked nothing more than using standard chips and cutting their number to the bare minimum. That is why the Beeper is directly driven by the CPU.

Even the IBM PC, which was often derided for its appalling sound capabilities, provided more. In the Apple II’s fashion, the PC’s speaker is directly controlled by the CPU. But the PC also came with a Programmable Interval Timer, which could be used to cadence the program driving the speaker. On the Apple machine, you don’t enjoy such a facility. In order to timely cadence the speaker, you have to count cycles. So, if you don’t rely on very cleaver programming tricks, you will hog 100% of the CPU.

And that’s precisely what I am doing in the library I will expose below! 😉

Producing notes

One caveat: I am not a musician. I apologize in advance if my explanations concerning musical notation, pitch and octaves are a little bit approximate.

Principle

The only interaction a program can have with the Beeper is through a Soft-Switch. Reading or writing to the address 0xC030 will invert the position of the speaker. And that’s it.

By periodically alternating the high and the low positions of the speaker, this switch make it easy to produce a square waveform. The frequency of this vibration will command the musical note produced. Thus, the single most important thing is to control the timing on which I will activate the speaker’s soft-switch.

Unfortunately, as I said, on the Apple II there’s is no other easy way to measure time other than counting cycles. So, the heart of my “note” routine is composed of two loops.

  1. The inner loop controls the pitch of the note to play. It is composed of a number of NOPs, forming the smallest quantum of time for my function. The number of iterations is adjusted so that iterating the loop takes the time of half a period for the desired note. When that amount of time is elapsed, the soft-switch is activated.
  2. The outer loop controls the duration of the note. I want the base duration to be the same for all the notes. Thus, of course, the number of iterations will be different for each one, as their period depends on their pitch.

In order to calculate the number of those iteration to perform, I need to know precisely the duration of each instruction. Inside the loops of course, but also the control instructions: comparisons, conditional branching, etc…

Fortunately, the 6502 behaves predictably. Counting the cycles is only a matter of reading the documentation.

Controlling the pitch

These loops written, the real trick is to calculate the required number of iteration, depending on the pitch of the desired note. The formula is quite simple. The half period duration is \frac{1}{2 \ast Frequency_{NOTE}}. The duration of the half-period loop is easy to calculate. It’s \frac{Number_{cycles}}{Frequency_{CPU}}.

Thus, to get the number of required iterations: Nb_{ITERATIONS} = \frac{Frequency_{CPU}}{2 \ast Frequency_{NOTE} * Number_{CYCLES}}.

In order to help me, I wrote a small spreadsheet to perform the calculus for all the playable notes. Given that the 6502 is an 8 bit CPU, the highest number of iteration is 255. Thus, the the lowest note reproducible by my library is D#3, with 251 iterations.

You can download this spreadsheet from GitHub.

Du to the various approximations and the duration of the inner loop, this method cannot be 100% accurate. Thus, I recorded the notes using Audacity and measured their precise duration, so I could plot their relative distance from the goal.

Relative error of the reproduced frequency

Not that bad 😉

At the highest frequencies, less iterations are required to last half a period. Thus, the imprecision raises. I decided that the highest note reproducible would be B4. After that, the imprecision in the reproduction would be so high that those notes would be impracticable to play.

Controlling the duration

Concerning the duration of the note, it’s far easier. A base duration will be one second. It will correspond to a WHOLE note. As the inner loop is in charge for lasting half a period of the note, the number of outer loop’s iterations required is 2 \ast Frequency_{NOTE}.

This base duration will be shifted in order to play HALF notes, QUARTER notes, and so on.

Pauses

In order to differentiate the notes, silence is needed. So, I also wrote a “pause” function. Its principle is the same as the “note” function and its base duration is also one second. But of course, the speaker soft-switch is left untouched.

The result

With my library being able to produce musical notes, control their duration and insert an optional pause between notes, I can play a simple tune. I asked my friend Clint Slate to compose a short one, to be played during my Title Screen:

CLICK TO PLAY

And the same tune played by emulated Apple IIe:

CLICK TO PLAY

You can marvel to the characteristic beeps of the machine! And the tune is still quite recognizable! 🙂

More advanced techniques

While quite satisfied by the result, and also by the ease of use of my “Apple II music library“, I know it is possible to do a lot better!

First, my library use 100% of the CPU when playing something. It’s OK for my title screen, but it would be impossible to use it while achieving fluid animations. Good coders of the eighties could do it.

Well, maybe it is easier than playing music, as sound effects may not have to rely on strict timing requirements. But it impresses me nonetheless!

Another weakness of my library, is that it produces the crudest level of 1 bit music: square waves. While it was the quite common back in the time, some talented hackers could do a lot better and played music on more than one channel!

It’s even possible to play sampled sounds, using a technique called “Pulse-Width Modulation“.

Anyway, my music library now fulfills my needs!

HIRES Graphics on Apple II

The Apple II was innovative in many ways. One of its most remarkable feats was that it was the very first home computer supporting color graphics! Think about it! While all its competitor could display 40 columns of ASCII characters, the Apple II could displays graphics with up to 16 colors in LoRes mode and 6 colors in HiRes mode! This was huge!

Now that I know how to program my Apple II in C and ASM, I could not resist trying to display a high resolution image on my screen! So I added a Title Screen to my Game of Life 😉

This article is the third concerning my port of Conway’s Game of life on the Apple II. Here are the two previous ones:

  1. Coding in C for an 8 bit 6502 CPU
  2. Coding in Assembly for an Apple II

Graphics modes

Information can be display be the Apple II using different graphic modes.

  • TEXT mode allows to display 24 rows of 40 ASCII characters. With an expansion board, displaying 80 characters becomes possible.
  • LORES mode is 40 pixel wide, 48 pixel tall in 16 colors! While it seems quite low by today’s standards, it is  reasonably fast and straightforward to program for. This is the mode used to edit the “play-field”, then watch the cells evolve, in my port of the Game of Life.
  • HIRES is the mode used by most graphical programs and games of the time. It is 140 pixel wide, 192 pixel tall in 6 colors. BUT… relying on many oddities, it is not as easy to grasp nor program than the LORES mode. More precisions later.
  • MIXED mode allow to use the bottom of the screen of LORES and HIRES mode to display four lines of text.
  • Later Apple II models came with the so called DOUBLE LORES and the DOUBLE HIRES mode. The later was quite difficult to program for and heavily taxing memory. So it was mainly used for static screens.

From that point, I will be exclusively writing about HIRES.

HIRES Oddities

Color is achieved a very peculiar way on an Apple II. Furthermore, Woz made the choice to reduce the number of chips to the minimum. The price paid being that programming HIRES graphics on the Apple II is not as easy and fast that it could be on other machines.

Producing colors with black & white dots!

At its core, the Apple II is a black and white machine!
On the contrary to other 8 bit and 16 bit machines, you don’t select colors in a palette. Instead, you have to carefully place black and white dots. Indeed, the hardware was designed so that a rapid succession of such points introduces interferences into the color carrier of the signal sent to the NTSC TV.

Img1. Image seen on a black & white set

 

Img2. The same image on a color NTSC set

If you watch carefully the two images above you can notice:

  • That the text in the color version is not very crisp and is affected by some color fringes. Those are the “interferences” in action!
  • That the black & white version seems to be twice the resolution. And it is indeed! In order to achieve the 140 colored pixels of an HIRES line, you’ve got to draw 280 black and white dots!

Sending a succession of lit dots followed by unlit dots will draw color pixels, while two consecutive white dots will result in a white pixel and two consecutive black dots produce a black pixel. If you’re not already lost, wait, there is more!

The process I just describe only allows you to produce four colors: WHITE, BLACK, GREEN and PURPLE. So how can the system also produce BLUE and ORANGE?? By delaying the signal of half a dot!

Img3. Color generation on the Apple II

In memory

Pages

Two memory blocks can be read by the graphic circuitry of the machine in order to display a HIRES picture.

  • PAGE 1, from 0x2000 to 0x3FFF
  • PAGE 2, from 0x4000 to 0x5FFF

Layout

7 dot blocks

Each page allows the description of 192 lines of 140 pixels, or 280 dots. Those dots are grouped into lines of 40 blocks of 7 dots. Each of these dot is controlled by one bit: 0 produces an black dot, while 1 produces a white dot. Such a 7 dot block is described by one byte.
And the 8th bit? It controls the half-dot delay applied, thus the COLOR GROUP of the block.

  • 8th bit = 0 : GROUP 1, allowing to display GREEN, VIOLET, BLACK and WHITE
  • 8th bit = 1 : GROUP 2, allowing to display ORANGE, BLUE, BLACK and WHITE

This 8th bit is the higher bit of the byte.
The 7 other bits describe the dots in the following order: the lesser bit corresponds to the dot displayed on the left and the higher bit to the dot on the right.
Those bytes are grouped inside 40 block lines and addressed the following way: the byte of lowest address corresponds to the left-most block of one line, while the byte of highest address correspond to the block on the right.

Img4. Dots layout in a byte

If you followed carefully, you surely noticed that the layout of the bits inside one byte (7 dot block) is the reverse of the layout of the bytes inside a line! Legends tell that this insanity allowed Woz to save two chips…

Colors

If you want to produce color, you have to set the high bit accordingly to the colors you want to produce.
Then you have to lay the dots following these directives:

  • dots OFF ON gives GREEN (GROUP1) or ORANGE (GROUP2)
  • dots ON OFF gives VIOLET (GROUP1) or BLUE (GROUP2)
  • dots ON ON gives WHITE
  • dots OFF OFF gives BLACK

As 7 dots gives 3.5 colored pixels, you’ll often have to think about the layout of two consecutive blocks.

To sum those things up, here are the bytes required to obtain a line of 14 green pixels:

0x2A – 0x55 – 0x2A – 0x55

0 0 10 10 10 – 0 10 10 10 10 0 10 10 10 – 0 10 10 10 1

(the brown bits describe adjacent dots)

Clashing

Now, two limitations become evident:

  1. You cannot display a color from GROUP 1 and a color from GROUP 2 in the same 7 dot block.
  2. Two different adjacent colors may lead to the appearance of unwanted artifacts! Indeed if you want to display GREEN (dots: OFF ON) followed by VIOLET (dots: ON OFF), there will be two adjacent ON dots, leading to the appearance of WHITE!

Those artifacts were called clashing back in the days. Quoting Wikipedia:

Img5. The horizontal borders between two colors generate “fringe effects” on the Apple II. In the lower left image, drawing a blue star on a green background causes the Apple II to add black, white and orange pixels at and near the horizontal boundaries between the green and blue.

Interleaving

One more thing…

The lines are not contiguous in memory! Meaning line number 2’s address does not follow line number 1’s… Instead, they are ordered according to a 64 line interleaving pattern.

Below lies a table from the Apple’s documentation of the time.
On the left are the address of so-called “boxes” of height lines. On the right, in the zoomed view, you have the offset to be applied for each line inside a box. For instance, line #10 is the second line of the second box. Therefore, its address is 0X2480.
On the top are the offset of each 7 dot blocks inside a line. Therefore, the block #3 of line #10 lies at the address 0x2483.

Img6. Starting address of the ordered lines

Holes

And a last, the PAGES are 8192 byte long, but 192*40 = 7680! 512 bytes are missing!

Those bytes are 8 byte long holes located at the end of every line which address ends with 0x50 or 0xD0. In the eighties, code wizards did not left these holes unused! For instance, this memory could be used to host the code of a self displaying image!

Soft-switches

In order to display your image, you have to activate some soft-switches by reading or writing to the following addresses:

  • 0xC050 in order to switch to the graphical display
  • 0xC052 to be full-screen
  • 0xC057 to enter HIRES mode
  • 0xC054 to select PAGE 1 or 0xC055 to select PAGE 2

In order to display elaborate animations, you can work on a PAGE while the other one is displayed. Yes, the Apple II can perform double buffering!

Rgb2HiRes

Remember that I wanted to add a Title Screen to my Game of Life. But it is now apparent that producing a HIRES image is not a very straightforward task. Thus, I found it faster and easier to program a tool converting a modern RBG image (PNG, JPEG, etc) into a proper binary file that only need to be loaded in PAGE 1 or PAGE 2 in order to be displayed!

I called this program Rgb2Hires 😉

You can get it on Github.

Of course it’s not magical: you’ve got to respect the caveats exposed before.
In particular, your source image must be only composed of Black, White, Purple, Orange and Blue pixels.
And it has to be 140×192 pixels. Remember that HIRES pixels are anamorphic: they are not square, but twice wider than tall! For easy drawing, Photoshop or equivalent programs can be told to work on anamorphic pixels. Thus the image in the editor will appear with the same proportion that on the Apple II.

Img7. Game of Life’s Title Screen, PNG exported from Photoshop

 

Img8. Game of Life’s Title Screen, HIRES image in AppleWin

6502 Assembly language

Coding in Assembly for an Apple II

In my last post, I wrote how I tried to code a port of Conway’s Game of Life for my Apple II. This port was written in C and relied on a cross compiler suite (cc65), generating binaries for the Apple’s CPU : the MOS 6502. The conclusion was that the port was viable. But unfortunately, it was also way too slow. Indeed, the 6502 is a primitive 8 bit CPU that is not well suited to be targeted by a high level compiled language such as C.

It was time to speed things up and re-write the inner loops in assembly!

I will begin by presenting the results I obtained, before explaining the “workflow” I followed to produce binaries, upload then debug them.

The code can be cloned from my Github repository:

Results obtained

I concentrated my optimizing efforts on the main “updating” function. Indeed, it is responsible to determine which cell will die and where new ones will spawn, therefore is the most time-consuming part of the program.

  1. Rewriting the most inner function, “count_neighbours“, in assembly.
    • In my previous post, I noted that the code generated by the compiler was around 220 opcode long, which was huge.
      After my rewriting the function shortened to only 30 opcode long!
  2. Rewriting the C code of the outer loop in order to make it more 6502-friendly.
    • Using __fastcall__ calling convention, leading to the use of the accumulator and the X register to pass parameters instead of the “fake” stack.
    • Eliminating indirect addressing ( ie. [X][Y] ) which are specially expensive.
  3. Rewriting all the update function in assembly.

Incidentally, I also abandoned the ASCII display functions provided by the compiler suite, and wrote some assembly functions to replace them. I then abandoned the original text mode to draw the cells in color 😉

And here is the normalized execution time, for 30 iterations, of all these versions. Of course, the less, the better.

Normalized execution time of iterated versions

A 18 time speed-up! Writing assembly is not always easy, but it is definitely worth it!

The 6502 opcodes are few and “logical”, thus they are quite easy to learn and use. However, as I mentioned in my previous post, this CPU is primitive by today standards. Its resources are scarce and I have to admit that I was not used to 8 bit arithmetic: ceiling at 255 can be quite frustrating!
Especially when branching. The 6502 provides very efficient conditional branching instructions. The drawback is that you can only jump 128 bytes backwards and 127 bytes forward. As most instructions are 2 or 3 byte long, you have to code tight and carefully layout your instructions if you want to use them!

Remember that most of the code (all the logic, state machine, textual displays…) is still written in C. This remaining of code is less systematic and would be much more difficult to write in ASM. But it does not matter much, as the time spent in these parts is also much smaller.

To conclude, I would therefore say that nowadays it is not too difficult to write a program for Apple II computers. Thanks to cc65, you can write the whole program in C and, if you require more performance, rewrite the most time consuming function in ASM. And thanks to modern editors and emulators, the process of writing, testing and debugging is way more easy and pleasant than it was thirty years ago!

Quick workflow

When coding such a small project in C, I could live without a debugger. But such was not the case anymore when I started to rewrite some parts in assembly.

  1. I was not familiar with the 6502 architecture and opcodes, so it was unthinkable that I could produce bug free binaries at my first attempts
  2. I don’t know how to debug a low level program without a proper debugger

Thus I established my (quick and dirty) workflow.

The assembler

That’s the easy part, as the cc65 compiler suite provides a macro assembler simply called ca65. But I did not want to rewrite all my program, only the inner loops. Thus I had to learn the compiler’s calling convention, aka how to call a subroutine written in ASM from the C code. And return from it… Fortunately, this part is well documented.
The only caveat I could not overcome was to access the symbols declared in C. Thus, I introduced a function, init_asm, to pass their address to my “ASM world”.

Producing a disk image

The linker will produce a perfectly viable Apple II executable file, but  loading it on an emulator requires to embedded it on a disk image. For that purpose, I used AppleCommander. It is a utility (unfortunately written in java) which purpose is to manipulate Apple II’s disk images. And one bright spot is that it can be used from the command line, thus allowing to invoke it as a build step in the Makefile.

Command to add the executable to the disk image:

Command to remove the executable from the disk image:

The debugger

In order to test my program and debug it, I used an emulator: AppleWin. It is quite accurate, but most important, it also features a competent debugger! Of course, it is rougher to use than a modern one. But hey, if you’re coding in 6502 assembly that won’t stop you! 😉

At any moment, you can enter this debugger by pressing F7. It is then quite easy to place breakpoints, run until a specific address or inspect the memory. Unfortunately, after exiting the debugger the program often does not resume correctly. Thus I could not reliably place my breakpoints before launching my program :/ So I often had to place an infinite loop at the beginning of my code. When halted, I then entered the debugger to manually modify the program counter in order to exit this loop. Then I could ask the debugger to run until my area of interest.

AppleWin's integrated debugger

AppleWin’s integrated debugger

 

Coding in C for an 8 bit 6502 CPU

As my French speaking readers may know, I recently awoke my Apple IIe from its long hibernation and proceeded to some minor repairs to render it usable again.

There are many many interesting games running on Apple II, but I’m a coder so I want to run some of my own code on it! 😉

CC65, a compiler suite targeting the 6502

Traditionally, these 8bit personal computers were programmed in BASIC. Apple IIs were provided with AppleSoft, a quite powerful (for that time) BASIC from Microsoft. But BASICs are interpreted languages, thus are quite slow. So if you wanted to do something serious, you had no choice but programming in assembly.

The Apple II is powered by a very simple MOS 6502 CPU. But although I had my time programming in ASM (mainly for Motorola’s 68000 and 56000), I wanted a way to avoid plunging too deeply into the arcane of the 6502 architecture. So I was quite pleased to find that there exists an open-source cross compiler suite targeting the 6502! It even provide a limited support of the standard library on the Apple II and an easy way to read inputs and draw ASCII characters on the screen!

Armed with this powerful suite, I decided to quickly implement the Game of Life. This “Game” consists in placing so called “cells” on a board then watching them evolve or die, as they follow some basic rules.

Some cells evolving following the Game of Life's rules.

Some cells evolving following the Game of Life’s rules.

Compilation, was flawless. Putting the binary on a disk was not much of a hurdle either. But when I watch the cells evolving… It is SLOOOOOOOOOOOW ! 😮

Investigating the code

I know that the 6502 in my Apple is a 8 bit CPU running at a paltry 1MHz. But the most demanding part of my code is the following function, called 798 times per screen (or 38 times per line).

So we are talking about a grand total of around 10000 a 8 bit additions and 20000 8 bit memory access. That’s not negligible, but that should not take so long. Drawing a cell on the screen is only a matter of ms. So I know that my function is the main culprit.

I decided to have a look at the generated assembly file. It’s quite easy as C65 compiles the C into assembly, before assembling it to the binary object.

WTF ??!!??

As I said I’m not an expert in the 6502’s ISA, but more than 220 instructions, including many jumps to subroutines ??? Basic operations such as additions, and stack operations are performed by subroutines (addqysp and pusha0)????
Clearly there is something wrong. No wonder that my Game of Life runs so slowly !

I read the coding hints of the C65 documentation, but it appears that I did nothing too wrong. Plus, the CC65 compilation suite is praised, and is considered more effective than some C compiler of the 80s.

There must be something else.

And indeed, the culprit is the 6502 itself: its architecture is totally unsuited for high level programming !

The MOS 6502

The 6502 is indeed a very unusual beast. If we compare it to the Z80, a very common 8 bit CPU of that time,  the 6502 is even more a true 8 bit architecture: it does not provide any 16 bit register nor any support for 16 bit operations. And as a matter of fact, it only comes with a single “true” register! The Z80 provided eight 8 bit registers that could be combined to form four 16 bit registers! Ouch!

If we look at the 6502 block diagram below, we can see that its 8 bit register is called the Accumulator and can be source or destination to ALU and LOAD/STORE operations. There also two more “simili-registers”, X and Y, which are called Index Registers. Indeed, operating with them is much more limited. They are mainly used to store and produce indexes that will serve in some indirect addressing modes.

The 6502 block diagram with the area of interest highlighted

The 6502 block diagram with the areas of interest highlighted

Some more limitations:

  • The 6502 access data stored in 256 bytes pages. If you want to access any address higher than the “zero page” (0x0 to 0xFF) you’ll get a penalty as it requires to compute a 16 bit address !
  • The stack’s address is fixed to the “first page” (address 0x100 to 0x1FF) and thus cannot contain more than 256 bytes.
  • They is no multiply nor divide operation. As a matter of fact, no “complex” operation is supported as there is no microcode!
  • Only the Accumulator can be pushed or pulled (poped) from the stack.

I could continue this list.
I’m sure you begin to understand that, this is far from being C friendly.

In order to address these shortcomings, the compiler programmers had no choice other than to rely on inefficient solutions. For instance, C65 stores its own “unofficial” stack in the highest addresses and grows downwards. In order to use it, it has to rely on custom “push” and “pop” subroutines. It is because of the accumulation of such tricks that the generated code is so inefficient.

But anyway kudos to the C65 developers! It was not a small task to allow easy C programming to such a target !

Any strong point?

If the 6502 was so primitive, how comes that it had such a tremendous success in its time? You can find it the Apple II, but also in the NES, the C64, the BBC micro and many others!

Well, I only stated that is was unsuited for high level compiled languages but, during the 80s, home computers were not programmed like that!

When properly programmed in assembly, the 6502 can truly sing!

First, its small instruction set (only 56!) can be seen as a strength. Decoding them is fast and cheap. Furthermore, the execution time is small compared to the other architectures of the time : almost always one cycle (excluding memory access). Some kind of primitive pipelining is also possible when combining  some addressing mode with some operations: the next instruction can be fetched before the completion of the current one!

And the chip itself was cheap. Cheap to produce, thus cheap to buy. It is composed of only 3510 transistors!!! As a matter of fact, the 6502 is considered by many as the precursor of the RISC architectures. And it well known that it as inspired the designers of one of the most well known RISC CPU : ARM!

Finally, the 6502 access its memory faster than its contemporaries: in one cycle! Therefore, a programmer can use the zero page as a pool of 256 8 bit registers! The 6502’s designers were not crazy: their CPU lacks registers because it does not need any!

With these strengths the 6502 could compete with the other 8 bit CPUs, often clocked 2 to 4 times faster. It is a clean and elegant design, that is unfortunately ill-equipped for modern programming.

 

So, after all I’ll have to plunge into what I wanted to avoid: I’ll have to write my core routines in assembly language! Stay tuned! 😉