As my French-speaking readers may know, I recently awoke my Apple IIe from its long hibernation and carried out some minor repairs to make it usable again.

There are many, many interesting games for the Apple II, but I’m a coder, so I want to run some of my own code on it! 😉

CC65, a compiler suite targeting the 6502

Traditionally, these 8-bit personal computers were programmed in BASIC. Apple IIs came with Applesoft, a quite powerful (for the time) BASIC from Microsoft. But BASIC is an interpreted language, and thus quite slow. So if you wanted to do something serious, you had no choice but to program in assembly.

The Apple II is powered by a very simple MOS 6502 CPU. Although I’ve had my share of ASM programming (mainly for Motorola’s 68000 and 56000), I wanted to avoid plunging too deeply into the arcana of the 6502 architecture. So I was quite pleased to find that an open-source cross-compiler suite targeting the 6502 exists! It even provides limited support for the standard library on the Apple II, and an easy way to read input and draw ASCII characters on the screen!

Armed with this powerful suite, I decided to quickly implement the Game of Life. This “game” consists of placing so-called “cells” on a board, then watching them evolve or die as they follow a few basic rules.

Some cells evolving following the Game of Life's rules.

Compilation was flawless. Putting the binary on a disk was not much of a hurdle either. But when I watched the cells evolve… it was SLOOOOOOOOOOOW! 😮

Investigating the code

I know that the 6502 in my Apple is an 8-bit CPU running at a paltry 1 MHz. But the most demanding part of my code is the following function, called 798 times per screen (or 38 times per line).

So we are talking about a grand total of around 10,000 8-bit additions and 20,000 8-bit memory accesses per screen. That’s not negligible, but it should not take that long. Drawing a cell on the screen is only a matter of milliseconds, so I know that this function is the main culprit.

I decided to have a look at the generated assembly file. That’s quite easy, as CC65 compiles the C into assembly before assembling it into a binary object.

WTF ??!!??

As I said, I’m not an expert in the 6502’s ISA, but more than 220 instructions, including many jumps to subroutines??? Basic operations such as additions and stack operations are performed by subroutines (addqysp and pusha0)????
Clearly something is wrong. No wonder my Game of Life runs so slowly!

I read the coding hints in the CC65 documentation, and it appeared that I did nothing too wrong. Besides, the CC65 suite is well regarded, and is considered more efficient than some C compilers of the 80s.

There must be something else.

And indeed, the culprit is the 6502 itself: its architecture is totally unsuited to high-level programming!

The MOS 6502

The 6502 is indeed a very unusual beast. Compared to the Z80, a very common 8-bit CPU of that time, the 6502 is an even purer 8-bit architecture: apart from its program counter, it has no 16-bit register and no support for 16-bit operations. As a matter of fact, it comes with only a single “true” register! The Z80 provided eight 8-bit registers that could be combined to form four 16-bit registers! Ouch!

If we look at the 6502 block diagram below, we can see that its 8-bit register is called the Accumulator, and that it can be the source or destination of ALU and LOAD/STORE operations. There are also two more “pseudo-registers”, X and Y, called Index Registers. They are much more limited: they can be loaded, stored, incremented, decremented and compared, but not used in arithmetic or logic operations. They are mainly used to hold and produce the indexes used by the indexed addressing modes, including the indirect ones.

The 6502 block diagram with the area of interest highlighted

Some more limitations:

  • The 6502 addresses memory in 256-byte pages. Accessing any address above the “zero page” (0x00 to 0xFF) costs a penalty, as the instruction has to carry a full 16-bit address!
  • The stack is fixed to the “first page” (addresses 0x100 to 0x1FF) and thus cannot hold more than 256 bytes.
  • There is no multiply or divide instruction. As a matter of fact, no “complex” operation is supported, as there is no microcode!
  • Apart from the status register, only the Accumulator can be pushed to or pulled (popped) from the stack.

I could go on, but I’m sure you are beginning to see that this is far from C-friendly.

To work around these shortcomings, the compiler writers had no choice but to rely on inefficient solutions. For instance, CC65 maintains its own “unofficial” stack at the highest addresses, growing downwards. To use it, it has to rely on custom “push” and “pop” subroutines. It is the accumulation of such tricks that makes the generated code so inefficient.

But anyway, kudos to the CC65 developers! It was no small task to make C programming easy on such a target!

Any strong point?

If the 6502 was so primitive, how come it was such a tremendous success in its time? You can find it (or close derivatives) in the Apple II, but also in the NES, the C64, the BBC Micro and many others!

Well, I only stated that it was unsuited to high-level compiled languages, but during the 80s, home computers were not programmed that way!

When properly programmed in assembly, the 6502 can truly sing!

First, its small instruction set (only 56 instructions!) can be seen as a strength: decoding is fast and cheap. Furthermore, instructions take few cycles compared to the other architectures of the time: from 2 to 7 cycles, nearly all of which are memory accesses. Some kind of primitive pipelining also takes place with some combinations of addressing modes and operations: the next instruction can be fetched before the current one completes!

And the chip itself was cheap: cheap to produce, thus cheap to buy. It is made of only 3510 transistors!!! As a matter of fact, many consider the 6502 a precursor of RISC architectures, and it is well known that it inspired the designers of one of the best-known RISC CPUs: ARM!

Finally, the 6502 accesses its memory faster than its contemporaries: in one cycle! Therefore, a programmer can use the zero page as a pool of 256 8-bit registers! The 6502’s designers were not crazy: their CPU lacks registers because it does not need them!

With these strengths, the 6502 could compete with other 8-bit CPUs that were often clocked 2 to 4 times faster. It is a clean and elegant design that is, unfortunately, ill-equipped for modern programming.


So, after all, I’ll have to plunge into what I wanted to avoid and write my core routines in assembly language! Stay tuned! 😉


Nickolas · September 25, 2019 at 12:15

I saw The 8-Bit Guy’s video about the 6502 hobby computer, and he said that new 6502 chips (still manufactured) can be clocked at up to 17 MHz. This would probably allow you to run C without much of a speed penalty.

Very cool post and blog.

Gregory Casamento · September 25, 2019 at 12:36

No CPU is really BAD for high-level development. Agreed, the 6502 is ill-suited, but it is possible. I believe that cc65 is simply not optimizing properly. See this…

    XtoF · September 25, 2019 at 21:55

    Ah yes I know this video. It’s quite impressive!
    Although I work on embedded software, I am a big proponent of “modern” C++, and I sometimes show it to people who are somewhat skeptical about programming microcontrollers in C++. Nevertheless, most of the magic in the video is achieved because many computations are done at compile time. Afterwards, the C64’s CPU does not have much to do 😉

    But you’re right, the binary produced by cc65 could be way more optimized, especially for the loop I showcase.

polluks · April 12, 2020 at 15:55

How about this? And don’t forget the optimizer option.

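typedef unsigned char uint8_t;
#define NEXT_LINE (40)  /* offset to the same column on the next board line */

/* cell_ptr points to the top-left cell of the 3x3 neighbourhood;
   the centre cell, cell[NEXT_LINE+1], is not counted. */
uint8_t count_neighbours( uint8_t* cell_ptr ) {
    register uint8_t * cell = cell_ptr;
    return cell[0] + cell[1] + cell[2]
        + cell[NEXT_LINE] + cell[NEXT_LINE+2]
        + cell[2*NEXT_LINE] + cell[2*NEXT_LINE+1] + cell[2*NEXT_LINE+2];
}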

polluks · April 12, 2020 at 16:10

The coding guide says: “Use the array operator [] even for pointers”.
You break this rule 16 times.

    XtoF · April 16, 2020 at 23:10

    You’re right concerning this rule. It seems that I missed it at the time, my mistake. I may benchmark again using your proposal.

    But it won’t be much faster. The thing is that the compiled code calls many, many functions, in particular for the 16-bit arithmetic required to compute the addresses. As it follows a standard calling convention, it pushes the context and the arguments onto a software stack, since it cannot use the tiny one provided by the 6502. By the way, “register” won’t help much, as there is no register on the 6502 that could be used as storage.

    The point of this post is not that I wrote the best code possible, but that the 6502 is ill-suited to being programmed in C.

Coding in Assembly for an Apple II – Some dev rants · October 9, 2016 at 10:27

[…] my last post, I wrote how I tried to code a port of Conway’s Game of Life for my Apple II. This port was […]

HiRes Graphics on Apple II – Some dev rants · December 15, 2016 at 01:04

[…] Coding in C for a 8 bit 6502 CPU […]

Making the Apple II sing – Some dev rants · January 21, 2017 at 01:56

[…] Coding in C for a 8 bit 6502 CPU […]

A Game of Life – Some dev rants · February 5, 2017 at 00:39

[…] Coding in C for an 8 bit 6502 CPU […]
