Assembler course part 1

by Wanja Gayk

translated by Kendra Thiemann
revised by Nate Dannenberg


Assembly. Machine Language. For many people these are still riddles without solutions. Yet many experienced programmers swear that Assembler is THE language to use, not to mention that it is easier and more flexible than BASIC. But why? Well, first of all, Assembler instructions are very small and don't actually do much by themselves. However, their combined effect can be quite impressive. To understand these instructions, we first have to learn to count in binary and hexadecimal, since most operations are easier to understand when expressed in these forms.

BITS AND BYTES

First, let's talk about bits. A bit is the smallest unit of information and it can have one of two states: set or clear (or set and reset, if you prefer). In other words, your choices are 1 and 0. Eight bits make one byte. Since every added bit doubles the number of possible combinations, a byte can take 2 to the 8th power, or 256, possible values (0 through 255).
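
For readers who have a modern computer next to their C64, the arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the helper name byte_to_bits is made up for this example and is not part of any C64 tool.

```python
BITS_PER_BYTE = 8

def byte_to_bits(value):
    """Return the eight bits of a byte as a string, most significant bit first."""
    if not 0 <= value <= 255:
        raise ValueError("a byte only holds values from 0 to 255")
    return format(value, "08b")

# Each bit has two states, so eight bits give 2 * 2 * ... * 2 combinations.
possible_values = 2 ** BITS_PER_BYTE
print(possible_values)      # 256

print(byte_to_bits(0))      # 00000000 (all bits clear)
print(byte_to_bits(5))      # 00000101
print(byte_to_bits(255))    # 11111111 (all bits set)
```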

HEXADECIMAL AND DECIMAL

When you count, you do so starting at zero or one, working your way up to ten. Take a look at the following sequence of numbers:

0,1,2,3,4,5,6,7,8,9

You will notice that to represent any number, you need only one digit, until you hit 10, at which point you need two digits to describe your number. Keep counting and you eventually need to add even more digits as you reach 100, 1000 and so on.

In Hexadecimal (or "Hex"), it works a little differently. Consider the following sequence:

0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F

In this sequence we have 16 values, ranging from 0 to F. But what do the letters A through F mean? Well, since there are no single digit characters in the English language that represent 10 through 15 in a natural way, we are forced to choose letters of the alphabet, and so, A through F became the standard. To further differentiate a decimal number from a Hex number, most coders use a dollar sign '$' in front of a Hex number.

As with decimal, we eventually have to add another digit to our count as we continue to increase in value. With decimal, we did this at 10. In Hex, we add a digit when we hit 16.

In decimal, you could say that a number WXYZ is W*1000 + X*100 + Y*10 + Z. So, the value 1234 would be 1000 + 200 + 30 + 4 of course.

In Hex, you would thus say that a number WXYZ is W*4096 + X*256 + Y*16 + Z. As an example, $1234 (Hex) would be 1*4096 + 2*256 + 3*16 + 4, or 4660, when expressed in decimal.
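
The place-value rule can be tried out in Python. The sketch below uses a hypothetical helper, hex_to_decimal, and converts $D020, an address that comes up again later in this article, plus the largest one-byte value, $FF.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(digits):
    """Convert a string of Hex digits to a number, one digit at a time."""
    total = 0
    for digit in digits.upper():
        # Shifting the running total one place to the left multiplies it by 16.
        total = total * 16 + HEX_DIGITS.index(digit)
    return total

# $D020 = 13*4096 + 0*256 + 2*16 + 0
print(hex_to_decimal("D020"))   # 53280
print(hex_to_decimal("FF"))     # 255
```

Python's built-in int("D020", 16) gives the same result. The hand-written loop is there only to show the place values at work.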

In each case, each of the four digits in the numbers above is just that, a single digit, either 0 through 9 for the decimal system, or 0 through F for Hexadecimal.

As mentioned above, one byte can hold any value from 0 to 255 ($00 to $FF). To hold larger numbers, like $1234 in the above example, we simply use two bytes, each storing two digits from the number. The byte representing the rightmost digits ($34) is called the "LSB" or "Least Significant Byte", while the byte representing the leftmost digits ($12) is called the "MSB" or "Most Significant Byte."
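
Splitting a two-byte number into its two halves is simple arithmetic, sketched here in Python. The function name split_word is an assumption made for this example only.

```python
def split_word(value):
    """Split a 16-bit value into (lsb, msb)."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value does not fit in two bytes")
    lsb = value & 0xFF   # the two rightmost Hex digits
    msb = value >> 8     # the two leftmost Hex digits
    return lsb, msb

lsb, msb = split_word(0x1234)
print(f"LSB = ${lsb:02X}, MSB = ${msb:02X}")   # LSB = $34, MSB = $12

# Putting the halves back together: MSB * 256 + LSB
print(msb * 256 + lsb == 0x1234)               # True
```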

ASSEMBLY LANGUAGE - STEP BY STEP

THE REGISTERS:

In Assembly, your primary activity will be moving data back and forth, manipulating bits and bytes, and making comparisons and jumps throughout your program.

Inside every Commodore 64 is a MOS 6510 processor, the big brother to the 6502 that is used in the VIC-20 and most disk drives. This processor has three registers that can be used for many purposes. They are the accumulator, denoted here as ".A", the X Index Register, denoted as ".X", and the Y Index Register, denoted as ".Y".

Each register is one byte in size, hence each holds a value from $00 to $FF (Hex).

THE MAIN STORAGE:

In addition to the registers, the 6510 has access to 65536 bytes of user-programmable memory. Of course, every Commodore 64 comes fully loaded with a full 64K, which is enough to suit almost any need. Addresses run from $0000 to $FFFF. The very first two, $0000 and $0001, are taken by the processor for its on-board parallel port, so ordinary memory occupies $0002 to $FFFF.

THE FIRST COMMANDS:

Now it is time for you to load and start a machine language monitor. If you have an Action-Replay, Final Cartridge, Action Gear, Nordic Power or anything comparable, you can use the command MON to jump from the BASIC interpreter into the cartridge's internal Machine Language Monitor.

Now let's talk about the most important commands: LDA, STA and JMP.

LDA:

LDA is an abbreviation (mnemonic) for Load Accumulator. We use LDA to load a one-byte value into .A. The simplest LDA command is LDA #Value. As an example, LDA #$01 loads the value 1 into .A, LDA #$02 loads the value 2, and so on for any value $00 through $FF. Note that the "#" sign is required to specify "immediate" mode: it tells the processor that the operand is the value itself rather than an address.

STA:

STA is the abbreviation for Store Accumulator. With STA we store the contents of .A to someplace in main memory (or perhaps, into an I/O chip like the SID). The contents of .A are left unchanged after the store operation. The simplest STA command is STA Address. For example, STA $3000 would store the contents of .A into location $3000 in main memory. STA $0400 would store to $0400, which is the start of your 40 column display.

JMP:

JMP is the abbreviation for Jump. JMP is the 6510's "GOTO" command. Every address of the main storage can contain data or programs. With JMP you order the processor to stop what it's doing, move to a new place in your program or perhaps into the Operating System, and begin executing. Normally, you write the JMP command as JMP $nnnn where $nnnn is a location in the C64's main memory.

THE FIRST SMALL PROGRAM:

For our first small program we only need the three commands described above and two important storage locations you should keep in mind: $D020 and $D021. $D020 is the control byte for the screen's border color, while $D021 controls the background color of the text portion of the screen.

And now, on to the ML Monitor. Below is a display typical of what you will see when you start your ML Monitor (either with "mon", "m or shift-N" on the C128 in C128 Native Mode, or by using a menu within your utility cartridge):

MON
B*
ADDR AR XR YR SP 01 NV-BDIZC
.; FFFF 00 00 00 F8 37 00000010

The header line (ADDR AR XR YR SP 01 NV-BDIZC) labels the columns: the Address (where the computer was executing when the monitor was called), the .A, .X and .Y registers, the Stack Pointer, the value of location $0001, and the 6510's various status flags (more on these last three items later). The line below it shows the current values.

Let's try our first program. Enter the following few lines of code, typing each line without the leading period (your ML Monitor will usually put it there for you) and pressing Return at the end of each line. Depending on the ML Monitor you are using, the line may either be accepted as-is, corrected in some way, or altered to include such information as the hexadecimal values that make up the code you've entered.

As you enter each line, the ML Monitor will print the address of the next instruction and position the cursor to the right of that address, sort of like an "auto-line-number" feature.

.A 2000 LDA #$00
.A 2002 STA $D020
.A 2005 STA $D021
.A 2008 LDA #$01
.A 200A STA $D020
.A 200D STA $D021
.A 2010 JMP $2000
.A 2013 (just press Return)


What this program does:
2000 Load .A with the value $00
2002 Write contents of .A to the VIC chip's Border Color Register (#$00 was loaded into .A on the previous line, so this turns the border black)
2005 Write contents of .A (still #$00) to the VIC chip's Background Color Register. This turns the background black as well.
2008 Load .A with the value $01
200A Write contents of .A to the Border Color Register. Since we just loaded #$01 into .A, the border will now turn white.
200D Write contents of .A to the Background Color Register (turns the background white)
2010 Jump to memory location $2000 and continue. Since we are JMP'ing back to the beginning of the program, we created an "infinite" loop.
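
The walkthrough above can also be traced on a modern machine. The following Python sketch is a toy model of just these three instructions, with made-up names (PROGRAM, SIZES, run); it is not a 6510 emulator. It stops after a fixed number of steps, since the real program never ends.

```python
# The program from the listing above: address -> (mnemonic, operand).
PROGRAM = {
    0x2000: ("LDA", 0x00),
    0x2002: ("STA", 0xD020),
    0x2005: ("STA", 0xD021),
    0x2008: ("LDA", 0x01),
    0x200A: ("STA", 0xD020),
    0x200D: ("STA", 0xD021),
    0x2010: ("JMP", 0x2000),
}

# Instruction lengths in bytes for the forms used in this program.
SIZES = {"LDA": 2, "STA": 3, "JMP": 3}

def run(program, start, max_steps):
    """Execute max_steps instructions; return the (address, value) writes made."""
    a = 0            # the accumulator, .A
    pc = start       # address of the next instruction
    writes = []
    for _ in range(max_steps):
        op, operand = program[pc]
        if op == "LDA":
            a = operand                  # load the value into .A
        elif op == "STA":
            writes.append((operand, a))  # store .A; .A itself is unchanged
        if op == "JMP":
            pc = operand                 # continue at the given address
        else:
            pc += SIZES[op]              # move on to the next instruction
    return writes

# Seven steps are exactly one pass through the loop.
for address, value in run(PROGRAM, 0x2000, 7):
    print(f"${address:04X} <- ${value:02X}")
```

One pass prints four writes: $00 to $D020 and $D021, then $01 to both. Raising the step count simply repeats the same pattern, which is the flicker you see on the screen.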

You may run this program by entering the command G 2000 at the next available ML Monitor prompt (if it produces one, usually a ".").

AND IN BASIC?

This is how the program might look if written in BASIC:

10 poke 53280,0
20 poke 53281,0
30 poke 53280,1
40 poke 53281,1
50 goto 10

In both cases we simply make the border and background colors flicker wildly (black to white, over and over). You'll notice the BASIC version runs considerably slower, as the screen will fill with stripes instead of thin, broken lines.

The BASIC program does look smaller, doesn't it? Actually, it's larger. The custom-crafted machine code takes a mere 19 bytes of space (from $2000 to $2012), while the BASIC version hogs a whopping 52 bytes! Part of the reason for this is that the numbers 53280 and 53281 are actually being spelled out byte for byte in the program, while the numbers $D020 and $D021 in our ML example are being stored as binary numbers, taking only two bytes each.

In addition, BASIC is full of things like "line links" and line number values. All of these generally make BASIC slow and bloated in comparison.
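
To see where the 19 bytes of the machine code version come from, here is a Python sketch that builds the bytes by hand. The opcode values are the standard 6502 ones for these three instruction forms ($A9, $8D and $4C). The helper names are made up for this example. Note one detail the article has not covered yet: the processor expects a two-byte address with the LSB first.

```python
# Opcode bytes for the three instruction forms used in the example program.
LDA_IMMEDIATE = 0xA9
STA_ABSOLUTE = 0x8D
JMP_ABSOLUTE = 0x4C

def immediate(opcode, value):
    """Encode an opcode followed by a one-byte value (2 bytes total)."""
    return bytes([opcode, value])

def absolute(opcode, address):
    """Encode an opcode followed by a two-byte address, LSB first (3 bytes total)."""
    return bytes([opcode, address & 0xFF, address >> 8])

machine_code = (
    immediate(LDA_IMMEDIATE, 0x00)
    + absolute(STA_ABSOLUTE, 0xD020)
    + absolute(STA_ABSOLUTE, 0xD021)
    + immediate(LDA_IMMEDIATE, 0x01)
    + absolute(STA_ABSOLUTE, 0xD020)
    + absolute(STA_ABSOLUTE, 0xD021)
    + absolute(JMP_ABSOLUTE, 0x2000)
)

print(len(machine_code))                 # 19
print(machine_code.hex(" ").upper())
# A9 00 8D 20 D0 8D 21 D0 A9 01 8D 20 D0 8D 21 D0 4C 00 20
```

Each address costs two bytes here, whereas the BASIC line has to spell out "53280" as five separate characters.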

CONCLUSION

As you can see, Machine Language really isn't all that complex. Just as learning to program in BASIC seemed complicated at first, you simply have to break the ice and start with something small. Once you've gotten your feet wet, you'll see it's really pretty easy to learn.

For those who want to get into Machine Language now, without waiting for future articles and hints, at least start by picking up a pocket calculator that features Hexadecimal and Binary conversion keys. Some calculators in the Casio FX series feature these, and they are quite handy.