Assembler course part 1
by Wanja Gayk
translated by Kendra Thiemann
revised by Nate Dannenberg
Assembly. Machine Language. For many people these are still riddles without
solutions. But many experienced programmers swear that Assembler is THE language
to use, not to mention being easier and more flexible than Basic. But why? Well,
first of all Assembler instructions are very small and don't actually do much by
themselves. However, their combined effect can be quite impressive. To
understand these instructions, we first have to learn to count in binary and
hexadecimal, since most operations are easier to understand when expressed in
this form.
BITS AND BYTES
First, let's talk about bits. A bit is the smallest information unit and it can
have one of two states: set and clear, or set and reset if
you prefer. In other words, your choices are 1 and 0. Eight bits make one byte,
which means that with a little simple math, we find that there are 2 to the 8th
power, or 256, possible values.
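
As a quick check of that arithmetic, here is a minimal sketch in Python (not part of the original course; the function name value_count is made up for illustration). It counts how many values a group of bits can hold.

```python
# Each extra bit doubles the number of possible values.
def value_count(bits):
    return 2 ** bits

for bits in (1, 2, 4, 8):
    print(bits, "bit(s):", value_count(bits), "possible values")

# One byte is 8 bits, so its values run from 0 up to 255.
largest_byte = value_count(8) - 1
print("largest byte value:", largest_byte)
```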
HEXADECIMAL AND DECIMAL
When you count, you do so starting at zero or one, working your way up to ten.
Take a look at the following sequence of numbers:
0,1,2,3,4,5,6,7,8,9
You will notice that to represent any number, you need only one digit, until you
hit 10, at which point you need two digits to describe your number. Keep
counting and you eventually need to add even more digits as you reach 100, 1000
and so on.
In Hexadecimal (or "Hex"), it works a little differently. Consider the following
sequence:
0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F
In this sequence we have 16 values, ranging from 0 to F. But what do the letters
A through F mean? Well, since there are no single digit characters in the
English language that represent 10 through 15 in a natural way, we are forced to
choose letters of the alphabet, and so, A through F became the standard. To
further differentiate a decimal number from a Hex number, most coders use a
dollar sign '$' in front of a Hex number.
As with decimal, we eventually have to add another digit to our count as we
continue to increase in value. With decimal, we did this at 10. In Hex, we add a
digit when we hit 16.
In decimal, you could say that a number WXYZ is W*1000 + X*100 + Y*10 + Z. So,
the value 1234 would be 1000 + 200 + 30 + 4 of course.
In Hex, you would thus say that a number ABCD is A*4096 + B*256 + C*16 + D. As
an example, $1234 (Hex) would be 1*4096 + 2*256 + 3*16 + 4, or 4660, when
expressed in decimal.
In each case, each of the four digits in the numbers above is just that, a
single digit, either 0 through 9 for the decimal system, or 0
through F for Hexadecimal.
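
The digit-by-digit expansion can be tried out with a short Python sketch. This is only an illustration, and the helper name hex_to_decimal is hypothetical; Python's built-in int(text, 16) does the same job.

```python
# The sixteen Hex digits, in order of their values 0 through 15.
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text):
    """Convert a Hex string such as '1234' to its decimal value."""
    total = 0
    for ch in text.upper():
        # Shift what we have one Hex place to the left, then add the new digit.
        total = total * 16 + HEX_DIGITS.index(ch)
    return total

print(hex_to_decimal("1234"))  # 1*4096 + 2*256 + 3*16 + 4 = 4660
print(hex_to_decimal("FF"))    # 15*16 + 15 = 255
```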
As mentioned above, one byte can hold any value from 0 to 255 ($00 to $FF). To
hold larger numbers, like $1234 in the above example, we simply use two bytes,
each storing two digits from the number. The byte representing the rightmost
digits is called the "LSB" or "Least Significant Byte", while the byte
representing the leftmost digits is called the "MSB" or "Most Significant
Byte."
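
To make the split concrete, the following Python sketch takes a two-byte number apart and puts it back together. The function names split_word and join_word are invented for this example.

```python
def split_word(value):
    """Return (msb, lsb) for a value in the range $0000 to $FFFF."""
    msb = (value >> 8) & 0xFF  # the two leftmost Hex digits
    lsb = value & 0xFF         # the two rightmost Hex digits
    return msb, lsb

def join_word(msb, lsb):
    """Rebuild the two-byte value: the MSB counts in units of 256."""
    return msb * 256 + lsb

msb, lsb = split_word(0x1234)
print(f"MSB=${msb:02X} LSB=${lsb:02X}")  # MSB=$12 LSB=$34
print(join_word(msb, lsb))               # 4660
```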
ASSEMBLY LANGUAGE - STEP BY STEP
THE REGISTERS:
In Assembly, your primary activity will be moving data back and forth,
manipulating bits and bytes, and making comparisons and jumps throughout your
program.
Inside every Commodore 64 is a MOS 6510 processor, the big brother to the 6502
that is used in the VIC-20 and most disk drives. This processor has three
registers that can be used for many purposes. They are, the accumulator, denoted
here as ".A", the X Index Register, denoted as ".X", and the Y Index Register,
denoted as ".Y".
Each register is one byte in size, hence each holds a value from $00 to $FF
(Hex).
THE MAIN STORAGE:
In addition to the registers, the 6510 has access
to 65536 bytes of User-Programmable memory. Of course, every Commodore 64 comes
fully loaded with a full 64K, which is enough to suit almost any need. Each
individual byte has an address within the range of $0002 to $FFFF, with the very
first two bytes ($0000 and $0001) taken by the processor for its on-board
parallel port.
THE FIRST COMMANDS:
Now it is time for you to load and start a machine language monitor. If you have
an Action-Replay, Final Cartridge, Action Gear, Nordic Power or anything
comparable, you can use the command MON to jump from the basic interpreter into
the Cartridge's internal Machine Language Monitor.
Now let's talk about the most important commands: LDA, STA and JMP.
LDA:
LDA is an abbreviation (Mnemonic) for Load Accumulator. We use LDA to load a
one-byte value into .A. The simplest LDA command is LDA #Value. As an example,
LDA #$01 flat-out loads the value 1 into .A,
LDA #$02 the value 2, and so on for any value $00 through $FF. Note that the "#"
sign is required, to specify "immediate" mode.
STA:
STA is the abbreviation for Store Accumulator. With STA we store the contents of
.A to someplace in main memory (or perhaps, into an I/O chip like the SID). The
contents of .A are left unchanged after the store operation. The simplest STA
command is STA Address. For example, STA $3000 would store the contents of .A
into location $3000 in main memory. STA $0400 would store to $0400, which is the
start of your 40 column display.
JMP:
JMP is the abbreviation for Jump. JMP is the 6510's "GOTO" command. Every
address of the main storage can contain data or programs. With JMP you order the
processor to stop what it's doing, move to a new place in your program or
perhaps into the Operating System, and begin executing. Normally, you write the
JMP command as JMP $nnnn where $nnnn is a location in the C64's main memory.
THE FIRST SMALL PROGRAM:
For our first small program we only need the three commands mentioned above and
two important storage locations you should keep in mind: $D020 and $D021. $D020 is
the control byte for the screen's border color, while $D021 controls the
background color of the text-portion of the screen.
And now, on to the ML Monitor. Below is a display typical of what you will see
when you start your ML Monitor (either with "mon", "m or shift-N" on the C128 in
C128 Native Mode, or by using a menu within your utility cartridge):
MON
B*
ADDR AR XR YR SP 01 NV-BDIZC
.; FFFF 00 00 00 F8 37 00000010
The header line lists, from left to right: "Address" (where the computer was
executing when the monitor was called), the .A, .X and .Y registers, the Stack
Pointer, the value of location $0001, and the values of the 6510's various
status flags (more on these last three items later).
Let's try our first program. Try the following few lines of code. Enter each
line without the leading period (your ML Monitor will usually put it there for
you), and press return at the end of each line. Depending on the ML Monitor you
are using, the line may either be accepted as-is, corrected in some way, or
altered to include such information as the hexadecimal values that make up the
code you've entered.
As you enter each line, the ML Monitor will print the address of the next
instruction and position the cursor to the right of that address, sort of like
an "auto-line-number" feature.
.A 2000 LDA #$00
.A 2002 STA $D020
.A 2005 STA $D021
.A 2008 LDA #$01
.A 200A STA $D020
.A 200D STA $D021
.A 2010 JMP $2000
.A 2013 (just press Return)
What this program does:
2000 Load .A with the value $00
2002 Write contents of .A to the VIC chip's Border Color Register (#$00 was
loaded into .A on the previous line, so this turns the border black)
2005 Write contents of .A (still #$00) to the VIC chip's Background Color
Register. This turns the background black as well.
2008 Load .A with the value $01
200A Write contents of .A to the Border Color Register. Since we just loaded
#$01 into .A, the border will now turn white.
200D Write contents of .A to the Background Color Register (turns the background
white)
2010 Jump to memory location $2000 and continue. Since we are JMP'ing back to
the beginning of the program, we created an "infinite" loop.
You may run this program by entering the command G 2000 at the next available ML
Monitor prompt (if it produces one, usually a ".")
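
If you would like to trace the program on paper before running it, the Python sketch below plays through it step by step. It is a toy model, not a real 6510 emulator: it knows only LDA, STA and JMP, it records writes instead of changing any colors, and it stops after a fixed number of steps so the "infinite" loop comes to an end. All names in it (PROGRAM, LENGTHS, run) are made up for this illustration.

```python
# Size in bytes of each instruction form used in the listing above.
LENGTHS = {"LDA": 2, "STA": 3, "JMP": 3}

# The program from the listing: address -> (mnemonic, operand).
PROGRAM = {
    0x2000: ("LDA", 0x00),
    0x2002: ("STA", 0xD020),
    0x2005: ("STA", 0xD021),
    0x2008: ("LDA", 0x01),
    0x200A: ("STA", 0xD020),
    0x200D: ("STA", 0xD021),
    0x2010: ("JMP", 0x2000),
}

def run(program, start, max_steps):
    """Execute up to max_steps instructions; return the list of writes."""
    a = 0        # the accumulator, .A
    pc = start   # address of the instruction being executed
    writes = []
    for _ in range(max_steps):
        op, arg = program[pc]
        if op == "LDA":
            a = arg                   # load the immediate value into .A
        elif op == "STA":
            writes.append((arg, a))   # store .A; .A itself is unchanged
        if op == "JMP":
            pc = arg                  # continue at the target address
        else:
            pc += LENGTHS[op]         # continue at the next instruction
    return writes

# Seven steps make exactly one pass through the loop, including the JMP.
for address, value in run(PROGRAM, 0x2000, 7):
    print(f"${address:04X} <- ${value:02X}")
```

One pass prints four writes: black ($00) to the border and background, then white ($01) to both, which is the flicker described above.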
AND IN BASIC?
This is how the program might look if written in BASIC:
10 poke 53280,0
20 poke 53281,0
30 poke 53280,1
40 poke 53281,1
50 goto 10
In both cases we simply make the border and background colors flicker wildly
(black to white, over and over). You'll notice the BASIC version runs
considerably slower, as the screen will fill with stripes instead of thin,
broken lines.
The BASIC program does look smaller, doesn't it? Actually, it's larger. The
custom crafted machine code takes a mere 19 bytes of space (from $2000 to
$2012), while the BASIC version hogs a whopping 52 bytes! Part of the reason for
this is that the numbers 53280 and 53281 are actually being spelled out byte for
byte in the program, while the numbers $D020 and $D021 in our ML example are
being stored as binary numbers, taking only two bytes each.
In addition, BASIC is full of things like "line links" and line number values.
All of these generally make BASIC slow and bloated in comparison.
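
The 19-byte figure can be checked by hand. The Python sketch below is an illustration only; it writes out the bytes an ML Monitor would place in memory for our program. The opcode values are the standard 6502 ones for the forms used here (LDA #value is $A9, STA address is $8D, JMP address is $4C), and two-byte addresses are stored LSB first. The helper name absolute is made up.

```python
LDA_IMM = 0xA9  # LDA #value
STA_ABS = 0x8D  # STA address
JMP_ABS = 0x4C  # JMP address

def absolute(opcode, address):
    """Opcode followed by a two-byte address, low byte (LSB) first."""
    return [opcode, address & 0xFF, address >> 8]

code = (
    [LDA_IMM, 0x00]
    + absolute(STA_ABS, 0xD020)
    + absolute(STA_ABS, 0xD021)
    + [LDA_IMM, 0x01]
    + absolute(STA_ABS, 0xD020)
    + absolute(STA_ABS, 0xD021)
    + absolute(JMP_ABS, 0x2000)
)

print(" ".join(f"{b:02X}" for b in code))
print(len(code), "bytes")  # 19 bytes

# The address costs 2 bytes in machine code, but 5 characters when
# spelled out as the decimal digits "53280".
print(len(absolute(STA_ABS, 0xD020)) - 1, "bytes versus", len("53280"), "characters")
```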
CONCLUSION
As you can see, Machine Language really isn't all that complex. Just as learning
to program in BASIC seemed complicated at first, you simply have to break the
ice and start with something small. Once you've gotten your feet wet, you'll see
it's really pretty easy to learn.
For those who want to get into Machine Language now, without waiting for future
articles and hints, at least start by picking up a pocket calculator that
features Hexadecimal and Binary conversion keys. Some calculators in the Casio
FX series feature these, and they are quite handy.