BFP Chapter 01 : Warming up


Introduction

I started looking into Mega Drive development in early September 2015. My excuse was that I was going to take a microprocessors class that same year, in which we would learn 68K assembly, so I could test my code on a real system. It turns out we didn’t, but the excuse did its job and I’ve had great fun with this project since.

I realize there are lots of tutorials out there, but I’m not going to follow one. Instead I’ll try to figure things out on my own. The reason is that I’d like to recreate the feeling of coming across an unknown piece of hardware and figuring it out, much like the experience SEGA’s engineers had when they were presented with the first prototypes of the console.

Please note that this is not structured as a tutorial series. It’s more like notes-to-self, written with the intent of being presentable (and hopefully helpful) to others. My purpose is to document not just what I’ve learned, but also the series of events and thoughts that drove the learning process.

There’s no predefined roadmap and no particular goal for this project, other than “Have great fun with the console”, so at the time of writing I’m not sure about the outcome. The fact that you’re reading this, though, is a good indication that I’ve achieved something fancy. Otherwise I wouldn’t have posted these notes online.

So without further ado, let’s start!

The 68K

From Wikipedia:
The main microprocessor of the Genesis is a 16/32-bit Motorola 68000 CPU clocked at 7.6 MHz.
So first of all, we need a way to write programs for the 68K: an assembler. I found EASy68K. It comes with a simulator and simulated output devices, which makes it perfect for our first small programs.

Now that we have an assembler, the next logical step is to find a good 68K assembly tutorial. The following seems like a good one (the author having done some Sonic the Hedgehog hacks was a plus):

MarkeyJester’s Motorola 68000 Beginner’s Tutorial

Random stuff I noted while reading:

  • OMG the guy is thorough, starting all the way from numbering systems. I’m going to skip to the actual code.
  • Memory Storage sub-tutorial seems to imply that 68K is big endian. Verifying … Yup! It is…
  • Basic instruction syntax: move.w src, dst
  • $FF on its own (no #) is an address: the operand refers to the memory at $FF ($ means hex)
  • #$FF is the immediate value ‘255’
  • d0 is a register
  • WTF? The 68K has 32-bit registers? I thought we were talking about the 16-bit era…
  • “Instruction sizes”: move.l (32 bit), move.w (16 bit), move.b (8 bit)
  • 16 (!) registers total: 8 data registers d0 to d7, 8 “address” registers a0 to a7
  • (a0) in parentheses dereferences the address register a0
  • $FF(a0) adds offset $FF to a0 to form the address (a0 itself doesn’t change)
  • Haha! I’m beginning to like this! move.b    #$B5,(a0)+ will auto-increment a0
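  • Careful: only the auto-increment steps by the operand size (1, 2 or 4 bytes). Offsets are always in bytes: move.w d0, $FF(a0) writes to a0 + 255 BYTES, which is an odd address if a0 is even, and a word write there is an address error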
  • Pre-decrement: -(a0) decrements a0 by the operand size before the access. That makes sense, as it’s practical to point a0 one element past the end of a buffer (start + buffer length) and then walk backwards
  • Reads and writes must be aligned: a word or long access at an odd address causes an address error (byte accesses can go anywhere). The tutorial only talks about odd addresses.
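To pin these notes down, here’s a toy C model of a few of the addressing modes, operating on a plain byte array. It’s an illustration only, not how a real emulator is structured; the 64 KB memory and all function names are hypothetical. It shows big-endian storage, (a0)+ and -(a0) stepping by the operand size, d16(a0) adding an unscaled byte offset, and word/long accesses at odd addresses being rejected.

```c
#include <stdint.h>

/* Toy model only: 64 KB of memory, addresses wrapped to 16 bits. */
static uint8_t mem[0x10000];

/* Big-endian store (most significant byte first), as on the 68000.
   Returns -1 for a word/long access at an odd address ("address error"). */
static int store(uint32_t addr, uint32_t value, int size)
{
    if (size > 1 && (addr & 1))
        return -1;
    for (int i = 0; i < size; i++)
        mem[(addr + i) & 0xFFFF] = (uint8_t)(value >> (8 * (size - 1 - i)));
    return 0;
}

/* move.x value,(a0)+ : write at a0, then advance a0 by the operand size. */
static int store_postinc(uint32_t *a0, uint32_t value, int size)
{
    int err = store(*a0, value, size);
    *a0 += (uint32_t)size;
    return err;
}

/* move.x value,-(a0) : step a0 back by the operand size, then write. */
static int store_predec(uint32_t *a0, uint32_t value, int size)
{
    *a0 -= (uint32_t)size;
    return store(*a0, value, size);
}

/* move.x value,d16(a0) : a0 plus a signed BYTE displacement (not scaled). */
static int store_disp(uint32_t a0, int16_t disp, uint32_t value, int size)
{
    return store(a0 + (uint32_t)(int32_t)disp, value, size);
}
```

For example, move.b #$B5,(a0)+ corresponds to store_postinc(&a0, 0xB5, 1), which leaves a0 one byte further along.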

Our first lines of code

Now let’s start the assembler… the default program template is:

    ORG    $1000
START:                  ; first instruction of program
* Put program code here

    SIMHALT             ; halt simulator

* Put variables and constants here

    END    START        ; last line of source

Not sure yet if the directives are assembler-specific or some kind of convention, but I’ll play nicely and do what the comments tell me. Before that, however, let’s find out what the ORG instruction does.

Checking the reference

Didn’t find anything in the docs. A Google search, though, told me that ORG is not an instruction but a directive to the assembler. It simply specifies the address at which the assembler must place the instructions that follow, and it is indeed a convention among 68K assemblers. The same page also mentioned that the first $400 bytes (the first kilobyte) are reserved for exception vectors, so $1000 looks like a safe address for now.
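The $400 figure comes from the layout of the 68000’s exception vector table: 256 vectors of 4 bytes each (one 32-bit address per vector), with vector n stored at address n × 4. Vectors 0 and 1 are used on reset (initial stack pointer and initial program counter), and vector 3 is where the CPU looks for the address error handler. The sketch below just spells out that arithmetic; the names are hypothetical.

```c
#include <stdint.h>

/* 68000 exception vector table: 256 entries, 4 bytes each, starting at $000000. */
enum {
    VECTOR_COUNT      = 256,
    VECTOR_SIZE       = 4,   /* each vector is a 32-bit address */
    VEC_RESET_SSP     = 0,   /* initial supervisor stack pointer */
    VEC_RESET_PC      = 1,   /* initial program counter */
    VEC_ADDRESS_ERROR = 3    /* handler for odd-address word/long accesses */
};

/* Address of vector n. */
static uint32_t vector_address(unsigned n)
{
    return (uint32_t)n * VECTOR_SIZE;
}

/* First address past the table: 256 * 4 = 1024 = $400. */
static uint32_t vector_table_end(void)
{
    return vector_address(VECTOR_COUNT);
}
```

So $1000 is comfortably past the end of the table at $400.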

Putting the following below the “Put program code here” comment:

    move.l  $FFFFFFFF, d0

And I got my first addressing error!

Address Error: Instruction at 1000 accessing address ffffffff

Of course, $FFFFFFFF is not a literal but an address. I should have written #$FFFFFFFF instead. I’d really prefer it to be the other way around, but maybe they have a point. Maybe addresses occur more often than literals, we’ll see.

Fixing the bug and… Yes! d0 is FFFFFFFF from now on (ok, until I close the simulator)!
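The two forms really are different instructions: the # selects the immediate addressing mode, while a bare address selects absolute addressing, and the mode is encoded in the low 6 bits of the MOVE opcode word. Below is a sketch that builds the first word of a MOVE from its standard fields (size in bits 13-12, destination register and mode in bits 11-6, source mode and register in bits 5-0). The helper and enum names are hypothetical; only the bit layout comes from the 68000’s encoding.

```c
#include <stdint.h>

/* MOVE size field (bits 13-12). Note the odd ordering: long is 2, word is 3. */
enum { MOVE_BYTE = 1, MOVE_LONG = 2, MOVE_WORD = 3 };

/* Addressing modes used here: data register direct, and mode 7 ("extended"). */
enum { MODE_DN = 0, MODE_EXT = 7 };

/* Register field values selecting the extended mode when mode is 7. */
enum { EXT_ABS_W = 0, EXT_ABS_L = 1, EXT_IMM = 4 };

/* Build the first word of a MOVE instruction (extension words not included). */
static uint16_t move_opcode(unsigned size,
                            unsigned dst_reg, unsigned dst_mode,
                            unsigned src_mode, unsigned src_reg)
{
    return (uint16_t)((size & 3) << 12 |
                      (dst_reg & 7) << 9 | (dst_mode & 7) << 6 |
                      (src_mode & 7) << 3 | (src_reg & 7));
}
```

With these fields, move.l #$FFFFFFFF, d0 starts with $203C followed by the 32-bit literal, while move.l $FFFFFFFF, d0 starts with $2039 (absolute long) or, if the assembler picks the short form, $2038 (absolute word), followed by the address. The 2038 at the start of the 6-byte patch later in this chapter is exactly that short form.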

S-Records


Edit: Don’t get too attached to SREC files: we’re going to ditch them soon, once we start using an assembler that produces binaries.

Now that I’ve written my first program for the 68K, the next step is to find a way to run it on the Mega Drive. It’s not much, just setting one of the registers of the Mega Drive’s CPU, but it is a program nevertheless.

The way I see it now, I’ll have to convert the program to a binary (using the assembler), then convert the binary to a ROM (maybe by writing my own utility for the job), then find a Mega Drive emulator that supports stepping through the code and displaying the registers, load the ROM and watch d0 change value inside the emulator. Then become inexplicably happy about it.

So, I saved the program as “test00.x68”. The assembler outputs a “test00.s68” and a “test00.l68”, the latter being the output log with a fancy extension. The .s68 file is a text file with 4 lines of text:

S021000036384B50524F47202020323043524541544544204259204541535936384B6D
S10B1000223C0000FFFFFFFF8A
S1051006FFFFE6
S804001000EB

I can already tell that this is mostly hex numbers and that every line is a record of some kind (they all start with ‘S’). The assembler’s help file (which I had to unzip and read as plain HTML, since for some reason the help viewer won’t open it) calls this an “S-Record” file, which seems like a good name.

I assume that my code is here in one of the lines, in machine language along with the offset to where the code is ($1000). I can already see the offset in 3 places and the only thing I know for sure about my instruction is the $FFFF literal which is in 3 places as well.

Let’s do a little experiment: change the instruction, put it at a different address, and use numbers that stand out:

(showing only the relevant changes)

    ORG    $AAAA
    move.l #$BBBB, d0

And here is the S-Record output:

S021000036384B50524F47202020323043524541544544204259204541535936384B6D
S10BAAAA223C0000BBBBFFFFCE
S105AAB0FFFFA2
S80400AAAAA7

That’s more like it! Now I know my instruction is in line 2. The ‘1000’ in the first line was just a coincidence. Also, my offset appears in multiple places, and line 3 has changed and now contains something that resembles my offset. Other than that it’s a mess.

Wikipedia to the rescue!

So the SREC file consists of lines of text, each line being a record that starts with Sx, where x is a digit from 0 to 9.

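The first line (S0) is an ASCII description of the program, the next 2 lines are data records, each with its address and a checksum byte, and the final line (S8) holds the start address. Each data byte takes two hex characters, so the format isn’t compact byte-for-byte, but it only stores the address ranges that actually contain data, and it’s far more human-readable than a raw binary, which is nice.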
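Putting the Wikipedia description together with the lines above: each record is “S”, a type digit, then hex pairs for a byte count, an address, the data and a checksum. The count covers the address, data and checksum bytes; the address is 2, 3 or 4 bytes long depending on the type (S0, S1, S5, S9: 2; S2, S6, S8: 3; S3, S7: 4); and the checksum is the one’s complement of the low byte of the sum of the count, address and data bytes. Here’s a minimal C sketch that decodes one line under those rules (struct and function names are hypothetical, and error handling is basic):

```c
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* One decoded S-record (hypothetical struct for this sketch). */
struct srec {
    int      type;          /* 0..9, from "Sx" */
    uint32_t address;
    uint8_t  data[255];
    int      length;        /* number of data bytes */
};

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (char)toupper((unsigned char)c);
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

static int hex_byte(const char *p)
{
    int hi = hex_nibble(p[0]), lo = hex_nibble(p[1]);
    return (hi < 0 || lo < 0) ? -1 : (hi << 4 | lo);
}

/* Address field width in bytes for each record type (S4 is reserved). */
static const int addr_bytes[10] = { 2, 2, 3, 4, 0, 2, 3, 4, 3, 2 };

/* Parse one line. Returns 0 on success, -1 on syntax error, -2 on bad checksum. */
static int srec_parse(const char *line, struct srec *out)
{
    size_t len = strlen(line);
    if (len < 4 || line[0] != 'S' || !isdigit((unsigned char)line[1]))
        return -1;
    out->type = line[1] - '0';
    int na = addr_bytes[out->type];
    int count = hex_byte(line + 2);
    /* count = address bytes + data bytes + 1 checksum byte */
    if (na == 0 || count < na + 1 || len < 4 + 2 * (size_t)count)
        return -1;

    unsigned sum = (unsigned)count;
    out->address = 0;
    const char *p = line + 4;
    for (int i = 0; i < count; i++, p += 2) {
        int b = hex_byte(p);
        if (b < 0)
            return -1;
        if (i < count - 1)
            sum += (unsigned)b;             /* checksum covers everything before it */
        if (i < na)
            out->address = out->address << 8 | (uint32_t)b;
        else if (i < count - 1)
            out->data[i - na] = (uint8_t)b;
        else if ((uint8_t)~sum != b)
            return -2;                      /* last byte is the checksum */
    }
    out->length = count - na - 1;
    return 0;
}
```

Fed S10BAAAA223C0000BBBBFFFFCE, it reports type 1, address $AAAA and 8 data bytes, and the checksum ($CE) matches.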

The only thing I don’t understand is the purpose of line 3. If you look at line 2, my instruction is there at the correct offset. I speculate that the instruction ends at ‘BBBB’ (my literal number) and that ‘FFFF’ is a terminator of some kind. Now if you look at line 3, the SAME ‘FFFF’ appears again at its own (same) offset. So if we made a utility to write those records to a buffer, we would have written ‘FFFF’ twice at the same location. Weird.

Looking at the source, right below my instruction is a ‘SIMHALT’ directive. This of course can’t be in the 68K instruction set, as it refers to the simulator, so maybe repeating the final bytes twice is how the author of EASy68K tells the simulator to halt, without altering the contents of the program. Let’s test that. Remove the directive and re-assemble.

Haha! I love it when this happens! Removing the directive removes the repetition of ‘FFFF’. Actually it removes the ‘FFFF’ bytes altogether, so these must be the code for ‘SIMHALT’ that the simulator recognizes. Here is the new SREC:

S021000036384B50524F47202020323043524541544544204259204541535936384B6D
S109AAAA223C0000BBBBCE
S80400AAAAA7

Now, just for fun, let’s issue the SIMHALT twice and see what happens:

S021000036384B50524F47202020323043524541544544204259204541535936384B6D
S10BAAAA223C0000BBBBFFFFCE
S105AAB0FFFFA2
S105AAB4FFFF9E
S105AAB4FFFF9E
S80400AAAAA7

Yup, I expected that. Note that the second directive generated 2 extra records instead of one, presumably because the first repetition changes some state inside the simulator’s parser. But enough reverse engineering of the simulator. Time to parse this SREC file, which is going to be our first step towards a ROM-creation utility.

Romtool

About 300 lines of C code later, I have a needlessly over-engineered program that parses *some* of the records (S0, S1 and S8, the ones present in the output above). It also verifies the checksums. Here is the output of the program, called “romtool”:

S0 record: 68KPROG   20CREATED BY EASY68K
S1 record: Putting 8 bytes to address 0x1000
S1 record: Putting 2 bytes to address 0x1006
S8 record: Start Address: 0x1000

We now have the means to create a binary and we also know the start address, in case we need it. And it’s always good to know who has 20CREATED the program!

I don’t think EASy68K can be called from the console, which means we might have to look for another assembler in the near future. Switching to the IDE to click “Assemble” and then running the command-line tool to create our ROM will get boring soon. It’s OK for now though, as we have an easy way to test our first programs. I can pay a few extra clicks for that.

ROM format

Now it’s time to look at the ROM format. Google sent me to Zophar’s Domain. It’s not official, but it’ll have to do since I can’t find any official documents. I did find some technical documents that look like scans of the real deal here:

GENESIS Technical Overview 1995

But there’s no mention of the ROM, other than that it’s mapped into the first 4 megabytes of the address space ($000000 to $3FFFFF). That’s why I didn’t implement dynamic memory allocation in romtool: we’re bound to 4 MB, which we can safely allocate statically nowadays.
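Here’s a sketch of what that static allocation could look like when loading S1 data: a fixed 4 MB array plus a high-water mark for the highest byte written, which later gives the ROM size. The names are hypothetical, not romtool’s actual code. Overlapping records, like EASy68K’s repeated FFFF, are harmless in this scheme: the second write just puts the same bytes in the same place.

```c
#include <stdint.h>
#include <string.h>

#define ROM_MAX (4u * 1024u * 1024u)   /* $000000-$3FFFFF */

static uint8_t  rom[ROM_MAX];          /* static: no malloc needed */
static uint32_t rom_used;              /* one past the highest byte written */

/* Copy `len` bytes to `address`; returns -1 if they don't fit in 4 MB. */
static int rom_put(uint32_t address, const uint8_t *bytes, uint32_t len)
{
    if (address > ROM_MAX || len > ROM_MAX - address)
        return -1;
    memcpy(rom + address, bytes, len);
    if (address + len > rom_used)
        rom_used = address + len;
    return 0;
}
```

Feeding it the two S1 records from the first SREC file (8 bytes at $1000, then 2 bytes at $1006) leaves rom_used at $1008.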

The “BIN” ROM file format is just a plain binary, as the name implies. I’ll try that first and see if I can find a decent emulator that supports it. As for the header, I’m simply going to copy it from a working ROM, as I’m not interested in setting custom info there. Let’s try to find a binary of the European version of Sonic The Hedgehog!

OMG! It’s just 512K! I’m impressed! They fit a whole world in there! There are textures today occupying orders of magnitude more space.

Actually we could easily display the entire ROM as a 1024×512 grayscale image! The point of this project is to have fun after all.
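The numbers work out exactly: 512K is 524,288 bytes, and 1024 × 512 = 524,288, so every byte becomes one gray pixel. An image editor’s raw-data import is one way to look at it; another (an alternative, not what’s used here) is to put a binary PGM (P5) header in front of the bytes, which many image viewers can open directly. A sketch, with a hypothetical function name:

```c
#include <stdint.h>
#include <stdio.h>

/* Write `size` bytes as an 8-bit grayscale PGM, `width` pixels per row.
   Returns the image height, or -1 if size isn't a multiple of width or on I/O error. */
static long write_pgm(FILE *out, const uint8_t *bytes, size_t size, size_t width)
{
    if (width == 0 || size % width != 0)
        return -1;
    size_t height = size / width;
    /* P5 = binary graymap; 255 = maximum gray value, so 1 byte per pixel */
    if (fprintf(out, "P5\n%zu %zu\n255\n", width, height) < 0)
        return -1;
    if (fwrite(bytes, 1, size, out) != size)
        return -1;
    return (long)height;
}
```

Calling write_pgm on the 524,288-byte ROM with a width of 1024 returns a height of 512.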

(Starting GIMP to see if I can import raw files)

(Image: the Sonic The Hedgehog ROM rendered as a 1024×512 grayscale image)

There you have it! The combined effort of all the members of Sonic Team does not even cover your entire screen! In this image there is code, music, graphics, level design! When I first saw the file size of 512K I thought they might have padded the game to fit exactly that amount of memory, but no! The game was squeezed inside this little space, taking advantage of each and every byte! No wonder they left the “Sound Check” screen out. It was either that or the “SEGA” logo, a trivial decision.

Time for some speculation: I think I recognize some parts of the data. The one I’m almost certain about is the bottom stripe. This has to be waveform data for the “Seeeeegaaaa” voice; the values vary too smoothly for it to be anything else. So the sampled voice that plays as the logo appears must be 8-bit PCM (if it were 16-bit, the stripe wouldn’t look smooth at all). The other stripe I think I recognize is the SEGA logo bitmap. If I look at the third stripe (the first is black, then comes a thin white one, then the third, which is black again) with my peripheral vision, I think I can make out the shapes of letters. I might be wrong, and I could have tried other alignments to be sure, but we’re drifting away! Let’s go back to headers.
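Before we do, the 8-bit argument can be checked with a quick experiment. Viewed one byte per pixel, 8-bit samples change gradually from one byte to the next, whereas 16-bit big-endian samples interleave a slowly changing high byte with a fast-changing low byte, so neighboring pixels jump around. The sketch below stores the same synthetic slow ramp both ways (not actual Sonic data; names are hypothetical) and measures the average jump between neighboring bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Average absolute difference between neighboring bytes (low = smooth stripe). */
static double mean_byte_jump(const uint8_t *b, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 1; i < n; i++)
        total += (unsigned long)abs((int)b[i] - (int)b[i - 1]);
    return n > 1 ? (double)total / (double)(n - 1) : 0.0;
}

/* Store a slow synthetic ramp (sample i = i * 37) either as 8-bit samples
   (high byte only) or as 16-bit big-endian pairs. Returns bytes written.
   `out` must hold n bytes (8-bit) or 2 * n bytes (16-bit); n <= 1771 avoids wrap. */
static size_t store_ramp(uint8_t *out, size_t n, int sixteen_bit)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t s = (uint16_t)(i * 37);
        out[k++] = (uint8_t)(s >> 8);           /* high byte: changes slowly */
        if (sixteen_bit)
            out[k++] = (uint8_t)(s & 0xFF);     /* low byte: changes fast */
    }
    return k;
}
```

For a 1024-sample ramp the 8-bit version averages well under 1 per step, while the 16-bit version averages dozens, which is what a noisy-looking stripe would be.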

I’m taking the first 512 bytes of the Sonic ROM and putting them in a separate file called “header.bin”. EASy68K comes with a binary file utility that can do exactly that. 512 bytes must be the size of the header as the checksum starts counting from byte 512.

Regen

Next thing we need is the start address. Where does the Mega Drive expect to find its first instruction? I think I’ll find an emulator with debugging capabilities and try to step to the first instruction.
I found Regen. In the feature list it mentions that Regen is an accurate emulator (couldn’t hurt) and also that it is based on a modified version of the “Musashi” 68K emulator, which I find awesome.

I got the version with the debugger, loaded the ROM and tried to start the debugger as quickly as possible. I ended up far too deep inside the program, so I clicked the “Reset” button on the debugger, which jumped to address $206.

It is more likely though that $200 is the starting address. After all it’s the point at which the checksum begins counting. Second clue: it has code there: two nop instructions followed by a bra $200. Third clue is that “clock cycles” is 1 and not zero. This must mean that we have already spent a cycle or else we have a programmer that prefers starting to count from 1, which is rather unlikely.

(Image: the debugger at $200, showing nop, nop, bra $200)

Maybe this is some kind of hack protection or unauthorized-ROM detection: an infinite loop that does nothing if the condition variable happens to be set (forgot to mention I googled this “bra” instruction, which I understood to mean “branch if condition”).

Let’s test this theory. I’m going to put an invalid instruction there (odd addressing, like the error we DELIBERATELY did earlier), reset the debugger and see what happens.

Using the assembler I produced the following 6 bytes of code:

203800014E71

that correspond to:

move.l 1, d0
nop

Using a hex editor I patched the ROM with those 6 bytes at offset $200. It does nothing. The game starts correctly. I have to assume that $200 is not executed. But do I? Let’s see what happens with other ROMs, Sonic 3 for example.

Same 3 instructions (nop, nop, bra $200) in Sonic 3 as well. So the code is there, even though it isn’t executed? Let’s try something else: I’ll try to make the “bra” condition true using a “tst” instruction.

Nope, the game starts correctly. For now I think it’s safe to assume a start address of $200 and start our programs with the nop instructions, like this:

    ORG    $200
    nop
    nop
    bra    $200
* ...rest of the program...

… which I find irritating but I can live with it until I find more info. Right now let’s focus on creating a ROM that is readable by the emulator.
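One place to look for more info is the exception vector table at the very start of the ROM. On reset, the 68000 loads its initial supervisor stack pointer from the long word at $000000 and its initial program counter from the long word at $000004, both big-endian, so the entry point is whatever address sits there. Also worth knowing: bra is actually “branch always”, an unconditional jump; the conditional branches are the Bcc family, such as beq and bne. So nop, nop, bra $200 is a plain infinite loop for whatever lands on it. Here’s a sketch that pulls both reset vectors out of a ROM image; the function names are made up. If the Sonic 1 header we copied holds $00000206 at offset 4, that would match where the debugger’s Reset landed.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value, the 68000's byte order. */
static uint32_t read_be32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

/* Reset vectors: vector 0 = initial supervisor SP, vector 1 = initial PC.
   Returns -1 if the image is too short to contain them. */
static int reset_vectors(const uint8_t *rom, size_t size,
                         uint32_t *initial_sp, uint32_t *initial_pc)
{
    if (size < 8)
        return -1;
    *initial_sp = read_be32(rom);
    *initial_pc = read_be32(rom + 4);
    return 0;
}
```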

(going back to romtool.c …)

After a while, romtool.c has grown to 360 lines and now it can produce a complete ROM binary. It loads a “header.bin” file that contains the first 512 bytes of Sonic 1, splices the code from the SREC file and also patches the header data with the correct ROM capacity and checksum.
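For reference, the commonly documented Mega Drive header layout puts the checksum at offset $18E as a 16-bit big-endian value: the sum of all 16-bit big-endian words from $200 to the end of the ROM, truncated to 16 bits. The ROM end address (the last valid byte) is a 32-bit big-endian value at $1A4. Here’s a sketch of the calculation and the patching under those assumptions; romtool’s own code may differ, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define HDR_CHECKSUM 0x18E     /* 16-bit checksum */
#define HDR_ROM_END  0x1A4     /* 32-bit ROM end address (last byte) */

/* Sum of big-endian 16-bit words from $200 to the end, modulo 65536. */
static uint16_t md_checksum(const uint8_t *rom, size_t size)
{
    uint16_t sum = 0;
    for (size_t i = 0x200; i + 1 < size; i += 2)
        sum = (uint16_t)(sum + (rom[i] << 8 | rom[i + 1]));
    return sum;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    put_be16(p, (uint16_t)(v >> 16));
    put_be16(p + 2, (uint16_t)v);
}

/* Patch the ROM end address and checksum; `size` must be at least $200.
   The header lies below $200, so patching it doesn't change the checksum. */
static void md_fix_header(uint8_t *rom, size_t size)
{
    put_be32(rom + HDR_ROM_END, (uint32_t)(size - 1));
    put_be16(rom + HDR_CHECKSUM, md_checksum(rom, size));
}
```

Running md_checksum over an unmodified ROM and comparing the result with the two bytes at $18E is the same sanity check as the Sonic 1 test described next.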

I also ran a test using the Sonic 1 ROM as “header.bin” and an empty SREC file so that the Sonic ROM remained intact. The checksum romtool calculated was identical to the one the original ROM had in its header, which is a good sign that our checksum calculation code works.

Using the assembler and the romtool, I built this simple program that just sets d0 to BBBB:

    ORG    $200
START:                  ; first instruction of program

* Mandatory (?) first instructions
    nop
    nop
    bra $200

* Put program code here
    move.l #$BBBB, d0

    SIMHALT             ; halt simulator

* Put variables and constants here

    END    START        ; last line of source

So now we have a Mega Drive ROM called “test00.bin”. Loading the ROM in the emulator and…

We’ve got our first results! After all we went through, we have now run our first program and set register d0 of an imaginary 68K inside an imaginary Mega Drive to $0000BBBB!

(Image: Regen’s debugger showing d0 = $0000BBBB)

(anyone still here?)

To sum it up, our workflow for the near future will be:

  • Write a program in 68K assembly
  • Assemble the program to SREC using EASy68K
  • Convert the SREC to a BIN ROM using romtool
  • Run the ROM in Regen emulator

Afterword

Taking these notes was a great idea. It helped me develop a sense of direction, even though there’s nothing of the sort in this project. Maybe I’m going to do the same for all my projects from now on.

As for you, the reader, I suspect you had great fun as well. I know because you reached the end of this wall of text, which makes it safe to assume that you too are somehow interested in developing for the Mega Drive (maybe you owned one as a kid) and that your level of knowledge is more or less the same as mine (otherwise you would either have been bored or wouldn’t have understood a thing). So if you’re still here with me, you automatically qualify for the next chapter!

Now that we can write programs and have them executed, what will our next milestone be? The possibilities are endless. Maybe we’ll write our first pixel on the screen. That would be awesome but I doubt it’s going to be as simple as writing a byte somewhere in VRAM.

Stay tuned to find out!
