In this chapter we’re going to handle timing and have our code run periodically. This will enable us to do such cool things as a scrolling background.
We’re also going to ingest the super mushroom! We’ll start looking at what other people have done. Figuring out stuff by myself has been fun but it gets old real soon.
So, let’s dig in!
Creating a main loop
There are different kinds of things we can do during the active scan and the vertical retrace periods. During the active scan, the VDP is really busy, so we’d better not mess with it too much. While the VDP is putting pixels on screen, the CPU has “limited access” to it. This means that our writes to the data and control ports of the VDP will be consumed less frequently, so the corresponding FIFO will fill up quickly. I think I saw somewhere in the manual that the FIFO has enough space for 4 writes.
It is both a good and bad thing that, after the FIFO becomes full, the CPU must wait before it can write again. Thank God for this, because otherwise we’d have to check if the FIFO is full before writing. This would complicate things a lot. The bad thing is that we get a performance penalty. Fortunately this is small (about 5-6 microseconds), but nevertheless we don’t want to perform frequent writes to the VDP during active scan. These small delays could stack up pretty easily.
During the vertical retrace, we have unlimited access to the VDP.
Given the above, the simplest way to organize our main loop would be:
1. Update (perform game logic, etc.)
2. Wait for VBlank
3. Render (push cells/indices/palettes to the VDP)
4. Wait for active scan
5. Repeat
Probably, this is the only way we can do it. There’s this VBlank interrupt that bugs me. I’m trying to figure out what its purpose is, and if we could somehow make use of it. For example, we could set a flag in the interrupt, so that our update routine terminates, but this requires that our update routine is periodically checking for that flag. And if we do that, we don’t even need to use the interrupt. The flag is already available to us in the VDP “status” register:
The other thing that’s bugging me is that we have absolutely no way of controlling when a frame is displayed. Actually, the frame is controlling us. So, if our update routine takes too long, we will miss the VBlank and end up waiting for the VBlank of the next frame. I guess the only way around this is to make sure that our update routine is short enough.
Since there’s no practical way of enforcing this, the next best thing is to somehow display a warning if update exceeds its time limit. Let’s push that to our TO-DO stack and carry on with our main loop. We need two routines to implement it: one that waits for VBlank and one that waits for the active scan.
Both routines will loop until a condition is met. They’re going to be similar to what we did for waiting for DMA completion:
VDPWaitForDMA:
    VDPReadStatus d0
    btst #3, d0
    bne VDPWaitForDMA
    rts
This time we’re looking at bit #2. Let’s start with waiting for the active scan, which will be nearly identical to waiting for DMA completion. We’re looking for a bit to be cleared:
VDPWaitForActiveScan:
    VDPReadStatus d0
    btst #2, d0
    bne VDPWaitForActiveScan
    rts
And then, we flip the condition to wait for VBlank (the things I’m capable of, to avoid understanding again what “bne” means):
VDPWaitForVBlank:
    VDPReadStatus d0
    btst #2, d0
    beq VDPWaitForVBlank
    rts
Ready to roll
We are now ready to make our main loop. But I’d really like to make it do something. Would scrolling our background be too much to ask? Let’s see what the manual has to say…
I had to ask, hadn’t I?
So, to scroll horizontally, essentially you fill this table with the same offset value. Even slots of the table will affect scroll A and odd will affect scroll B. We can also specify a different offset per scanline, which would enable us to do some nice effects. I never liked a game that didn’t have a fake perspective floor.
But I’m not going to do it right now. I’m not in the mood to push 480 words to VRAM during the VBlank, which doesn’t last long. Alternatively, you can set only the first table position for the entire scroll A and the second for scroll B. To do that, you set register #11 to zero, which we already did during initialization.
Like the sprite table, this table will be more or less permanent in VRAM. So, let’s put it right above the sprite table.
We’re putting the HScroll table to F800, which is right above the sprite table, as tightly packed as the resolution permits (1KB increments – 6 most significant bits of the 16 bit address). We set those bits to register #13.
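Here is a minimal sketch of that register write. It assumes the VDPWriteToControl macro used throughout this series writes one word to the control port; the register-set word format ($8000 | register << 8 | value) is the standard VDP one.

```asm
; Sketch: tell the VDP where the HScroll table lives (register #13).
; Register #13 takes bits 15-10 of the table address in its low 6 bits,
; which is why the table can only sit on 1KB boundaries.
    VDPWriteToControl #($8D00 | ($F800 >> 10))   ; $F800 >> 10 = $3E, so the word is $8D3E
```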
By the way, here is the main program logic:
    jsr appStart                ; app initialization routine
    jsr VDPWaitForActiveScan    ; wait for the next active scan
mainLoop:
    jsr appUpdate               ; app update - during active scan
    jsr VDPWaitForVBlank
    jsr appRender               ; app render - during VBlanking
    jsr VDPWaitForActiveScan
    jmp mainLoop
I put the start/update/render routines in a separate file called “app.x68”. Now our previous test’s code is in appStart. We now have a cleaner main file. Actually let me rename it from “test.x68” to “entryPoint.x68”, to be more descriptive.
Now, let’s actually do something in our update routine. I’m just going to increase d1 by one:
appUpdate:
    add.l #1, d1
    rts
It’s going way too fast. I’ve put code in the wait functions as well, to increment other registers every time they loop. I then divided these registers by d1 to find out that we can do 6 iterations of adding and branching back per frame. I know the Mega Drive is a bit limited, but this is just ridiculous! Furthermore, my frame counter goes up really fast: on the order of thousands of frames per second.
This can only mean one thing: we are not actually waiting for VSync.
Maybe we’re not testing the right bit? Nope, I double-checked the docs.
– Enough is enough. –
This is a major turning point in this series. From now on, I’m going to utilise every single bit of information I can get my hands on. Because otherwise we’re getting nowhere. I’ll find another way to add some adventure to it.
There are far too many errors in the sega2 document. Look at this for example:
WaitVBlankStart:
    move.w vdp_control, d0 ; Move VDP status word to d0
    andi.w #0x0008, d0     ; AND with bit 4 (vblank), result in status register
    bne WaitVBlankStart    ; Branch if not equal (to zero)
    rts

WaitVBlankEnd:
    move.w vdp_control, d0 ; Move VDP status word to d0
    andi.w #0x0008, d0     ; AND with bit 4 (vblank), result in status register
    beq WaitVBlankEnd      ; Branch if equal (to zero)
    rts
This is written by Matt, from BIG EVIL CORPORATION.
Again, I’m not going to try and trace where this info came from, but I’d really like to recursively thank everyone who helped solve this ridiculous mystery.
Also, the BIG EVIL CORPORATION blog has a very nice wordpress theme. I’m stealing that as well.
So, after all, we need to check bit #3. Fine by me, but then, what on earth were we doing when we were waiting for DMA completion? We were actually waiting for the next frame…
So, where is the DMA bit then? Aha:
I’m keeping this txt by Charles MacDonald.
So we correct all 3 of our waiting routines. Now, frame counts are more believable.
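For reference, here is a sketch of what the corrected routines could look like. It assumes VDPReadStatus leaves the status word in the given register, that VBlank is bit #3 as found above, and that the DMA busy flag is bit #1 of the status word.

```asm
; Sketch of the corrected wait routines (bit numbers are the only change).
VDPWaitForDMA:
    VDPReadStatus d0
    btst #1, d0                 ; bit 1: DMA busy
    bne VDPWaitForDMA           ; keep looping while the DMA is still running
    rts

VDPWaitForActiveScan:
    VDPReadStatus d0
    btst #3, d0                 ; bit 3: VBlank in progress
    bne VDPWaitForActiveScan    ; keep looping while we are still in VBlank
    rts

VDPWaitForVBlank:
    VDPReadStatus d0
    btst #3, d0
    beq VDPWaitForVBlank        ; keep looping until VBlank starts
    rts
```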
Again, I let it run for a while and did the divisions in Windows Calculator. Seems like we can do about 2658 repetitions of “add, read status, check bit, branch” in our update routine, and about 242 in our render routine. Which totally sucks.
So, VBlanking takes about 8% of our time (242 out of 2658 + 242 = 2900 repetitions per frame). That’s an easier number to keep in mind.
Back to scrolling
The only thing left to do now, is write the value of d1 to VRAM location F800. A macro would be handy for writing VRAM:
macro VDPPointToVRAM, addr
    VDPWriteToControl #(( \addr & $3FFF) | $4000)
    VDPWriteToControl #(( \addr >> 14) & 3)
endmacro
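As a sanity check, this is what the macro should boil down to for address $F800, worked out by hand. It is only an illustration of the expansion, not extra code to add to the project.

```asm
; VDPPointToVRAM $F800 expands to two control port writes:
;   first word : ($F800 & $3FFF) | $4000 = $3800 | $4000 = $7800
;   second word: ($F800 >> 14) & 3       = 3
    VDPWriteToControl #$7800    ; low 14 address bits, plus the "VRAM write" code
    VDPWriteToControl #$0003    ; top 2 address bits
```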
And then, here is how we scroll:
    VDPPointToVRAM $F800
    VDPWriteToData d1
And yes! It slides!
I’m only scrolling plane A. That seems to be enough: plane A has higher priority than plane B and there are no transparent pixels in our image, so we can’t see plane B anyway.
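If plane B ever becomes visible, scrolling it should only take one more data write. This is a sketch under two assumptions: the VDP auto-increment (register #15) is set to 2, and d2 is a hypothetical second counter that holds the offset for plane B.

```asm
; Sketch: set the horizontal offset of both planes in full-screen scroll mode.
    VDPPointToVRAM $F800
    VDPWriteToData d1           ; word 0 of the table: offset for plane A
    VDPWriteToData d2           ; word 1 of the table: offset for plane B (d2 is hypothetical)
```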
Now, I’d like it to scroll a bit faster and from right to left:
add.l #-4, d1
Initializing the rest of the VDP (CRAM, VSRAM)
I’m adding a similar macro to point to CRAM, so we can easily change colors:
macro VDPPointToCRAM, addr
    VDPWriteToControl #(( \addr & $3FFF) | $C000)
    VDPWriteToControl #(( \addr >> 14) & 3)
endmacro
And one more for VSRAM (Vertical Scrolling RAM):
macro VDPPointToVSRAM, addr
    VDPWriteToControl #(( \addr & $3FFF) | $4000)
    VDPWriteToControl #((( \addr >> 14) & 3) | $0010)
endmacro
And here is how we clear both the CRAM and VSRAM to zero in our initialization routine:
    VDPPointToCRAM 0
    repeat 64
    VDPWriteToData #0
    endrepeat

    VDPPointToVSRAM 0
    repeat 40
    VDPWriteToData #0
    endrepeat
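The repeat blocks unroll into 104 separate writes in the ROM. If size ever matters, a counted loop is a possible alternative. This is a sketch that assumes VDPWriteToData does not touch d0 and that d0 is free at this point of the initialization; the loop labels are made up for the example.

```asm
; Sketch: clear CRAM and VSRAM with dbra loops instead of unrolled writes.
    VDPPointToCRAM 0
    move.w #64-1, d0            ; dbra runs count+1 times, so 63 gives 64 writes
ClearCRAMLoop:
    VDPWriteToData #0
    dbra d0, ClearCRAMLoop

    VDPPointToVSRAM 0
    move.w #40-1, d0            ; 40 words of vertical scroll RAM
ClearVSRAMLoop:
    VDPWriteToData #0
    dbra d0, ClearVSRAMLoop
```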
A little more housekeeping
Sprite table and Hscroll table will be constants, so let’s make some assembler symbols for them:
defc VDPSpriteTable = $FE00
defc VDPHScrollTable = $F800
Next item in my list is: “Optimize double writes to control with a long write”. I don’t know if this is going to save some cycles, but let’s do it anyway.
I’m creating two more macros VDPWrite*L, with the ‘L’ suffix:
macro VDPWriteToControlL, value
    move.l \value, $C00004
endmacro

macro VDPWriteToDataL, value
    move.l \value, $C00000
endmacro
And replacing my writes with packed long word writes throughout the code, like this:
VDPWriteToControlL #$40000080 ; Destination address (0)
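The same packing can be applied to the address macros. Below is a sketch of a long-word variant of VDPPointToVRAM; the name VDPPointToVRAML is made up for this example, and it assumes the assembler accepts << in expressions the same way it accepts >>.

```asm
; Sketch: one long write instead of two word writes.
; The first control word goes in the upper 16 bits, the second in the lower 16.
macro VDPPointToVRAML, addr
    VDPWriteToControlL #(((( \addr & $3FFF) | $4000) << 16) | (( \addr >> 14) & 3))
endmacro

    VDPPointToVRAML $F800       ; same as VDPWriteToControlL #$78000003
```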
Having taken care of a lot of items in our list, it’s time to move to somewhat higher level programs. Stay tuned for next chapter where we’ll create a primitive memory manager that will let us refer to memory locations using variable names instead of numbers.