Beyond Good & Evil and the 13th MDisk

Table of contents

Introduction

Beyond Good and Evil video game is an interesting blast from the past. Released somewhere between 2003 and 2004, it turned out as a commercial failure for the big outside world, but took a special place in the heart of a little 9-years-old me.

The charm of multicultural city (a la Venice with Eastern motives), gorgeous soundtrack, juicy mix of arcade and open-world adventure interspersed with heartwarming characters and occasionally good plot. For me it was (and remains) the living, breathing world, as if planet Hillys actually existed on the other side of the screen.

It was love to BG&E that sparked the writing of this article, designed to answer the main question that floated in the back of my mind for the last 13 years – how the hell to beat this game 100%?

Dedication

To whom it may concern:

Brief overview

Partially about finding closure, reading assembler, reverse-engineering cipher algorithm and localization problems from the long gone past. I think this article is in the vein of various “Cracking console game’s password systems” series, which itself is a really nice hobby!

All that follows might contain potential spoilers.

Without a further ado, let’s get the ball rolling.


Problem statement

The story is really simple – partly being arcade game, Beyond Good & Evil has 3 types of collectibles:

Collectible Quantity
Pearls of Aramis 88
Photos of local fauna 56
MDisks 14

The thing is – there’s 13th MDisk, sealed behind the locker in a city bar (named Akuda). And there’s no (in-game) way you can unlock it.

Proposed solution

… well, turns out you can could, but it’s kinda sophisticated.

There existed online component for the game, called The Darkroom. Rules were simple – to unlock the locker with 13th MDisk one should do the following:

  1. go to the dedicated Darkroom website;
  2. enter a special Internet code (located at the bottom of in-game saving screen);
  3. play a little puzzle-game;
  4. and finally get 4-character code for the locker.

Why this didn’t worked for me

To strike out first item of the list above you should do the impossible – to use the resource that spits out 500 error for as long as I remember myself. Even though there’re evidences that it worked back in the day, 9-years-old me wasn’t aware neither of the Internet nor of the website.

“Wasn’t aware”? What’s that supposed to mean? RTFM!
– any profound engineer

And that’s a valid criticism! Reference to official BG&E website (with vague hint on what to do with cryptic Internet code) is located at page 16 of the game manual!1 Moreover, this reference is at the very same page in every language-specific version of the manual… except for the Russian one. Yup, political jokes aside, we were special even back then.

Russian localization of the game was done by the company called “Buka Entertainment” or “Бука” (eng. bugbear, bugaboo). For some reason, they decided that simple subtitle translation is not enough and went as far as to localize every part of the game (i.e. voice acting, some textures, UI and game manual).

One of the consequences of such move (more on this later) – you shouldn’t be a cyrillic expert to realize that on page 16 of Russian manual2 there’s nothing but Buka tech support department’s contact info.

So, it turns out that even if I, as a kid, had Internet connection in my home, knew how to use browser and guessed to skim through this PDF booklet, I would still remain unaware of the existence of the intricate scheme described above.

Motivation

What can I say – I was impressionable child. BG&E sunk into my soul so deeply that my raison d’être became its 100% competition… outstretched over 13 years. The reason is clear – I couldn’t open friggin’ locker and take the last MDisk.

I could give up and move along, get a life. Alas, I was not only impressionable, but stubborn.

Without knowing about The Darkroom, I checked every corner of every in-game location, eye-scanned all blurry textures for hints and parts of locker code, replayed game thousand times, talked to every NPC in different moments, and desperately tried to brute-force the locker (in the worst case scenario it was just 67600 combinations, yay). All in vain. I’m pretty sure that I’ve spent more than 100 hours playing BG&E and listening to Akuda House Propaganda before abandoning this mystery completely somewhere in 2005.

Fast-forward to nowadays – thanks to Google, I figured everything out. The sole fact that the cause of my headache was both absence of the website and absence of the information about this website pissed me so much that I proclaimed: “the time has come”.

And, as the story goes, dusting off an old BG&E CD, I embarked on the journey.


Solution

Initial idea was to decipher Internet code and create an alternative for the long gone Darkroom. But, if you think about that a little bit, the only thing we need is 13th MDisk, and there could be more than one way to get it:

Option Thoughts
Get the locker code from game file(s); The easiest one to implement, although sounds too boring.
Get the locker code from decoded Internet code; The middle way: both practical and interesting.
Just crack the locker wide open (or put 13th MDisk in your inventory) by applying modifications to game file(s). Intriguing, but smells like a cheating!

Everything looks doable, but, before getting our hands dirty, the reasonable thing to do would be to consult the Net and check if these ideas weren’t already implemented…

Existing implementations

It is a sin to reinvent the wheel unless it is just for the pleasure of exploration.
– Stephen Pelc, “Programming Forth”

… surprisingly, they were! For every option in the table above I’ve found exactly one fan-made tool:

  1. Locker code fetcher from save file;
  2. Internet code decoder;
  3. BG&E save editor.

At this point we could easily wrap up our lab and slap [CLOSED] on a case file, eh? Hold your pants on for a moment, as we gonna take the last spin before the deep dive.

Problem restatement 3

There are a few points which I don’t like in this implementations:

  1. Firstly, they don’t show how everything works under the hood, mainly the decoding algorithm and byte locations. Even though we can get what we want with them, it’s not conceptually appealing (for me!) to live with this misinformation.
    See, the main plot of BG&E was all about conspiracy and truth disclosure. It would be a nice tribute to do this project in the same vein, i.e. to strip underlying mystery to its bare bones.
    Considering cosmic balance of things, I badly needed to oppose lack of information from my childhood with excess of information by doing investigation and writing about it anyway!

  2. “Those who forget the past are doomed to repeat it”. Two of the existing implementations are both websites, just like good ol’ Darkroom was, and, thus, one day they can suddenly go down, just like their predecessor.
    It would be nice to create an offline open-sourced alternative, which you can simply back up in your cloud or, in the worst case scenario, rewrite by yourself from sources (and this article).
    When every user has its own copy, truck factor increases. Surviving is spreading.

Welp, after all this pretentious speeches the initial idea still holds: decipher Internet code, make a tiny decoder, leave its source code on the open ground, and spread a word about it. All aboard!

Tools

Lets take a look at the mighty toolbox we’ll take with us:

Tool Description
The game itself Steam version if you’d ask, although I used DxWnd to launch it in a nice small window.
Debugger OllyDbg, a no-brainer then it comes to low-level tinkering.
Hex editor I sticked with WinHex copy remained from university labs.
Lovely programming language Red, one of the few PLs in which I can fit my head without bumping the ceiling.
Pencil and tons of paper Oldschool!

There’re also a few other utilities which I used on occasion, I’ll mention them in the course of the narrative.

Analysis

First thing first, let’s gather some stuff to meditate on.

Internet code

Ah, the cryptic 16-character green thingies, what’s up with that? We surely need to check some of them to determine used alphabet, and, as they are located at the bottom of the saving screen, the only way to accomplish our task is to, well, harvest some saves!

Let’s find nearby checkpoint (these are called decoder players) and do the following:

As we’re engineers, lets roll out macro machinery to do this task automatically. Here’s my take with .mmmacro script:

1  | 0   | 0   | 2000 | Keypress Delay
2  | 683 | 384 | 1226 | Left Click Down
3  | 683 | 384 | 96   | Left Click Release
4  | 683 | 384 | 428  | Left Click Down
5  | 683 | 384 | 93   | Left Click Release
6  | 683 | 384 | 779  | Left Click Down
7  | 683 | 384 | 91   | Left Click Release
8  | 683 | 384 | 1237 | Keypress f12
9  | 683 | 384 | 1257 | Left Click Down
10 | 683 | 384 | 94   | Left Click Release
11 | 683 | 384 | 397  | Left Click Down
12 | 683 | 384 | 79   | Left Click Release
13 | 683 | 384 | 676  | Left Click Down
14 | 683 | 384 | 91   | Left Click Release

Fascinating show! So fascinating that I suddenly fell asleep and, after 4-hour-something nap, generated whopping pile of ~1.5k screenshots. Eek! Well, as uncle Scrooge said – the more the better. Although we need to crop out Internet codes and preprocess them a little.

Internet code’s font isn’t monospaced, so I took a slightly larger box with 418x626 and 940x681 coordinates for top-left and bottom-right corners respectively to fit every possible content.

Don’t know how about you, but I’m still a little bit sleepy, so I calculate width and height of a box in a REPL:

>> 940x681 - 418x626
== 522x55

And cast a small .xds script upon directory with screenshots:

crop( 418 626 522 55 )
binary( nodither )
negate( )
settings( 0 0 0 0 0 0 0 )
output_path( ... )
output( jpeg 100 0 0 0 0 0 1 1 1 0 1 )

Crispy! As a poor Win guy, I’ll grab first good enough tesseract frontend, namely gImageReader, and quickly skim through first random 20 entries. Yikes, lots of mistakes and manual corrections due to unusual font. Anyway, I saved them as codes.txt. Now, time to figure out the alphabet:

>> print alphabet: sort/case unique/case trim/all/lines read %codes.txt
!&+/123456789?ABCDEFGHJKLMNPQRSTUVWXYZ\abcdefghijkmnopqrstuvwxyz
>> length? alphabet
== 64

Oh, that’s interesting:

How many bits we need to store one character?

>> log-2 length? alphabet
== 6.0

So, we figured out how Interned code might look like, but what it actually is? It’s surely a hash containing some game stats, mainly total playtime (as hash changes with every save) and notorious locker code. What else?

Looking at the screens of The Darkroom remained in the Net:

Aha! Once player typed in Internet code and played puzzle game, he appeared in the so called most wanted list to match against the others. The criteria used are as follows:

Two remained columns are somewhat vague and I pondered over them for a while, but here’s the thing:

The last column is the total score calculated from the numbers above. So, it seems that Internet code contains at least all of the numbers above plus locker code.

Locker code

This one is a lot simpler. Every in-game keypad looks like this:

Nicely twisted spiral of [A-Z][0-9]. Sounds fancy to throw in a function to generate strings from bitsets:

unfold: func [range][
    to string! collect [
        repeat i length? range [
            if range/:i [keep to char! i]
        ]
    ]
]
>> keypad: unfold charset [#"A" - #"Z" #"0" - #"9"]
== "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Whoops! Not quite that, let’s fix it:

>> print append keypad take/part keypad 10
ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789

And what about alphabet size and bits per character?

>> round/ceiling probe log-2 probe length? keypad
36
5.169925001442312
== 6.0

Every in-game locker code has a 4-character length and follows letter-digit-letter-digit format. Hopefully Akuda closet is no exception for that!

Okay, we gathered enough data. All set. Now, to dissecting.

Reverse engineering 4

Where should we start? As with every software, the guts are in the root folder:

>> change-dir to-red-file "C:\Steam\steamapps\common\Beyond Good and Evil"
== %/C/Steam/steamapps/common/Beyond%20Good%20and%20Evil/
>> ls
    BGE.exe                 BGEMakingOf.bik         binkw32.dll
    CheckApplicatio...      eax.dll                 FicheroLeeme.txt
    installscript.vdf       jade.spe                Leesmij.txt
    Leggimi.txt             Lies-Mich.txt           LisezMoi.txt
    Manual/                 ReadMe.txt              Register/
    sally_clean.bf          SettingsApplica...      testapp.exe

This is a fresh install, I didn’t even launch the game. But, Internet code appears if save slot associated with it is occupied, i.e. if corresponding save file exists. This means that, in order to find out bytes which are hashed into Internet code, we need to figure out which files are created/modified after the first time we’ve used checkpoint:

>> before: read %./
== [%BGE.exe %BGEMakingOf.bik %binkw32.dll %CheckApplication.exe %ea...
>> ; started new game and saved
>> after: read %./
== [%BGE.exe %BGEMakingOf.bik %binkw32.dll %CheckApplication.exe %ea...
>> exclude after before
== [%sally.idx %slot0.sav]

Uh-huh, two files? Turns out:

Looking at the file extension of slot0.sav and the position of save slot I’ve chose (the first), we can tell that for each slot with index 1 through 5 there exists slot<index - 1>.sav save file. But… the hell is sally.idx?5 We don’t need to know it (yet), as our current concern is to determine where game stats bytes are. So, let’s move out save file and check the saving screen:

Uh-oh, Interned code and saving slot are still there! And if I try to load this slot, the game (with no surprise) restarts, as there’s no save file. Hence, bytes should be in sally.idx. For sanity check, let’s restore save file back and delete sally.idx in turn:

Gotcha, bytes related to hash encoding are definitely in sally.idx. But how deep the rabbit hole goes?

Game files

sally.idx

On a first sight, sally.idx looks like an index table separated into 64 byte chunks. For every chunk there’s an appropriate index at 0Ch and 0Eh offsets from the beginning of the chunk; there’re 10 chunks total, only the first one seems to be used (so as the first saving slot), others are filled with zeroes. Take a look:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  01 05 07 15 15 07 E1 07 20 00 01 00 00 00 00 00  ......б. .......
00000010  67 02 00 00 00 01 01 00 66 25 02 00 A0 86 01 00  g.......f%.. †..
00000020  A0 86 01 00 A0 86 01 00 00 00 00 00 04 0B 08 12   †.. †..........
00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

As with every thing, to see if it’s alive and moves – just poke it with a stick:

>> stick: trim/tail rejoin collect [loop 10 [keep ["poke!" space]]]
== {poke! poke! poke! poke! poke! poke! poke! poke! poke! poke!}
>> write/binary/seek %sally.idx to binary! stick 0
Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  70 6F 6B 65 21 20 70 6F 6B 65 21 20 70 6F 6B 65  poke! poke! poke
00000010  21 20 70 6F 6B 65 21 20 70 6F 6B 65 21 20 70 6F  ! poke! poke! po
00000020  6B 65 21 20 70 6F 6B 65 21 20 70 6F 6B 65 21 20  ke! poke! poke! 
00000030  70 6F 6B 65 21 20 70 6F 6B 65 21 00 00 00 00 00  poke! poke!.....

As with every living thing, don’t poke it too hard or it will die quite soon. I was gentle (and patient) enough and poked every nibble separately, checking how Internet code twitches as I go on. Here’re the most sensitive parts highlighted:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  01 05 07 15 15 07 E1 07 20 00 01 00 00 00 00 00  ......б. .......
00000010  67 02 00 00 00 01 01 00 66 25 02 00 A0 86 01 00  g.......f%.. †..
00000020  A0 86 01 00 A0 86 01 00 00 00 00 00 04 0B 08 12   †.. †..........
00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

01h bit marks save slot as existing, and 2000h seems to be a game-specific index of checkpoint location (Pey’j’s workshop in our case). Unhighlighted bytes also seems to contain some game-related info (I suspect Hovercraft races, Looters’ caverns and main quests). This leaves us with 20 significant bytes related to the hashing. What are they?

As with every living thing – the best way to get to know each other better is just to stick around for a while. And so I proceeded with playing the game for 1001th time, re-checking sally.idx with every most-wanted-list-related thingy gathered. Enlightenment wasn’t long in coming:

Starting offset Size Data
00000010h uint32 total playtime in seconds
00000014h uint8 pallet game trophies
00000015h uint8 animals photographed
00000016h uint8 pearls collected
00000017h uint8 seems to always be zero
00000018h uint32 YO! Pearl record
0000002Ch ? ?

Notice that we’ve found every part of the hash-related data, except for the locker code. Thus, it definitely should start at the skipped 0000002Ch offset. Assuming that those at 00000030h are just padding zeroes, we have 4 bytes remained, which maps perfectly on in-game locker code 4-character letter-digit-letter-digit format.

Starting offset Size Data
0000002Ch uint32 locker code
00000030h uint32 padding zeroes

But, uh…

04 0B 08 12

Where’s the letter… and where’s the digit? Let’s remind ourselves about the data we gathered:

>> keypad
== "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
>> round/ceiling log-2 length? keypad
== 6.0

And let’s assume that encoding starts at 00h, in this case 040B0812h means:

>> enbase: func [bytes][to string! collect [foreach byte bytes [keep keypad/(byte + 1)]]]
== func [bytes][to string! collect [foreach byte bytes [keep keypad/(byte + 1)]]]
>> print enbase to binary! 040B0812h
ELIS

As hivemind strong believer I saved this otherworldly reference to Elis, but otherwise this doesn’t make any sense in our context! Removing + 16 from enbase definition won’t do the trick either, as it will print DKHR. Our initial assumption about encoding table is wrong, but bear with me.

If what we see here is letter-digit-letter-digit format, and digits aren’t symbols from the alphabet, then 0Bh and 12h are actual digits, which doesn’t make sense either. If developers in Ubisoft were sane back then, they should’ve encoded 0-9 integers as 00h-09h… Oh, it clicked! Bytes are in little-endian order, and digits are encoded as uint8. What we have at our hands is S8L4 locker code. If encoding table starts with 00h as A, then:

>> enbase: func [bytes][to string! collect [foreach [l d] bytes [keep reduce [keypad/(l + 1) d]]]]
== func [bytes][to string! collect [foreach [l d] bytes [keep reduce [keypad/(l + 1) d]...
>> print locker-code: enbase reverse to binary! 040B0812h
S8L4

Showtime!

A global event if you’d ask me. Damned locker opened for the first time in the last 13 years. I’m eagier to grab 13th MDisk and check what’s on it, but we still have a lot to reveal:

All roads lead to Rome (or to Elis).

slot*.sav

Compared to sally.idx, this one is a monster:

>> size? %sally.idx
== 640
>> size? %slot0.sav
== 22355

And somewhere in there lies our precious locker code… What should we do? Well, uncle Curt would say “chill up kiddo, those are all numbers”, and he would be right! Our locker code is in fact just a bunch of alphabet indexes. We know what they mean, but we also want to know what they are under the hood. There’re two initial assumptions we can make:

We need to fill out some forms:

assumptions: [
    collect [
        foreach [l d] locker-code [
            keep reduce [(index? find keypad l) - 1 to integer! form d]
        ]
    ]
    collect [
        foreach char locker-code [
            keep (index? find keypad char) - 1
        ]
    ]
]

But wait! We’re assuming that encoding starts with 00h for A. What if it starts with 01h? No problem, let’s patch up our assumptions with variants where we subtracting 0 instead of 1:

>> append assumptions replace/deep/all copy/deep assumptions [quote 1] 0
== [
    collect [
        foreach [l d] locker-code [
            keep reduce [(index? find...
>> indexes: reduce assumptions
== [[18 8 11 4] [18 34 11 30] [19 8 12 4] [19 35 12 31]]

Good. Now, for every block of indexes, we need to look over all possible variants of encoding, and here things get tricky:

You see the pattern. Moreover, bytes could be reversed, i.e. instead of:

>> first indexes
== [18 8 11 4]

It might be:

>> reverse copy first indexes
== [4 11 8 18]

In the worst case scenario bytes might not be consecutive at all! We already have 2 × 4 × 2 possible variants, and now this…

But, fear not. Have you noticed that the pattern phrase above was so suspiciously highlighted? That’s the key – in any case, our code is just 4 bytes surrounded with any number of zeroes.

To the kitchen! We should cook up a little parsing grammar:

slot: [some bits]
bits: [code | skip]
code: [4 uint]
uint: [zero byte zero]
zero: [0 3 #{00}]
byte-rules: collect [
    foreach proposal indexes [
        take/last _: collect [
            foreach i proposal [
                keep to binary! reduce [i]
                keep '|
            ]
        ]
        keep compose/deep/only [[1 (_)]]
    ]
]
>> probe new-line/all byte-rules on
[
    [1 [#{12} | #{08} | #{0B} | #{04}]] 
    [1 [#{12} | #{22} | #{0B} | #{1E}]] 
    [1 [#{13} | #{08} | #{0C} | #{04}]] 
    [1 [#{13} | #{23} | #{0C} | #{1F}]]
]
src: read/binary %./slot0.sav
callback: does [
    print form compose [
        found (result)
        at offset (to-hex subtract index? find src result 1) 
    ]
]
bits: [copy result code (callback) | skip]
reveal: does [foreach byte byte-rules [parse src slot]]

Phew! I’m a little bit nervous, and you?

>> reveal
found #{00000013000000230000000C0000001F000000} at offset 00002D55
== true

Awesome? Aww yiss. Have we gathered some interesting insights from that? Yessiree!

>> debase: func [code][to binary! collect [foreach key code [keep reverse to binary! index? find keypad key]]]
== func [code][to binary! collect [foreach key code [keep reverse to binary! index? find...
>> code: debase "Jade"
== #{0A000000010000000400000005000000}
>> write/binary/seek %./slot0.sav code 00002D58h

I suspect that every in-game locker code can be changed that way (if you know its offset). Anyway, we’ve opened Akuda closet 2 times in a row, should we stop and go home?

No way, we just warmed up.

Decoder

Ah, the essence of the story. Here’s the summary of everything we’ve learned about encoding algorithm up to this point:

Not that much, huh? But, knowing what goes in and what comes out, and having an ultimate X-ray device in our hands (a debugger), we can throw in a special marker into this black-box machinery, shine through its body and trace marker’s path all the way down.

Since bytes in sally.idx are the only thing that goes in, I’ll replace YO! Pearl record in it with:

Offset(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000  01 05 07 15 15 07 E1 07 20 00 01 00 00 00 00 00  ......б. .......
00000010  67 02 00 00 00 01 01 00 DE AD BE EF A0 86 01 00  g.......f%.. †..
00000020  A0 86 01 00 A0 86 01 00 00 00 00 00 04 0B 08 12   †.. †..........
00000030  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

After that, I’ll launch the game as usual, scroll over first slot for Internet code to show up, and rummage through memory dump in search of our marker:

Hey there beefy! Now it’s the matter of setting a breakpoint on read access and waiting for the game to choke on it:

Scrolling one page up and here it is, lying in it’s naked beauty – the standard sequence of

PUSH EBP
MOV  EBP, ESP
SUB  ESP, ...

that indicates the creation of a new stack frame and the start of a function. What follows is a rather old-fashioned art of printing program listing, monitoring hand-drawn stack spreadsheet, rewriting assembly as 3AC and sipping tea.

Having done all of the above, let me spread cards on the table:

Surprisingly, function itself is called 21 times, but only last 16 calls are related to encoding (perhaps first 5 are used to get save location and playtime).

But out of all 16 calls, Internet code is already encoded on a first pass! See, function recieves 2 arguments: saving slot index (0 - 4) and letter index (0 - 15). At the end of the call, it takes letter index, multiplies it by 4 and adds it to the starting position of the Internet code on the stack. That gives us only 1 letter out of 16!

And good news are – encoding is totaly reversible. To prove that (and to explain decoding algorithm in detail as promised), let me guide you through step-by-step example below.


Example

To decode Internet code, we should take the block scheme above and work backwards:

5 dwords would be fully restored game data from sally.idx, in which only Akuda locker code is of interest for us (4th dword).

Lo and behold! Whole world struck in awe as we approach this cryptic beast with our heroic weapons – perseverance, courage and… a pile of tables?!

Filling piggy banks

Our example input will be this Internet code: fCe\w9!iBXJ1ijn&.
You can quickly check in advance that locker code we’re looking for is J4D3.

So, the very first thing we gonna do to get our Jade back is to convert Internet code string into array of 16 bytes using the following encoding table:

(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00 A B C D E F G H + J K L M N \ P
10 Q R S T U V W X Y Z a b c d e f
20 g h i j k & m n o p q r s t u v
30 w x y z / 1 2 3 4 5 6 7 8 9 ? !

As was said before, it’s a mix of Base64 and Base58 encodings. You can generate Internet code alphabet by taking Base64 and substituting:

Character Substitution
I +
O \
l &
0 /
+ ?
/ !

As we can see, 1F 02 1E 0E 30 3D 3F 22 01 17 09 35 22 23 27 25 is our answer. In this bytes, only 6 rightmost bits are significant (since Internet code alphabet consists of 64 = 26 symbols) and 2 leftmost bits are always zero.

This completes the first step of encoding. Pretty straightforward, huh? Moving to the next one.

Bit twiddling

And here things get trickier. Second step of decoding boils down to the chain of 3 loops, in which first two swap some bits between some bytes, and the last one flips (i.e. takes boolean complement) some bits in some bytes.

Here, some is determined by simple arithmetic means, but also from so called round keys, or subkeys, obtained from the key. But what is the key?

Key

Key is just an array of eight prime numbers, and it’s also the very first thing you’ll stumble upon in original encoding function disassembly. Here’s how to obtain it:

Ta-da! Ain’t you an elite hacker now?

Flip and swap

Now, before telling you about these 3 loops, let me establish a few conventions and show an example diagrams of flipping and swapping. Hopefully, this will eliminate any misunderstandings between us.

  1. So, the first convention is usage of indexes – I’m gonna stick with 0-based numbering. Not because it’s easier to work with (in fact it’s the opposite), but because the original algorithm used it, so as the most modern programming languages.

  2. The second convention is usage of offsets – when I tell you “bit located at offset 5” or “5th bit”, this means that, to get this bit, you need to skip 5 bits from the right. We can say that bit indexing is 0-based too, with the caveat that actual indexing starts from the rightmost position.

Suppose now that decoding algorithm tells us “swap 4th bit in 12th byte with 2nd bit in 6th byte”. Our 12th and 6th bytes are 22h and 3Fh, respectively. We could picture this as follows:

Obviously, after swapping our renewed 12th and 6th bytes will look like that:

Bit value at one position replaced bit value at another position and vice versa.
Note, however, that swapping could occur in the same byte, as in “swap 1st bit in 13th byte with 2nd bit in 13th byte”. Moreover, same bit in same byte could be swapped with itself, but, in that case, swapping won’t cause any changes.

And what’s up with flipping? Well, “flip 0th bit in 0th byte”, since our 0th byte is 1Fh, would be drawn as:

Result of such operation is:

1 changes to 0, and 0 changes to 1, as if boolean NOT is applied.

And one last detail – all 3 loops described below iterate from some integer down to 1. That’s it. Now, with all that information in mind, I, finally, invite you to the circus of bit jugglers. Allez hop!

Round 1

First loop we gonna examine is inexpressibly boring.
Suppose we have some variable i that changes from 6 down to 1. Then, at every step of such iteration, swapping occurs, and byte indexes with bit offsets are computed from the value of i:

Variable Value
Iteration count (i) 6..1
1st byte index 2 * i + i / 6
1st bit offset i % 6
2nd byte index 15
2nd bit offset i - 1

Here, % is modulo operator and / is integer division. The table above could be rewritten as some sort of for looping construct:

for i: 6 to 1 [
    swap (i % 6)-th bit in (2 * i + i / 6)-th byte with (i - 1)-th bit in (15)-th byte
]

You can check this by yourself if you really want, but our example input:
1F 02 1E 0E 30 3D 3F 22 01 17 09 35 22 23 27 25
will remain unmodified after the first round. That’s because on each step of iteration two bits being swapped are identical.

For example, on 3rd iteration:

So, “swap 3rd bit in 6th byte with 2nd bit in 15th byte”.

Our 6th and 15h bytes are 3Fh and 25h, respectively. Let’s examine their binary representations:

>> to-bits: func [x][enbase/base to binary! x 2]
== func [x][enbase/base to binary! x 2]
>> to-bits 3Fh
== "00000000000000000000000000111111"
>> to-bits 25h
== "00000000000000000000000000100101"

3rd bit in 3Fh is 1 (you can easily convince yourself that it’s true, since all 6 significant bits are ones), and 2nd bit in 25h is 1 too (remember, you need to skip 2 bits from the right). Swapping one 1 with another 1 is the same as doing no swapping at all!

See, I told you that it’s a boring loop.

Round 2

Second loop is the big brother in our three members family. We’ll pass our 16 bytes through it shortly, but first we need to calculate two round keys.

Recall that our key looks like that:
25 1F 1D 17 13 11 0B 07

And that our 16 bytes passed through first round of bit swapping are:
1F 02 1E 0E 30 3D 3F 22 01 17 09 35 22 23 27 25

Now, take the 15th byte, 25h, and then take its 3 leftmost bits:

>> to-bits 25h
== "00000000000000000000000000100101"

As you can see, binary representation of hexadecimal 25h is 100101b, and its 3 leftmost bits are 100b, or, in decimal form, 4. This number will be an index of prime number in the key.

In our case, 4th number is 13h. Congratulations, you’ve just obtained the first round key!

Our key is an array of 8 prime numbers. If you’ll split it in half, then 13h will remain on the second half and its “mirror image” on the first half would be 17h. Yeah, as you may guess, that’s the second round key.

In general:

Well, that was mouthful. I can see that you’re eager to examine another fancy table, so, here it goes:

Variable Value
Iteration count (i) 30..1
m (i * 1st-subkey + 45) % 90
n (i * 2nd-subkey + 45) % 90
p m % 6
q n % 6
1st byte index (m - p) / 6
1st bit offset p
2nd byte index (n - q) / 6
2nd bit offset q

You probably have noticed that it’s slighlty more complicated than the one we saw in the first round. No biggie, this complication comes from the usage of subkeys and 4 coefficients with very original names: m, n, p and q.

To check that everything worked out as expected, examine bytes before and after second round:

Only two last bytes remained untouched, quite a big change if you’d ask me.

Round 3

In the final round, we need only one subkey, and the process is quite the same:

So, you can see that 25h is still 25h after the second round, and its binary representation is 100101b, hence index is 101b or 5, and 5th element in our key is 11h – that’s the round key for the third round.

While first two rounds were all about bit swapping, this will be about bit flipping, and its table is really tiny:

Variable Value
Iteration count (i) 40..1
m i * 3rd-subkey % 90
n m % 6
byte index (m - n) / 6
bit offset n

Finally, result of the whole second decoding step is:
03 28 21 05 05 21 19 00 04 23 10 35 07 21 0D 25

This concludes our tour at the circus of bit juggling. Phew! But wait, you’d better fasten your seatbelts and be ready for the last ride – the third decoding step.

Beyond codes & lockers

What we have at our hands now is more of a Lego pieces than completely restored game data. In order to assemble it back and retrieve Akuda locker code, we need to take some pieces and mash them together. In our case, pieces are bits, and some is determined by, guess what, a table!

Dword index Byte index Mask Offset
1 00 007E0000h 17
1 01 00010000h 11
1 01 00000003h 03
1 02 00003F00h 08
2 01 07000000h 24
2 03 00FC0000h 18
2 04 0003F000h 12
2 05 00000FC0h 06
2 06 0000003Fh 00
3 08 1E000000h 25
3 09 01000000h 19
3 09 00001F00h 08
3 10 001F0000h 14
3 10 0000000Ch 02
3 11 00000003h 04
4 07 000000FCh 02
4 08 00000003h 04

It’s a monstrous Godzilla, I know, but bear with me. You’re already familiar with offsets and indexes, and notion of mask is rather simple – it’s a chain of bits in which (if mask is applied to data with bitwise AND) ones tell us what data to keep, and zeroes tell us what data to throw away.

How this thing works anyway? Let me show you this by working example. I’ll also pinpoint two major caveats in the algorithm, so, keep your eyes open.

Do you remember that locker code is in the 4th dword? But, since we’re using 0-based numbering right now, it’s actually in the 3rd dword. Since 3rd dword is all what interests us, we will focus on the portion of the table with 3 in “Dword index” column.

Let me set up everything first:

>> &: :and |: :or
== make op! [[
    "Returns the first value ORed with the second" 
    value1 [logic! integer! char! bitset! binary! typeset! pair! tuple! vector!] 
    valu...
>> bytes: #{03 28 21 05 05 21 19 00 04 23 10 35 07 21 0D 25}
== #{03282105052119000423103507210D25}
>> code: 0
== 0

Ok, now, for every row with 3 in the first column we will do the following:

>> code: code | (bytes/(08 + 1) & (1E000000h >> 25) << 25) 
== 134217728
>> code: code | (bytes/(09 + 1) & (01000000h >> 19) << 19) 
== 150994944
>> code: code | (bytes/(09 + 1) & (00001F00h >> 08) << 08)
== 150995712

You can quickly get the main pattern – we’re ORing together some bits from some data for rows that have the same dword index:

Repeating this process for every row in the group, we will restore dword (and locker code) back:

>> code: code | (bytes/(10 + 1) & (001F0000h >> 14) << 14)
== 151257856
>> code: code | (bytes/(10 + 1) & (0000000Ch >> 02) << 02)
== 151257856

And here comes the first caveat:

>> code: code | (bytes/(11 + 1) & (00000003h << 04) >> 04)
== 151257859

Ta-da! This is our restored locker code. Ain’t convinced? Here’s the trick:

>> to binary! 151257859
== #{09040303}

Recall that locker code is stored in letter-digit-letter-digit format – letter is a letter index from 0-based A-Z alphabet, and digit is actual digit.

This means that, in our 09 04 03 03 byte representation, 09h is the 9th letter, or J, 04h is 4, 03h is the 3rd letter, or D, and 03 is 3. Wrapping it up all together, we got our J4D3 back!

In order to restore one game data chunk from sally.idx, simply repeat the process described above separately for each of four groups of values that have the same dword index in the first column. At the end, you’ll have 4 dwords: first is game stats, second is Yo-Pearl! record, third is Akuda locker code, and forth is just padding zeroes.

But watchful reader may have noticed that something is missing in the table above. Where’s the zeroth dword?. Behold, the second caveat:

>> (bytes/(14 + 1) * 60 + bytes/(13 + 1)) * 60 + bytes/(15 + 1)
== 48817
>> to time! 48817
== 13:33:37

Astonishing, you probably can see the matrix now.

I assume that you, dear reader, is interested only in locker code retrival, and so won’t bother you with data and checksum validation, which, while being an important part of decoding scheme, isn’t related to the problem at hand at all.

Well, that’s the conclusion of our cryptographic trip. We’ve cracked this virtual Pandora’s box in a blink of an eye. Good job!

But why my spider sense is tingling..?

Buka intermezzo

You probably remember how I explained why I didn’t know about The Darkroom back then. I also told that Buka have done a good job at localization – they translated literally everything… not only translated, but transliterated:

Let me now restate: even if I, as kid, had English version of the game manual in my hands and figured out how to read it, got familar with web surfing and found The Darkroom, I would still be unable to open this friggin’ locker, because Internet codes were in a transliterated format! FML.

Ugh, back to reality. Okay, this didn’t look like much of a problem anyway, since I already had Russian CD version with me and could quickly add an additional transliteration8 layer on top of decoding algorithm which will normalize Internet/locker codes from Buka version of BG&E. “Easy peasy!” I said… but then I figured out encoding table for Internet code.

Ready? Here’s your portion of daily WTF:

(h) 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00 А Б Ц Д Е Ф Г Ю + Й К Л М Н / П
10 Я Р С Т У В Ш Х Ч Ж а б ц д е ф
20 г ю и й к & м н о п я р с т у в
30 ш х ч ж / 1 2 3 4 5 6 7 8 9 ? !

Again, no cyrillic reading skills required (in fact you shouldn’t look at letters at all). They completely messed up with slashes, 0Eh and 34h are identical! This means that there’s no bijection between two alphabets, and for n / in Internet code there’re 2n variants of decoding. Bloody hell!

Sigh, life gave us a lemon. On the one hand, I want to let users with Buka version to use decoder as easily as everyone else, but on the other hand, I don’t want to create crude and ugly workarounds because of someone’s mistake. What should we do?

Welp, let’s make a lemonade!

Picklock

Here’s the last ace from the sleeve – when we examined slot*.sav in search of locker code, I concluded that it’s 4 uint values. I lied. It’s 5. Can you guess what 5th is? Tip: it’s always zero.

Drumroll! It’s:

So, we could say that alphabet encoding actually starts with 00h for Validate key and locker code in slot*.sav file is stored as 5 consecutive alphabet indexes. To rephrase, it’s a sequence of keys user should press to unlock the locker. And what if the very first key is Validate itself? Let’s zero out leftmost byte in the first index:

>> write/binary/seek %slot0.sav #{00} 00002D58h

Perhaps Erase is mapped to some index too, but I haven’t found it (though I only tried -1 and 37). In any case, this is a simple and elegant alternative solution to the problem with Buka transliteration – we don’t even need to type anything language-specific!

Conclusion

As I watch how code flows through my fingers and constellations of words and blocks fill the screen, 13 years old mystery crumbles into dust. And from that dust arises daruma, the ultimate kitchen sink for all MDisk #13-related problems of yours that brings happiness in the form of 100% game completion. Hooray!

That was an interesting project, both as the very first serious programming experience and engineering activity. It also relieved me from unbearable burden from the past, and provided a chance to learn lots of new things. Thank you, humble reader, for getting this far with me.

And thanks to Ubisoft for a challenge!


Oh, but wait! The 13th MDisk! I resisted the temptation to Google up its content all this time, but now, I guess, is the right moment. Let me just open the locker in a millionth time.

..? .?! What the f… That’s all?! For all these years… Well, as one swearing monkey have said…

Footnotes

  1. English manual 

  2. Russian manual 

  3. Humble reader may have noticed that this paragraph smell slightly like an excuse for doing fun, so I confess – to hell with rationale, it was just The Right Thing™ to do as a summer project and a chance to learn something new. 

  4. Violating the license, yarr! 

  5. Amusing factoid: in early alpha versions Jade (the main protagonist) was a much younger woman named Sally. 

  6. Indexing in Red starts from 1, not 0

  7. Guess what, it’s 13 again!
    And the sum of 3 removed numbers (2, 3 and 5) is 10, which is equal to 13 - 3.
    Also, in the 1st step we took first 12 prime numbers, and 13 is 12 + 1… 

  8. Some Web 1.0 version of transliterator