Crackme 19: XOR Madness

This crackme from Root-Me is titled “XOR Madness” (also hosted here). The binary is pretty small (2.5 KB) and has a striking property: almost every single instruction is either `xor` or `call`. This obfuscation makes it difficult to reverse-engineer the code. This is a PE (Windows) binary, so we won’t be able to use GDB or other ELF analysis tools on it. It does seem to run under Wine, although since it’s a 32-bit binary you’ll need to use have Wine32 installed in order to run it. If you use the `Dockerfile` provided on this site’s about page, you can install Wine32 with:

``````\$ sudo dpkg --add-architecture i386 && apt-get update && apt-get install wine32
``````

Let’s give it a try:

``````\$ wine xormadness.exe
Nope...
``````

The use of XOR in this program ¶

As a pre-requisite, let’s quickly review a few properties of th XOR operation. XOR (written `^` in Python) is an exclusive OR, meaning it operates at the bit level with two operands and produces a value with bits set to 1 only where the corresponding bits of the two operands were different from each other. The instruction `xor op1, op2` computes `op1 ^ op2` and stores the result back into `op1`.

XOR instructions are used in this program for three main purposes:

1. `xor eax, eax` sets `eax` to zero
2. `xor eax, eax` followed by `xor eax, ebx` copies the value of `ebx` into `eax`
3. `xor eax, ebx` followed by `xor ebx, eax` and `xor eax, ebx` swaps the values of `eax` and `ebx`. This is sometimes written in obfuscated C as `a ^= b ^= a ^= b`.

Make sure this is clear before diving into this crackme, since more than 80% of the instructions in this program are XOR.

The initial printf ¶

Let’s start by opening it in Ghidra. Starting at the entry point, we get a glimpse of the promised “madness” to come. In addition to XORs, we can see some `call` instructions that literally point to the next instruction (for a reminder of the difference between `call` and `jmp`, see this StackOverflow answer).

``````00401000    xor    ebp, ebp
00401002    xor    ebp, esp
00401004    sub    esp, 0x100
0040100a    xor    edi, edi
0040100c    xor    edi, esp
0040100e    call   EntryPoint+19    ; ―――╮  A very short call to the next instruction.
00401013    xor    eax, eax         ; ←――╯
00401015    xor    eax, dword [esp]
00401018    xor    dword [esp], eax
0040101b    xor    dword [esp], 0x203a
00401022    call   EntryPoint+39    ; ―――╮  Another call (not a jmp!)
00401027    xor    eax, eax         ; ←――╯
00401029    xor    eax, dword [esp]
0040102c    xor    dword [esp], eax
0040102f    xor    dword [esp], 0x64726f77
00401036    call   EntryPoint+59     ; ―――╮ One more...
0040103b    xor    eax, eax          ; ←――╯
0040103d    xor    eax, dword [esp]
00401040    xor    dword [esp], eax
00401043    xor    dword [esp], 0x73736150
0040104a    xor    eax, eax
0040104c    xor    eax, esp
0040104e    call   EntryPoint+83      ; ―――╮ And a fourth one.
00401053    xor    ebx, ebx           ; ←――╯
00401055    xor    ebx, dword [esp]
00401058    xor    dword [esp], ebx
0040105b    xor    dword [esp], eax
0040105e    call   dword [imp_printf] ; calls printf
``````

This is not particularly readable, but the constants used do stand out. As we have seen in multiple posts already, encoding strings as multi-byte integers is a common obfuscation technique. Here, the range of values gives it away: `0x65` is lowercase `a`, `0x41` is uppercase `A`, and `0x20` is a space. So values like `0x64726f77` or `0x73736150` are clearly made of letters, and the last `call` instruction to `printf` tells us that this is preparing the write.

Let’s decode them and see where they are copied:

First constant (2 bytes)Second constant (4 bytes)Third constant (4 bytes)
Bytes`20 3a``64 72 6f 77``73 73 61 50`
As characters` :``d r o w``s s a P`

The little-endian values reverse each chunk, but the string is still clearly visible: `Password: `.

It is remarkable that these values are not explicitly copied to different memory addresses separated by a few bytes, but that they are all added to a buffer by simply executing `xor dword [esp], \$constant`; this way this is implemented is pretty unusual.

Since `esp` points to the top of the stack and that the stack is growing down, this means that `esp` points to the lowest address on the stack. Although it can appear as if we were overwriting data with successive writes to `[esp]`, the `call` instructions are what makes it possible here: each `call` will first push the return address on the stack, and then jump to the address specified by the argument. This decrements `esp` by the size of the return address, in our case 4 bytes.

Before these chunks are copied, we start with:

``````        address    | contents
–––––––––––+––––––––––––
esp ->  0x0019fe70 | 00 00 00 00
``````

After `xor dword [esp], 0x203a`, we get:

``````        address    | contents
–––––––––––+––––––––––––
esp ->  0x0019fe70 | 3a 20 00 00
``````

Then we call `0x00401027`. This pushes the return address on the stack (which is pushing the address of the next instruction, not the address we’re calling even though they’re the same here), and also decrements `esp`. We end up with:

``````        address    | contents
–––––––––––+––––––––––––
esp ->  0x0019fe6c | 27 10 40 00
⇡⇡⇡     0x0019fe74 | 3a 20 00 00
``````

The next `xor` instructions will clear the contents of `[esp]` and replace these 4 bytes with the next string chunk.

In order, this is what happens:

AddressStepContents of memory after each stepNote
`0x0040101b``xor dword[esp], 0x203a``3a 20`
`0x00401022``call` (push address + jump)`27 10 40 00 3a 20`return address before " :"
`0x0040102f``xor dword[esp], 0x64726f77``77 6f 72 64 3a 20`return address overwritten
`0x00401036``call` (push address + jump)`3b 10 40 00 77 6f 72 64 3a 20`return address before “word :”
`0x0040104e``xor dword[esp], 0x73736150``50 61 73 73 77 6f 72 64 3a 20`return address overwritten

The last call is followed by a series of `xor` instructions that move the value of `esp` to `eax`, using `ebx` as a temporary register. The call to `printf` expects the string to be pointed by `eax`, which will now point to the first byte of `Password: `. Our prompt is displayed.

Reading input with `scanf`¶

After the initial prompt, more `xor`/`call` instructions follow with a noticeable reference to `scanf`. Once again the constant `0x73333625` is clearly in the printable range, and indeed it encodes the string `"%63s"` which is used as a format string for `scanf`:

``````00401071    xor    eax, eax                 ; resets eax
00401073    xor    eax, dword [esp]         ; copies *esp to eax
00401076    xor    dword [esp], eax         ; resets *esp
00401079    xor    dword [esp], 0x73333625  ; copies "%63s" to *esp (the format string for scanf)
00401080    xor    eax, eax                 ; resets eax
00401082    xor    eax, esp                 ; copies esp (not *esp) to eax, this saves the string's address
00401084    call   EntryPoint+137           ; saves the return address and jumps to the next instruction (changes esp)
00401089    xor    ebx, ebx                 ; resets ebx
0040108b    xor    ebx, dword [esp]         ; copies *esp to ebx
0040108e    xor    dword [esp], ebx         ; resets *esp
00401091    xor    dword [esp], edi         ; copies edi to *esp
00401094    call   EntryPoint+153           ; saves the return address and jumps to the next instruction (changes esp)
00401099    xor    ebx, ebx                 ; resets ebx
0040109b    xor    ebx, dword [esp]         ; copies *esp to ebx
0040109e    xor    dword [esp], ebx         ; resets *esp
004010a1    xor    dword [esp], eax         ; copies eax (the saved string address) to *esp
004010a4    call   dword [imp_scanf]        ; calls scanf to read input
``````

`scanf` itself will store its result (the user input) in `edi`, and we can see it being accessed immediately after the call returns:

``````004010aa xor    esp, esp            ; resets esp
004010ac xor    esp, edi            ; copies edi to esp
004010ae xor    esi, esi            ; resets esi
004010b0 xor    esi, esp            ; copies esp (pointing to user input) to esi
004010b2 xor    edi, edi            ; resets edi
004010b4 xor    edi, esi            ; copies esi to edi
004010b6 sub    esp, 0x100          ; moves esp back 0x100 bytes (remember this, it's important)
004010bc xor    ebx, ebx            ; resets ebx
004010be xor    bl, byte [esi]      ; copies the *first byte* of user input to bl (lower 8 bits of ebx)
004010c0 xor    ebx, 0x59307554     ; xors ebx with this constant
``````

The `xor ebx, 0x59307554` happens after `ebx` was set to 0 and the first byte of the password was copied to `bl`. In effect, only this first byte is XOR’d with the lowest 8 bits of the constant, or `0x54`.

This is followed by a long series of `xor` instructions, and then another `call`:

``````004010c6 xor    ecx, ecx          ; resets ecx
004010c8 xor    cl, bl            ; copies bl (our first character xor 0x54) to cl
004010ca xor    ebx, ebx          ; resets ebx (clearly we didn't need the other 3 bytes)
004010cc xor    bl, cl            ; copies cl back to bl
004010ce xor    eax, eax          ; resets eax
004010d0 xor    eax, esp          ; copies esp to eax (this is &password[0] - 0x100)
004010d2 xor    eax, ebx          ; sets al = al ^ bl (bl being password[0] ^ 0x54)
004010d4 xor    edx, edx          ; resets edx
004010d6 xor    edx, eax          ; copies eax to edx
004010d8 xor    ecx, ecx          ; resets ecx
004010da xor    ecx, dword [eax]  ; copies the value pointed by eax to ecx (dereferencing (&password[0] - 0x100) ^ password[0] ^ 0x54)
004010dc xor    dword [eax], ecx  ; clears 4 bytes at address ecx
004010de xor    eax, eax          ; resets eax
004010e0 xor    eax, esp          ; copies esp to eax (back to &password[0] - 0x100)
004010e2 xor    eax, 0xf2         ; xors al with 0xf2
004010e7 xor    eax, 0x82         ; xors the result with 0x82, so eax = (&password[0] - 0x100) ^ 0xf2 ^ 0x82
004010ec xor    ecx, ecx          ; resets ecx
004010ee xor    ecx, dword [eax]  ; copies the 4 bytes at address eax to ecx
004010f0 xor    dword [eax], ecx  ; clears 4 bytes at *eax (since ecx == *eax)
004010f2 xor    byte [eax], 0x1   ; xors the first byte at *eax with 1 (setting it to 1)
004010f5 xor    eax, eax          ; resets eax
004010f7 xor    eax, edx          ; copies edx to eax (again this is (&password[0] - 0x100) ^ password[0] ^ 0x54)
004010f9 xor    edx, edx          ; resets edx
004010fb xor    edx, dword [eax]  ; copies *eax (dereferencing the address above) to edx
004010fd xor    ebx, ebx          ; resets ebx
004010ff call   EntryPoint+260
``````

To recap, we computed two addresses:

• `(&password[0] - 0x100) ^ 0xf2 ^ 0x82` – and we set the value at that address to `1` in `0x004010f2`
• `(&password[0] - 0x100) ^ password[0] ^ 0x54` – which `edx` was first pointing to and later contained the dereferenced value at that address.

The fact that we computed one address based on constants and set a byte there before computing an other address based on the first byte of the input and making `edx` point to it seems significant.

We end with a jump to `EntryPoint+260` or `0x00401104` (once again an immediate jump to the next instruction), which has only a few instructions before it actually jumps to a different location this time:

``````00401104 xor    eax, eax                    ; resets eax
00401106 xor    eax, dword [esp]            ; puts the return address that was in *esp in eax
00401109 xor    dword [esp], eax            ; sets *esp to zero
0040110c xor    dword [esp], sub_40112d     ; copies a different return address, 0x0040112d
00401113 call   sub_4014a9                  ; calls a different function this time, not making a tiny jump
00401118 call   sub_40111d
``````

The validation function ¶

Let’s look at this function at `0x004014a9`, it’s pretty short:

``````004014a9 xor    eax, eax                    ; resets eax
004014ab xor    eax, dword [esp+arg_0]      ; copies the previously-saved return address 0x0040112d to eax
004014af sub    eax, dword [esp+ret_addr]   ; subs the return address, which is call_site+4=0x00401118 (eax=0x15)
004014b2 mul    dl                          ; multiplies eax by dl (we computed it earlier! it's either 1 or 0)
004014b4 add    eax, dword [esp+ret_addr]   ; adds 0x00401118 once again
004014b7 xor    ecx, ecx                    ; resets ecx
004014b9 xor    ecx, dword [esp+ret_addr]   ; moves the return address 0x00401118 to ecx
004014bc xor    dword [esp+ret_addr], ecx   ; erases the return address
004014bf xor    dword [esp+ret_addr], eax   ; moves the computed eax to the return address (!!)
004014c2 ret    0x4                         ; and jump to it by returning
``````

So this is our validation. Earlier, we computed an address based only on constant values and put a “1” there, then computed another based on user input and copied the value at this second address to `edx` (`dl` really). The validation function has 2 return addresses (`0x00401118` and `0x0040112d`) and doesn’t jump but returns to the first if `edx` is zero or to the second if `edx` is one.

We now know how to compute our first byte: the two addresses need to match, so `0xf2 ^ 0x82` needs to be the same as `password[0] ^ 0x54`. With XOR we know that `a ^ b == c ^ d``d == a ^ b ^ c`, or `password[0] == 0xf2 ^ 0x82 ^ 0x54`, giving us the first character: `\$`.

``````>>> hex(0xf2 ^ 0x82 ^ 0x54)
'0x24'
>>> chr(0xf2 ^ 0x82 ^ 0x54)
'\$'
``````

If the input does not match, we end up in `0x00401118` and soon copy the word “Nope” to a buffer and print it.

If it does match, we continue in `0x0040112d`, which (predictably) starts with `sub esi, -1`. Ever since the instruction at `0x004010b0`, `esi` has been pointing to our input buffer so this is moving it to the next byte.

Repeat a few more times ¶

The rest of the program works in exactly the same way: the current byte is placed into `bl`, then XOR’d with a large constant (of which only the lower 8 bits matter), then two more XOR with single bytes follow, etc. Let’s quickly go over the code for the second byte to see how the pattern repeats:

``````0040112d sub    esi, 0xffffffff     ; adds 1 to esi (subtracts -1)
00401130 xor    ebx, ebx            ; resets ebx
00401132 xor    bl, byte [esi]      ; takes the current byte of input into bl
00401134 xor    ebx, 0x68305567     ; XORs with 0x67
0040113a xor    ecx, ecx
0040113c xor    cl, bl
0040113e xor    ebx, ebx
00401140 xor    bl, cl
00401142 xor    eax, eax
00401144 xor    eax, esp            ; all of this
00401146 xor    eax, ebx            ; is much of the same
00401148 xor    edx, edx            ; as earlier
0040114a xor    edx, eax
0040114c xor    ecx, ecx
0040114e xor    ecx, dword [eax]
00401150 xor    dword [eax], ecx
00401152 xor    eax, eax
00401154 xor    eax, esp
00401156 xor    eax, 0xba           ; XORs again with 0xba
0040115b xor    eax, 0xa8           ; and 0xa8
``````

So our second byte is logically `0x67 ^ 0xba ^ 0xa8` or `u` (`0x75`).

By simply going over the references to the validation function at `0x004014a9`, we can find all of these similar code blocks each adding one to `esi`, XOR’ing the current input byte with the lower 8 bits of a long constant, then XOR’ing twice more with two other byte constants.

The callers are at the following addresses:

1. `0x00401113`
2. `0x00401185`
3. `0x004011f7`
4. `0x00401265`
5. `0x004012d3`
6. `0x00401345`
7. `0x004013b7`

Every single one of these callers has a structure that’s extremely similar to the block we’ve just looked at. They all do something like:

``````xor    bl, byte [esi]      ; takes the current byte of input into bl
xor    ebx, 0x12345678     ; XORs with a large constant, only the lower 8 bits matter
; ...
xor    eax, 0xab           ; XORs again with one 8-bit constant
xor    eax, 0xcd           ; and one more 8-bit constant
``````

Just before setting `bl` to `byte [esi]`, we also set all the other bits of `ebx` to zero with a `xor ebx, ebx`. So `ebx` has only our current character, stored in its lower 8 bits. If you need a reminder, `bl` is an 8-bit register containing the lowest 8 bits of `bx`:

``````◄──────────────────────────── 64 bits of rbx ─────────────────────────────►
◄────────── 32 bits of ebx ──────────►
◄─ 16 bits of bx ─►
┌────────────────────────────────────┬──────────────────┬────────┬────────┐
│                                    │                  │   bh   │   bl   │
└────────────────────────────────────┴──────────────────┴────────┴────────┘
◄─8bits─► ◄─8bits─►
``````

Each caller works the same way as the block we just looked at, with the byte at address `esi` being moved to `bl`, then `ebx` being XOR’d with a large constant – since we only have the `bl` bits set, we only care about the lower 8 bits of this constant – and then `eax` is XOR’d twice more with two 8-bit constants.

This is why XOR’ing `ebx` with a long constant like `0x12345678` in our example only actually XORs our current input byte with the lowest 8 bits of the constant, `0x78`. If this is followed by `xor eax, 0xab` and `xor eax, 0xcd`, then our final input byte is `0x78 ^ 0xab ^ 0xcd`.

That’s all we need to decode each byte. So far we’ve gone through the first two:

Caller addressLong constantFirst 8-bit constantSecond 8-bit constantResulting character
`0x00401113``0x59307554``0xf2``0x82``0x54 ^ 0xf2 ^ 0x82` = `0x24`
`0x00401185``0x68305567``0xba``0xa8``0x67 ^ 0xba ^ 0xa8` = `0x75`

The other five can be found in the exact same way.

Putting it all together, we get a success message if we enter the correct password:

``````> xormadness.exe
I found all these `xor` instructions to be pretty confusing at first, but after an hour or so spent reading the same 2 or 3 sequences I started to see them as `mov` where they were used for this purpose. The call stack manipulation was more tricky, probably because it’s relatively unusual to see code in a `call` change its return address outside of exploits.