Crackme 18: BinaryNewbie’s cr4ckm3_0x

Seeing how I thought the previous crackme by BinaryNewbie was pretty interesting, I figured I would give a shot to this other challenge from the same author. Once again the original archive is hosted on crackmes.one, but you can also download the binary here.

The README mentions two “helpful links”: Numbermatics (a website about the properties of various numbers), and a SHA-256 hashing tool. There might be some math involved.

The program asks for a single password, which has some length constraints:

$ ./cr4ckm3_0x
[+] Welcome to cr4ckm3_0x
[+] Try to get the correct password
[+] I hope that will be funny
[+] Developed by Binary Newbie

Enter the password: hello
[-] Wrong length

Entering a longer string produces a 256-bit hash:

$ ./cr4ckm3_0x
[+] Welcome to cr4ckm3_0x
[+] Try to get the correct password
[+] I hope that will be funny
[+] Developed by Binary Newbie

Enter the password: hello-world
Here is your hash: 01d3393d50682838c7a0567aa4053766c0d5b108229a6d8b7a1c367d9d9e0e31
[-] Oh no, try again :/

Starting the analysis ¶

Before disassembling, let’s quickly examine the binary: strace doesn’t reveal any suspicious calls like ptrace or fork, and readelf -a produces a long output with a lot of metadata, which might be helpful.

On the other hand, strings prints out almost 6,000 lines, and most seem completely unrelated to this program. I suspect this is due to the static linking of glibc.

Disassembling the program, we easily find the welcome strings in sub_4005d0 with what seems like two puts calls followed by a read of some kind (fgets, maybe?). The function allocates a pretty large buffer on the stack, and points rbx to rbp-0x9a before reading what seems to be 10 bytes. We also see the code setting the byte at rbp-0x91 to zero:

00000000004005d9 lea    rdi, qword [aWelcomeToCr4ck] ; "[+] Welcome to cr4ckm3_0x..."
; (some instructions skipped)
00000000004005ef lea    rbx, qword [rbp-0x9a]        ; note where rbx points
00000000004005f6 sub    rsp, 0x90
00000000004005fd mov    rax, qword [fs:0x28]
0000000000400606 mov    qword [rbp-0x28], rax
000000000040060a xor    eax, eax
000000000040060c call   sub_417e00+384               ; presumably puts or equivalent
0000000000400611 lea    rsi, qword [aEnterThePasswo] ; "Enter the password: "
0000000000400618 mov    edi, 0x1
000000000040061d xor    eax, eax
000000000040061f call   sub_453bd0+416               ; also a write
0000000000400624 mov    rdx, qword [qword_6c17a8]
000000000040062b mov    esi, 0xa                     ; 10 bytes
0000000000400630 mov    rdi, rbx                     ; this is the actual buffer parameter
0000000000400633 call   sub_417870                   ; fgets or equivalent
0000000000400638 test   rax, rax
000000000040063b je     loc_400789                   ; jumps further down

0000000000400641 mov    byte [rbp-0x91], 0x0         ; this is rbx + 9 bytes

so something like:

char buffer[10];
// likely a few more variables on the stack
puts("[+] Welcome...");
puts("Enter the password");
if (fgets(buffer, 10, stdin) != NULL) {
    buffer[9] = 0;
}

It looks like our password will be 9 characters long.

Finding constraints to limit our search space ¶

Following the fgets call, a loop goes over each character with different individual conditions for various bytes, ultimately all checking that none of the input characters are null. We start with the input string in rax, and the first block makes rdx point to rax+1 and checks the byte at rdx-1 – the first one in the string.

The conditions are easy to follow, even if this sort of unrolled loop is a bit verbose:

We then have a second length check, and the actual input validation starts. Recall that rbp - 9a points to the first character of our input: the first condition is whether it is equal to the character '2'.

00000000004006ba cmp    byte [rbp-0x9a], '2'
00000000004006c1 je     loc_40079c            ; continue to next condition

Then comes a sequence of 6 function calls, after which half of them check the return value in al and interrupt the sequence if it is not zero. This is pretty clear when we look at the control flow graph and highlight the common jumps that all break the sequence:

Note also how the byte at rbp - 0x99 (which is our second input second character) is compared to 0x70 and we branch out if the difference is strictly greater than 3. This leaves only the range 0x70 to 0x73 as valid characters, which are the letters p, q, r, s.

Indeed, a single encouragement message is printed if we try the password 2AAAAAAAA:

Enter the password: 2AAAAAAAA
[+] Yes, keep going :)
Here is your hash: a87e9cefa585405929d6c5277ae827e7976ed73f15b13ba4a92d3ce16692b0e2
[-] Oh no, try again :/

But if the second letter is in the range we just found, a second message appears:

Enter the password: 2pAAAAAAA
[+] Yes, keep going :)
[+] Yes, keep going :)
Here is your hash: 70442c43c67b9da1de7be700206027c5b412916bfe11fd6e58a884149f4fd80a
[-] Oh no, try again :/

Since the two function calls made after the checks on 2 and 0x70 both went to sub_400f30, we can probably assume that this procedure only prints the progress message. A quick look reveals a couple of suspicious constants:

0000000000400f39 movabs rdx, 0xd1d99ecedbdbd59e         ; this looks like it might be a string
0000000000400f43 mov    ecx, 0xffff9784                 ; maybe this one too
0000000000400f48 mov    r11d, 0xffffffbe                ; /!\ note the last byte of r11: 0xbe
                                                        ; (a bunch of other instructions omitted)
0000000000400f64 mov    dword [rbp-0x20], 0x9ed9d0d7    ; this one also looks unusual
                                                        ; (some more instructions skipped)
0000000000400f7a movabs rax, 0x92cddbe79ee395e5         ; and more magic numbers

Shortly after these constants are installed into various registers, we jump straight into the core of a loop that XORs bytes with r11b (the low 8 bits of r11):

0000000000400f97 jmp    loc_400ffb
                                            ; instructions in-between omitted)
0000000000400ffb xor    byte [r8], r11b     ; remember that we've just set this to 0xbe

Predictably, the decryption loop ends with a call to puts.

Let’s decrypt the values we found, after re-ordering them and converting from little-endian:

>>> ''.join(chr(b ^ 0xbe) for b in [0x92, 0xcd, 0xdb, 0xe7, 0x9e, 0xe3, 0x95, 0xe5][::-1] +
...   [0xd1, 0xd9, 0x9e, 0xce, 0xdb, 0xdb, 0xd5, 0x9e][::-1] +
...   [0x9e, 0xd9, 0xd0, 0xd7][::-1] +
...   [0x97, 0x84][::-1])
'[+] Yes, keep going :)'

Perfect! So sub_400f30 is indeed what prints this message.

The next call goes to sub_4016a0, preceded by a mov rdi, rbx which makes rdi point to our input. This function looks somewhat similar to sub_400f30 with what looks like a string being decoded, but not before a range comparison validates the byte at rdi + 2 (the third character of our input):

00000000004016b7 movzx  edx, byte [rdi+2]   ; make edx point to input + 2
                                            ; (some code omitted)
00000000004016ca cmp    dl, 0x6a            ; this is 'j'
00000000004016cd je     loc_401708

00000000004016cf cmp    dl, 0x6b            ; 'k'
00000000004016d2 je     loc_401708

00000000004016d4 cmp    dl, 0x6c            ; 'l'
00000000004016d7 je     loc_401708

00000000004016d9 cmp    dl, 0x6d            ; 'm'
00000000004016dc je     loc_401708

00000000004016de cmp    dl, 0x6e            ; 'n'
00000000004016e1 je     loc_401708

Any of these characters makes us jump to loc_401708, which has the same constants as in the “keep going” function. With this range of valid values for the third input character, we can use it to get a third progress message to print:

Enter the password: 2pkAAAAAA
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
Here is your hash: 119a7d0e8372c3cda43a2619f7ae6e5a4b6aaeedb83869d694480728f5ea5fa8
[-] Oh no, try again :/

Let’s continue with sub_4017d0. Interestingly, we see a call to __ctype_tolower_loc followed by a comparison of the entry for the character at input[3] with the value 8, which is something we explored recently in crackme 16.

00000000004017fa call   __ctype_tolower_loc         ; the function used for isascii/isdigit/etc
00000000004017ff movsx  rcx, byte [rbx+0x3]         ; move input[3] into rcx
0000000000401804 mov    rdx, qword [rax]            ; rax points to the ctype table
0000000000401807 xor    eax, eax                    ; unused here
0000000000401809 test   byte [rdx+rcx*2+0x1], 0x8   ; compare the table entry for input[3] with 8
000000000040180e jne    loc_401830

This comparison validates the third character with isdigit. As with the other functions, this is followed by code to print a progress message. We now have a constraint for our fourth byte, so let’s again validate our progress with one more valid character:

Enter the password: 2pk7AAAAA
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
Here is your hash: a9980a331d6f6c7bd6c9824b3b92f3b3534a70996afb7e870c88d4b73feb638e
[-] Oh no, try again :/

The next call goes to sub_401900, which starts by looking at characters 4 and 5:

0000000000401917 movzx  ecx, byte [rdi+4]       ; move input[4] to ecx (cl, really)
                                                ; [...] (3 irrelevant instructions skipped)
000000000040192a cmp    cl, byte [rdi+5]        ; and we compare it to input[5]
000000000040192d je     loc_401950              ; jump off without the next comparisons

We can see from the control flow graph that the next success message depends at least on these two matching, but there’s a bit more:

0000000000401950 movsx  si, cl                  ; move input[4] to si
0000000000401954 mov    r8d, ecx                ; and to r8
0000000000401957 lea    edx, dword [rsi*8]      ; edx = input[4] * 8
000000000040195e sar    r8b, 0x7                ; shift right lowest 8 bits of r8 by 7
0000000000401962 sub    edx, esi                ; edx = edx - esi, or edx = input[4] * 7
0000000000401964 lea    edi, dword [rsi+rdx*8]  ; edi = rsi + rdx * 8, or edi = input[4] + (input[4] * 7) * 8
0000000000401967 sar    di, 0x9                 ; shift right by 9
000000000040196b sub    edi, r8d                ; edi = edi - 16 bits or r8
000000000040196e lea    r9d, dword [rdi+rdi*8]  ; e9 = rdi * 9
0000000000401972 cmp    cl, r9b                 ; check if cl matches this value
0000000000401975 jne    loc_40192f              ; jump *over* the success message

Decoding this sort of thing is always a bit tedious, but we can simplify it a little. The sar r8b, 0x7 will likely set it to zero, since most printable ASCII characters are in the 0-127 range. We can therefore remove all the r8 references, and after a few simple β-reductions we end up with the rule ((input[4] + (input[4] * 7) * 8) >> 9) * 9 == input[4].

There are several values that could satisfy this constraint, but we might be able to reduce the search space further if we take the size of each register into consideration. Ghidra applies these reduction rules automatically and does an excellent job of simplifying this code, ending up with a trivial condition:

char input_4 = input[4];
if (input_4 == input[5]) {
  if ((input_4 == (char)((input_4 / '\t') * '\t')) && ((double)(int)(input_4 / '\t') == 5.0)) {

Since '\t' is 9, we’re doing input[4] % 9 == 0 && input[4] / 9 == 5. The only matching value is 45, which is the hyphen-minus character -.

Once again, let’s validate our progress:

Enter the password: 2pk7--AAA
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
[+] Yes, keep going :)
Here is your hash: 8a4fa0f6d51d678fbe4807d3208db9e633585a4a73308210b92293f9f4362442
[-] Oh no, try again :/

We now have six out of the nine characters. It feels like we’re getting close, but we still haven’t explored the SHA-256 hashing part.

The next (and last!) function call is to sub_401a50, which is a bit more complex than the ones we’ve seen so far. It starts by resetting errno, and saves input[6], input[7], and input[8] into r13, r14, and r15:

0000000000401a6c mov    rbx, rdi                ; make rbx point to our input, which we had in rdi so far
; skipping a few instructions...
0000000000401a82 call   sub_4084d0              ; this is actually calling __errno_location()
0000000000401a87 mov    dword [rax], 0x0        ; and setting errno to zero
0000000000401a8d movsx  r13, byte [rbx+6]       ; input[6]
0000000000401a92 mov    r12, rax
0000000000401a95 movsx  r14, byte [rbx+7]       ; input[7]
0000000000401a9a movsx  r15, byte [rbx+8]       ; input[8]

It’s interesting to note the call to __errno_location(). Even though the function was not recognized by any of the tools I used for this problem, its contents give it away:

sub_4084d0:
00000000004084d0 mov    rax, 0xffffffffffffffc0 ; sets rax to -64
00000000004084d7 add    rax, qword [fs:0x0]     ; offset it by FS
00000000004084e0 ret

The FS segment is generally used for thread-local storage, so this is returning a pointer to a thread-local variable and then setting it to zero at 0x401a87 with mov dword [rax], 0x0. This could of course be any other thread-local variable, but considering the functions called below it is highly likely to be the address of errno.

Much like for input[3] above, the function calls __ctype_b_loc to validate that the last three characters are numeric:

0000000000401aaf call   sub_408650+80               ; this is __ctype_b_loc
0000000000401ab4 mov    rax, qword [rax]            ; rax points to the table
0000000000401ab7 test   byte [rax+r13*2+1], 0x8     ; performs table[r13] & 8
0000000000401abd je     loc_401cbc                  ; jumps out if equal to zero, so not matching

0000000000401ac3 test   byte [rax+r14*2+1], 0x8     ; same for input[7]
0000000000401ac9 je     loc_401cbc

0000000000401acf test   byte [rax+r15*2+1], 0x8     ; same for input[8]
0000000000401ad5 je     loc_401cbc

We then call the unknown function sub_4159f0 with input+6, NULL, and 10 as parameters. This is immediately followed by an error check that prints a message if strtol failed, which pretty much gives it away:

0000000000401adb lea    rdi, qword [rbx+6]      ; input + 6
0000000000401adf xor    esi, esi                ; null
0000000000401ae1 mov    edx, 0xa                ; 10
0000000000401ae6 call   sub_4159f0              ; strtol(const char *nptr, char **endptr, int base)
0000000000401aeb mov    ecx, dword [r12]        ; maybe errno?
0000000000401aef mov    rbx, rax                ; presumably the return value is in rax
0000000000401af2 test   ecx, ecx                ; if non-zero (failed)
0000000000401af4 jne    loc_401e28              ; jump (see below)
;
; (a large block omitted to show where we'd land with this jump)
;
                 loc_401e28:
0000000000401e28 lea    rdi, qword [aStrtolErr] ; "strtol err: "
0000000000401e2f call   sub_416ab0+352          ; perror

A long list of numeric comparisons come after the call to strtol. The program seems to be structured to make the validation intentionally complex, as the control flow graph demonstrates:

Yes, there’s a lot more. At this point I switched to Ghidra, since it was the tool that seemed to produce the cleanest output for this massive function. After cleaning it up as much as I could, I ended up with the following validation code in Python:

def validate(n):
    x = 2
    y = 1
    calc_and_7 = ((n + 1) - 2) & 7
    if calc_and_7 == 0: return False
    if calc_and_7 != 1:
      if calc_and_7 != 2:
        if calc_and_7 != 3:
          if calc_and_7 != 4:
            if calc_and_7 != 5:
              if calc_and_7 != 6:
                  x = 3
                  if n % 2 == 0: y = 3
              if n % x == 0: y = y + x
              x = x + 1
            if n % x == 0: y = y + x
            x = x + 1
          if n % x == 0: y = y + x
          x = x + 1
        if n % x == 0: y = y + x
        x = x + 1
      if n % x == 0: y = y + x
      x = x + 1
    if n % x == 0: y = y + x
    x = x + 1
    while n + 1 != x:
      z = y + x
      if n % x != 0: z = y
      if n % (x + 1) == 0: z = x + 1 + z
      y = x + 2 + z
      if n % (x + 2) != 0: y = z
      if n % (x + 3) == 0: y = x + 3 + y
      z = y + x + 4
      if n % (x + 4) != 0: z = y
      if n % (x + 5) == 0: z = x + 5 + z
      y = x + 6 + z
      if n % (x + 6) != 0: y = z
      if n % (x + 7) == 0: y = x + 7 + y
      x = x + 8

    return y == 0x32a

for i in range(100, 1000): # we know it's 3 digits
  if validate(i):
    print('FOUND:', i)

Running this code produces two solutions, 424 and 538. But not so fast! sub_401a50 doesn’t actually end there, we see it swapping input[6] and input[8] before calling strtol again:

0000000000401ce8 movzx  r8d, byte [rbp-0x52]    ; r8 = input[8]
0000000000401ced movzx  r11d, byte [rbp-0x54]   ; r11 = input[6]
0000000000401cf2 lea    rdi, qword [rbp-0x54]
0000000000401cf6 xor    esi, esi
0000000000401cf8 mov    edx, 0xa
0000000000401cfd mov    byte [rbp-0x52], r11b   ; input[8] = r11
0000000000401d01 mov    byte [rbp-0x54], r8b    ; input[6] = r8
0000000000401d05 call   strtol

Finally, it compares the result of the first strtol with the second and jumps over the progress message if they don’t match. This means that the last 3 characters have to be a palindrome, so only 424 would be valid.

Validating the hash ¶

The sequence of validation functions that we analyzed offers a few branching points for the program to exit early. But if we make it through all of them, a hash is printed and compared to a long hex string:

0000000000400705 lea        rsi, qword [aHereIsYourHash]    ; "Here is your hash: %s\\n"
000000000040070c mov        rdx, r12
000000000040070f mov        edi, 0x1
0000000000400714 xor        eax, eax
0000000000400716 call       sub_453bd0+41                   ; maybe prints our hash?
000000000040071b lea        rdi, qword [aD867f9f78c62e0]    ; constant hash
0000000000400722 mov        edx, 0x40                       ; 64 bytes, the length of a SHA-256 hash in hex
0000000000400727 mov        rsi, r12
000000000040072a call       sub_400478                      ; comparison?
000000000040072f test       eax, eax                        ; the return value is checked
0000000000400731 jne        loc_400780                      ; with a jump

The string referenced at 0x40071b looks like a SHA-256 hash written in hex and lowercased: "d867f9f78c62e060f741670ae2cdb4873f1645a33f13347f0bb240bc4d36f9a2". With tight constraints on all the characters, let’s see if we can generate a password that also matches this constant:

import hashlib

def find_password():
  # c0 is '2'
  for c1 in map(chr, [0x70, 0x71, 0x72, 0x73]):
    for c2 in map(chr, [0x6a, 0x6b, 0x6c, 0x6d, 0x6e]):
      for c3 in map(chr, range(ord('0'), 1 + ord('9'))):
        # c4 and c5 are both '-'
        # c6, c7, c8 are '424'
        password = '2' + c1 + c2 + c3 + '--' + '424'
        hashed = hashlib.sha256(password.encode('utf-8')).hexdigest()
        if hashed == 'd867f9f78c62e060f741670ae2cdb4873f1645a33f13347f0bb240bc4d36f9a2':
            print('password found: ' + password)

And at last, we have the full password:

$ ./validate.py
password found: (redacted)

Impressions ¶

I enjoyed this crackme, it had some challenging parts without being impossible to solve. It felt good to recognize the use of isdigit that I had recently learned about, and to apply these lessons to solve this second one more quickly than I expected.

Once again Ghidra did not disappoint, and its superior decompilation abilities were very helpful. I recently bought The Ghidra Book, and am looking forward to becoming more proficient with this tool.