scufFed

b01lers CTF 2026

8804 words | 42 minutes

Yeah, so it’s been a while since I’ve played a CTF, huh? I’ve mostly be caught up organising Singapore’s National Cybersecurity Olympiad and getting really depressed looking at Claire Code one-shotting revs on the Twitter feed. Despite everything, we cringe on.

During this CTF, I was a little too busy getting my shit rocked at the Weatherday gig to compete during the full duration, so I decided to focus my efforts on one of the two low-solve rev challs, and it turned out to be really fun!

In this writeup, I invite you to follow me in the solve process. This isn’t going to be a very clean and structured writeup as I really want to capture the messy, back and forth nature of tackling a challenge like this.

This challenge was solved with 0% Artificial Intelligence and 100% Human Stupidity.

Piano (38 solves / 437 points)

by: oh_word

Description: A sky full of stars

First Blush

Stop saying Ghidra sucks because it doesn’t instantly resolve syscalls. Ghidra has a pre-bundled script called ResolveX86orX64LinuxSyscallsScript.java that does that for you. I know IDA does it automatically, but this isn’t a massive hurdle, guys. Anyways, we run that script to clean up the manual syscalls and get a better idea of what we’re looking at.

The first block is just an argc and strlen check (strlen is reimplemented in FUN_00834300). We know this because we have eyes:

  if (param_1 == 2) {
    puVar10 = (undefined1 *)param_2[1];
    lVar7 = FUN_00834300(puVar10);
    if (lVar7 != 0x24) {
LAB_00401156:
      write(1,"Wrong\n",6);
      puVar9 = puVar9 + 1;
      goto LAB_0040111e;
    }

The actually interesting bit comes immediately after.

    local_148 = 0x11;
    unaff_RBX = syscall_000001B3(local_168,0x58);
    puVar9 = local_2b8 + 1;
    __options = 0;
    if (unaff_RBX != 0) goto LAB_00402888;
    lVar7 = ptrace(PTRACE_TRACEME,0,0);
    puVar9 = local_2b8 + 2;
    if (lVar7 != 0) goto LAB_00401156;

A quick consult of the Linux syscall table tells us syscall_000001B3 is a call to clone3, which the kernel describes as “the extensible successor to clone()/clone2()”. Okay, fancypants. This is a fork(). So we have a parent-child tango going on here, sure. The passthrough right after this is the child’s logic (with the ptrace(PTRACE_TRACEME,0,0))), and the parent’s logic lies in subroutine 0x402888. And we see… more ptrace nonsense…

LAB_00402888:
  _Var2 = (__pid_t)unaff_RBX;
  wait4(_Var2,&local_1a4,__options,(rusage *)0x0);
  ptrace(PTRACE_CONT,unaff_RBX);
  puVar9 = puVar9 + 2;
  while( true ) {
    while( true ) {
      do {
        puVar13 = (undefined1 *)puVar9;
        wait4(_Var2,&local_1a4,0,(rusage *)0x0);
        bVar12 = (byte)(local_1a4 >> 8);
        if ((local_1a4 & 0x7f) == 0) {
                    /* WARNING: Subroutine does not return */
          exit((uint)bVar12);
        }
        puVar9 = (undefined8 *)(puVar13 + 8);
      } while ((short)((local_1a4 & 0xffff) * 0x10001 >> 8) < 0x7f01);
      if (bVar12 != 7) break;
      local_1b0 = &local_108;
      ptrace(PTRACE_GETREGS,unaff_RBX,0,local_1b0,0x3d,0x65);
      ptrace(PTRACE_PEEKTEXT,unaff_RBX,local_88);
...

Nanomites, Son!

So we’ve got nanomites. Here’s the source for my claim. For the uninitiated, this is a nifty anti-debugging technique where you have a strict parent-child relationship: the parent traces the child, and when the child triggers some signal (e.g. hitting a SIGTRAP with an int 3), it hands off execution back to the parent, applies some logic, then returns execution to the child.

How this behaves as effective anti-debugging (besides the fact that dealing with sighandlers are annoying) is that processes can only have one tracer at a time, which means:

For now, we’ll stick with debugging the parent using gdb and statically analysing the child. We’ve got more logic to look through first, anyways. We’ll tackle this in a better way later.

Signs & Signals

So what signals are we actually catching on the parent’s side?

The relevant code block is here:

  do {
    puVar13 = (undefined1 *)puVar9;
    wait4(_Var2,&local_1a4,0,(rusage *)0x0);
    bVar12 = (byte)(local_1a4 >> 8);
    if ((local_1a4 & 0x7f) == 0) {
                /* WARNING: Subroutine does not return */
      exit((uint)bVar12);
    }
    puVar9 = (undefined8 *)(puVar13 + 8);
  } while ((short)((local_1a4 & 0xffff) * 0x10001 >> 8) < 0x7f01);
  if (bVar12 != 7) break;
...
      if (bVar12 == 0xb) break;
    ptrace(PTRACE_CONT,unaff_RBX,0,(long)(int)(uint)bVar12,0x3d,0x65);
    puVar9 = (undefined8 *)(puVar13 + 0x10);
  }

Looking at the parameters of wait4, we see that local_1a4 is wstatus, which has its status bits defined in waitstatus.h. We can see that our little computation bVar12 = (byte)(local_1a4 >> 8) should extract us the signal number (if local_1a4 & 0x7f is 0, that means the child has exited), and we check whether it’s equal to 0x7 or 0xb, which based on the signal table tells us we’re catching either a SIGBUS or a SIGSEGV.

SIGBUS is one of the less common signals to hit, usually a result of trying to access misaligned addresses or trying to access a valid virtual address that has no physical backing memory (this is an interesting point that we will inspect further later!).

After we catch and identify that we’ve struck a SIGBUS, we continue execution and call these two ptrace functions:

ptrace(PTRACE_GETREGS,unaff_RBX,0,local_1b0,0x3d,0x65);
ptrace(PTRACE_PEEKTEXT,unaff_RBX,local_88);

This is very traditional nanomites behaviour. PTRACE_GETREGS will fetch us the current registers of the child in a gigantic struct:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x40292e                  mov    r10, rax
     0x402931                  mov    rax, r9
●    0x402934                  syscall 
 →   0x402936                  mov    edi, 0x1
     0x40293b                  mov    rdx, QWORD PTR [rbp-0x80]
     0x40293f                  lea    r10, [rbp-0x198]
     0x402946                  mov    rax, r9
     0x402949                  syscall
     0x40294b                  lea    rsi, [rbp-0x190]
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x402936 in ?? (), reason: SINGLE STEP
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x402936 → mov edi, 0x1
[#1] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/20gx 0x00007fffffffdd90
0x7fffffffdd90:	0x00007ffff7ff5000	0xffffffffffffffff
0x7fffffffdda0:	0x00007ffff7ff40c0	0x00007fffffffe20b
0x7fffffffddb0:	0x00007fffffffde90	0x00007ffff7ff40b0
0x7fffffffddc0:	0x00000000000001ec	0x00007ffff7ff440c
0x7fffffffddd0:	0x0000000000000000	0x00007ffff7ff4280
0x7fffffffdde0:	0x0000000000000061	0x0000000000000080
0x7fffffffddf0:	0x0000000000000004	0x0000000000000062
0x7fffffffde00:	0x00007ffff7ff4040	0xffffffffffffffff
0x7fffffffde10:	0x0000000000403718	0x0000000000000033
0x7fffffffde20:	0x0000000000010246	0x00007fffffffdbd8
gef➤  

and PTRACE_PEEKTEXT gets us the 8 bytes at the address pointed to by local_88, which in this case is [rbp-0x80] == 0x7fffffffde10, which is the value of rip from the child:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x40293f                  lea    r10, [rbp-0x198]
     0x402946                  mov    rax, r9
●    0x402949                  syscall 
 →   0x40294b                  lea    rsi, [rbp-0x190]
     0x402952                  mov    rdi, r10
     0x402955                  mov    QWORD PTR [rbp-0x1b8], r10
     0x40295c                  call   0x402c60
     0x402961                  mov    rdi, QWORD PTR [rbp-0x1b8]
     0x402968                  mov    DWORD PTR [rbp-0x1b0], eax
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x40294b in ?? (), reason: SINGLE STEP
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x40294b → lea rsi, [rbp-0x190]
[#1] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/gx $r10
0x7fffffffdcf8:	0x45663667b60f4566
gef➤  

We can see that this corresponds with the bytes at 0x403718 in the binary:

...
    LAB_00403718
    00403718 66 45 0f        MOVZX      R12W,byte ptr [R15 + 0x36]
              b6 67 36
...

This also tells us that our child first triggers a SIGBUS on this instruction. Interesting! Anyways, we cringe on with our parent analysis. After fetching the current instruction from the child, we pass that into FUN_00402c60:

...
    local_1c0 = local_1a0;
    *(undefined8 *)(puVar13 + 0x10) = 0x402961;
    uVar3 = FUN_00402c60(local_1a0,local_198);
    puVar10 = local_1c0;
    local_1b8 = CONCAT44(local_1b8._4_4_,uVar3);
    *(undefined8 *)(puVar13 + 0x10) = 0x402973;
...

This is a really fucking big function (about 480 LOC in the decompilation) and it’s not immediately clear what it’s doing. Now that we’re doing this writeup, I can confidently say that it’s an instruction parser, with the numerous if/switch statements in place to determine opcode parameters. However, note the following:

...
        00402955 4c 89 95        MOV        qword ptr [RBP + local_1c0],R10
                 48 fe ff ff
        0040295c e8 ff 02        CALL       FUN_00402c60                                     
                 00 00
        00402961 48 8b bd        MOV        RDI,qword ptr [RBP + local_1c0]
                 48 fe ff ff
        00402968 89 85 50        MOV        dword ptr [RBP + local_1b8],EAX
                 fe ff ff
        0040296e e8 8d 02        CALL       FUN_00402c00                                     
                 00 00
...

In the actual program, we only concern ourselves with the final return value of FUN_00402c60, which we can sort of eyeball the computation of at the very end of the function:

...
  bVar4 = (char)pbVar17 - (char)param_1;
  (*param_2)[0] = bVar4;
joined_r0x0040300b:
  if (0xf < bVar4) {
    *(uint *)(param_2[2] + 1) = *(uint *)(param_2[2] + 1) | 0x5000;
    (*param_2)[0] = 0xf;
    return 0xf;
  }
  return bVar4;
}

pbVar17 is the working variable for most of the function, and through a bit of guesswork, we can determine that the return value of this function is the length of the instruction parsed. At least for the instruction at 0x403718, we do get the value 0x6 out of this function, so… let’s just wing it?

We then use this length value and the actual instruction bytes (again from [rbp-0x80]) and pass it into FUN_00402c00, which is a lot easier to trace.

A Cool Antidecompilation Trick

Let’s look at the decompilation of FUN_00402c00.

      undefined FUN_00402c00()
      FUN_00402c00                                    
                                                                                  
      PUSH       RBX
      PUSH       RAX
      XOR        RBX,RBX
      XOR        RBX,0x1
      CALL       LAB_00402c0e

      LAB_00402c0e

      XOR        RBX,0x1
      CMP        RBX,0x1
      JZ         LAB_00402c3a
      MOVZX      EAX,byte ptr [RDI + 0x1]
      SHR        AL,0x2
      AND        AL,0x1
      TEST       AL,AL
      PUSH       RBX
      LEA        RAX,[FUN_004037d0]
      LEA        RBX,[FUN_00403800]
      CMOVZ      RAX=>FUN_00403800,RBX
      POP        RBX
      RET

      LAB_00402c3a

      POP        RSI
      MOVZX      EAX,byte ptr [RDI]
      XOR        EDX,EDX
      DIV        ESI
      MOVZX      EAX,byte ptr [RDI + RDX*0x1]
      TEST       AL,0x4
      LEA        RAX,[FUN_00403770]
      LEA        RBX,[FUN_004037a0]
      CMOVZ      RAX=>FUN_004037a0,RBX
      POP        RBX
      RET

I’m deliberately avoiding using the decompilation because the finer logic gets lost in it. Knowing that rdi holds our instruction bytes and rsi holds the instruction length, we can see some interesting logic here.

It seems like rbx is here to break the decompilation. The use of call here is very deliberate: the subroutine LAB_00402c0e ends with a ret, which the decompiler believes is the end of the function, and the cmp would never clear, so the jump to LAB_00402c3a doesn’t strike.

To help follow along, we can annotate the value of rbx:

        undefined FUN_00402c00()
        FUN_00402c00                                    
                                                                                   
        PUSH       RBX
        PUSH       RAX
        XOR        RBX,RBX ; RBX = 0
        XOR        RBX,0x1 ; RBX = 1
        CALL       LAB_00402c0e

        LAB_00402c0e

        XOR        RBX,0x1 ; RBX = 0
        CMP        RBX,0x1 ; RBX = 0, CMP FAILS
        JZ         LAB_00402c3a

This makes the decompilation end prematurely:

/* WARNING: Removing unreachable block (ram,0x00402c3a) */
/* WARNING: Removing unreachable block (ram,0x00402c58) */
/* WARNING: Removing unreachable block (ram,0x00402c5c) */

code * FUN_00402c00(long param_1)

{
  code *pcVar1;
  
  pcVar1 = FUN_004037d0;
  if ((*(byte *)(param_1 + 1) >> 2 & 1) == 0) {
    pcVar1 = FUN_00403800;
  }
  return pcVar1;
}

Note the Ghidra comments on removing 0x402c3a, which it treats as unreachable!

However, since we call the subroutine, once it completes, it will return to the xor rbx, 0x1, clearing it out and making the jump pass!

LAB_00402c0e

        XOR        RBX,0x1 ; now RBX = 1 on the rerun
        CMP        RBX,0x1 ; RBX = 1, CMP PASSES!
        JZ         LAB_00402c3a

This is really, really nifty!

The first block is actually there to just fuck with us; both FUN_004037d0 and FUN_00403800 will never be called, and they deceptively look like win/loss conditions:

void FUN_004037d0(int param_1)
{
  if (param_1 != 0xa539f) {
    return;
  }
  write(1,"Correct!\n",9);
                    /* WARNING: Subroutine does not return */
  exit(0);
}


void FUN_00403800(void)
{
  return;
}

How sneaky. Well now we know subroutine LAB_00402c3a is where the actual logic lies, where tracing the assembly looks like such:

The two addresses that can be loaded into rax are both short functions:

void FUN_00403770(undefined8 *param_1)
{
  code *pcVar1;
  undefined8 *puVar2;
  
  puVar2 = (undefined8 *)thunk_FUN_00833cd0(8);
  *puVar2 = *param_1;
  *param_1 = puVar2;
  syscall_0000000F();
                    /* WARNING: Does not return */
  pcVar1 = (code *)invalidInstructionException();
  (*pcVar1)();
}

void FUN_004037a0(undefined8 *param_1)
{
  undefined8 *puVar1;
  code *pcVar2;
  
  puVar1 = (undefined8 *)*param_1;
  *param_1 = *puVar1;
  thunk_FUN_00832d30(puVar1);
  syscall_0000000F();
                    /* WARNING: Does not return */
  pcVar2 = (code *)invalidInstructionException();
  (*pcVar2)();
}

Hang on, is that syscall(0xf)? Isn’t that rt_sigreturn of SROP fame? Huh? Let’s put a pin on that… let’s wrap up the parent logic first.

After calling FUN_00402c00, we then seem to set up our register struct to restore the child’s state, indicative by the PTRACE_SETREGS call:

    puVar1 = DAT_00837248;
    puVar9 = DAT_00837248 + 1;
    *DAT_00837248 = 0;
    puVar1[0x3f] = 0;
    uVar11 = (ulong)(((int)puVar1 - (int)(undefined8 *)((ulong)puVar9 & 0xfffffffffffffff8)) +
                      0x200U >> 3);
    puVar9 = (undefined8 *)((ulong)puVar9 & 0xfffffffffffffff8);
    for (; uVar11 != 0; uVar11 = uVar11 - 1) {
      *puVar9 = 0;
      puVar9 = puVar9 + (ulong)bVar18 * -2 + 1;
    }
    puVar1[8] = uStack_d0;
    puVar1[9] = local_d8;
    puVar1[6] = uStack_c0;
    puVar1[7] = local_c8;
    puVar1[10] = puStack_f0;
    puVar1[0xb] = local_f8;
    puVar1[0xe] = local_98;
    puVar1[0xf] = uStack_a0;
    puVar1[0x10] = local_e8;
    puVar1[0x11] = uStack_e0;
    puVar1[0xc] = uStack_100;
    puVar1[0xd] = local_108;
    puVar1[0x12] = local_a8;
    puVar1[0x13] = local_b8;
    puVar1[0x14] = uStack_b0;
    puVar1[0x15] = uStack_70;
    puVar1[0x16] = (local_1b8 & 0xffffffff) + local_88;
    puVar1[0x17] = local_78;
    *(short *)(puVar1 + 0x18) = (short)local_80;
    *(short *)((long)puVar1 + 0xc2) = (short)local_38;
    *(short *)((long)puVar1 + 0xc4) = (short)local_40;
    puStack_f0 = puVar1 + 1;
    local_88 = lVar7;
    ptrace(PTRACE_SETREGS,unaff_RBX,0);
    ptrace(PTRACE_CONT);
    puVar9 = (undefined8 *)(puVar13 + 0x28);

Notice the assignment puVar1[0x16] = (local_1b8 & 0xffffffff) + local_88, which through debugging, shows us that we are updating the child’s rip to advance past the SIGBUS instruction by adding the parsed length:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x402a2b                  movups XMMWORD PTR [rdx+0x78], xmm6
     0x402a2f                  shufpd xmm1, xmm1, 0x1
     0x402a34                  movups XMMWORD PTR [rdx+0x58], xmm1
●→   0x402a38                  add    rax, QWORD PTR [rbp-0x80]
     0x402a3c                  movups XMMWORD PTR [rdx+0x88], xmm0
     0x402a43                  movq   xmm0, rax
     0x402a48                  mov    rax, QWORD PTR [rbp-0x78]
     0x402a4c                  movhps xmm0, QWORD PTR [rbp-0x70]
     0x402a50                  movups XMMWORD PTR [rdx+0x98], xmm4
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x402a38 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x402a38 → add rax, QWORD PTR [rbp-0x80]
[#1] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/wx $rbp-0x80
0x7fffffffde10:	0x00403718
gef➤  p $rax
$5 = 0x6
gef➤  

I’m really glad that Ghidra’s support for SSE instructions is really good because reading the diassembly for this SUCKS but it’s turned into nice array assignments in the decompilation.

Now, right at the handoff back to the child, we know should expect to see the advanced rip when it restores its state:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x402a85                  mov    QWORD PTR [rbp-0xe8], rdx
     0x402a8c                  mov    r10, QWORD PTR [rbp-0x1a8]
     0x402a93                  mov    rdx, r12
●→   0x402a96                  syscall 
     0x402a98                  xor    r10d, r10d
     0x402a9b                  mov    edi, 0x7
     0x402aa0                  mov    rax, r9
     0x402aa3                  syscall
     0x402aa5                  mov    r8d, 0x3d
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x402a96 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x402a96 → syscall 
[#1] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/20gx $r10
0x7fffffffdd90:	0x00007ffff7ff5000	0xffffffffffffffff
0x7fffffffdda0:	0x00007ffff7ff40c0	0x00007ffff7ff6008
0x7fffffffddb0:	0x00007fffffffde90	0x00007ffff7ff40b0
0x7fffffffddc0:	0x00000000000001ec	0x00007ffff7ff440c
0x7fffffffddd0:	0x0000000000000000	0x00007ffff7ff4280
0x7fffffffdde0:	0x0000000000000061	0x0000000000000080
0x7fffffffddf0:	0x0000000000000004	0x0000000000000062
0x7fffffffde00:	0x00007ffff7ff4040	0xffffffffffffffff
0x7fffffffde10:	0x0000000000403770	0x0000000000000033
0x7fffffffde20:	0x0000000000010246	0x00007fffffffdbd8
gef➤  

Wait, what the fuck? Remember that 0x7fffffffde10 holds our rip, so we’re updating the child’s rip to one of those rt_sigreturn functions. Huh?

To Cure a Weakling Child

Okay, things are getting silly. We’ve more or less traced out the parent logic, we’ve got a few things to go back to later… let’s look at the child for a bit.

After the PTRACE_TRACEME call (trace me! trace me!) this is followed up by a bunch of calls to FUN_004036d0

  lVar7 = ptrace(PTRACE_TRACEME,0,0);
  puVar9 = local_2b8 + 2;
  if (lVar7 != 0) goto LAB_00401156;
  _Var2 = getpid();
  kill(_Var2,0x13);
  local_2b8[3] = 0x4011e0;
  local_268 = FUN_004036d0();
  local_2b8[3] = 0x4011ec;
  local_280 = FUN_004036d0();
  local_2b8[3] = 0x4011f8;
  local_1b8 = FUN_004036d0();

Who Needs FLIRT When You Have Your Fucking Eyes

FUN_004036d0 is actually just a handoff call to FUN_00833cd0 with the argument 0x8, and once again we are blasted with a hard to interpret decompilation. This almost feels like something that should be FLIRT’d…

Something to observe though is this funny numeric constant in the function:

...
    uVar2 = uVar2 | uVar2 >> 8;
    iVar9 = (char)(&DAT_00835040)[(~uVar2 & uVar2 + 1) * 0x76be629 >> 0x1b] * 4;
    uVar2 = iVar9 - 4;
    if (*(ushort *)(&DAT_008350a0 + (long)(iVar9 + -3) * 2) < uVar8) {
      uVar2 = iVar9 - 2;
    }
    uVar10 = (ulong)uVar2;
...

Googling 0x76be629 tells us that’s a De Bruijn sequence magic constant?? And there’s some Japanese article talking about it’s use in Microsoft’s mimalloc implementation (do give it a read, this is some cool fucking tech).

Could it be that we’re looking at some sort of malloc function?

We can switch over to debugging the child for a bit and just spoof the ptrace check so we can see how this function plays:

────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  set follow-fork-mode child
gef➤  b *0x4011e0
Breakpoint 7 at 0x4011e0
gef➤  
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x4011d4                  mov    eax, 0x3e
     0x4011d9                  syscall 
●    0x4011db                  call   0x4036d0
●→   0x4011e0                  mov    QWORD PTR [rbp-0x260], rax
     0x4011e7                  call   0x4036d0
     0x4011ec                  mov    QWORD PTR [rbp-0x278], rax
     0x4011f3                  call   0x4036d0
     0x4011f8                  mov    QWORD PTR [rbp-0x1b0], rax
     0x4011ff                  call   0x4036d0
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x4011e0 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x4011e0 → mov QWORD PTR [rbp-0x260], rax
[#1] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/20gx $rax
0x7ffff7ff4040:	0xffffffffffffffff	0x0000000000000000
0x7ffff7ff4050:	0x0000000000000000	0x0000000000000000
0x7ffff7ff4060:	0x0000000000000000	0x0000000000000000
0x7ffff7ff4070:	0x0000000000000000	0x0000000000000000
0x7ffff7ff4080:	0x0000000000000000	0x0000000000000000
0x7ffff7ff4090:	0x0000000000000000	0x0000000000000000
0x7ffff7ff40a0:	0x0000000000000000	0x0000000000000000
0x7ffff7ff40b0:	0x0000000000000000	0x0000000000000000
0x7ffff7ff40c0:	0x0000000000000000	0x0000000000000000
0x7ffff7ff40d0:	0x0000000000000000	0x0000000000000000
gef➤  

We can see that after the call, rax points to some 8 byte region of memory that has been set to -1. The most compelling piece of evidence that this is possibly some sort of malloc call is if you sift through some of the nested function calls, landing you on FUN_00834190, where there’s a clear mmap call being made. This isn’t super rigorous, but we can hold on to this assumption for the timebeing. (Note: this is probably musl’s mallocng implementation. I know this because I have brainworms.)

So our child starts with a bunch of malloc calls, supposedly. This is followed up by a bunch of calls to FUN_004036f0 with those mallocd addresses and the characters of our input as the arguments:

...
  local_2b8[5] = FUN_004036d0();
  local_2b8[3] = 0x40137c;
  uVar6 = FUN_004036d0();
  local_2b8[3] = 0x401390;
  FUN_004036f0(local_268,*puVar10);
  local_2b8[3] = 0x4013a2;
  FUN_004036f0(local_280,puVar10[1]);
  local_2b8[3] = 0x4013b4;
  FUN_004036f0(local_1b8,puVar10[2]);
  local_2b8[3] = 0x4013c6;
  FUN_004036f0(local_1c0,puVar10[3]);
...

This is the function where we encountered our SIGBUS from earlier!

I Don’t Have a Good Bus Joke

Looking at it’s disassembly, we see something interesting:

    undefined FUN_004036f0()
    FUN_004036f0

    ENDBR64
    LEA        EAX,[RSI + -0x1]
    TEST       SIL,SIL
    JZ         LAB_0040372b
    TEST       SIL,0x1
    JZ         LAB_00403718
    MOVZX      R12W,byte ptr [R15 + 0x36]
    LEA        EAX,[RSI + -0x2]
    CMP        SIL,0x1
    JZ         LAB_0040372c
    NOP        dword ptr [RAX]
                
    LAB_00403718

    MOVZX      R12W,byte ptr [R15 + 0x36]
    MOVZX      R12W,byte ptr [R15 + 0x36]
    SUB        EAX,0x2
    CMP        AL,0xff
    JNZ        LAB_00403718

    LAB_0040372b

    RET

    LAB_0040372c

    RET

Knowing that rsi holds our input character, we see that it essentially calls movzx r12w, byte ptr [r15 + 0x36] the same number of times as the value of the current character, which means we trigger the SIGBUS that same number of times.

This also makes clear why debugging the parent alone will quickly become untenable: for a 30 character input, we will be triggering SIGBUS hundreds of times just in these function calls alone!

Actually, why does this instruction even trigger a SIGBUS?

Since we’re debugging the child, we don’t let the parent trace it and we’ll just hit the SIGBUS ourselves and don’t get to see the after effects of the PTRACE_SETREGS bullshit from earlier.

...
$r12   : 0x00007fffffffe20b  →  "bctf{abcdefghijklmnopqrstuvwxyzABCD}"
$r13   : 0x00007ffff7ff40c0  →  0xffffffffffffffff
$r14   : 0xffffffffffffffff
[=] Cannot access memory at address 0x7ffff7ff5000
$r15   : 0x00007ffff7ff5000  →  0x00007ffff7ff5000
$eflags: [ZERO carry PARITY adjust sign trap INTERRUPT direction overflow RESUME virtualx86 identification]
$cs: 0x33 $ss: 0x2b $ds: 0x00 $es: 0x00 $fs: 0x00 $gs: 0x00
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
     0x40370b                  cmp    sil, 0x1
     0x40370f                  je     0x40372c
     0x403711                  nop    DWORD PTR [rax+0x0]
 →   0x403718                  movzx  r12w, BYTE PTR [r15+0x36]
     0x40371e                  movzx  r12w, BYTE PTR [r15+0x36]
     0x403724                  sub    eax, 0x2
     0x403727                  cmp    al, 0xff
     0x403729                  jne    0x403718
     0x40372b                  ret
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "chal", stopped 0x403718 in ?? (), reason: SIGBUS
───────────────────────────────────────────────────────────────────────────────────────── trace ────
[#0] 0x403718 → movzx r12w, BYTE PTR [r15+0x36]
[#1] 0x401390 → movzx esi, BYTE PTR [r12+0x1]
[#2] 0x83272f → mov edi, eax
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  vmmap
[ Legend:  Code | Stack | Heap ]
Start              End                Offset             Perm Path
0x0000000000400000 0x0000000000401000 0x0000000000000000 r-- /home/jun/shared/Projects/ctfs/boiler/2026/rev/piano/chal
0x0000000000401000 0x0000000000835000 0x0000000000001000 r-x /home/jun/shared/Projects/ctfs/boiler/2026/rev/piano/chal
0x0000000000835000 0x0000000000836000 0x0000000000435000 r-- /home/jun/shared/Projects/ctfs/boiler/2026/rev/piano/chal
0x0000000000836000 0x0000000000838000 0x0000000000435000 rw- /home/jun/shared/Projects/ctfs/boiler/2026/rev/piano/chal
0x0000000000838000 0x0000000000839000 0x0000000000000000 --- [heap]
0x0000000000839000 0x000000000083a000 0x0000000000000000 rw- [heap]
0x00007ffff7ff4000 0x00007ffff7ff5000 0x0000000000000000 rw-
0x00007ffff7ff5000 0x00007ffff7ff6000 0x0000000000000000 rw- /memfd: (deleted)
0x00007ffff7ff6000 0x00007ffff7ff7000 0x0000000000000000 rw- /dev/zero (deleted)
0x00007ffff7ff7000 0x00007ffff7ffb000 0x0000000000000000 r-- [vvar]
0x00007ffff7ffb000 0x00007ffff7ffd000 0x0000000000000000 r-- [vvar_vclock]
0x00007ffff7ffd000 0x00007ffff7fff000 0x0000000000000000 r-x [vdso]
0x00007ffffffde000 0x00007ffffffff000 0x0000000000000000 rw- [stack]
0xffffffffff600000 0xffffffffff601000 0x0000000000000000 --x [vsyscall]

gef➤  

Notice that r15 contains the address 0x7ffff7ff5000 and even gdb complains that it cannot access the memory there. A look at vmmap shows us that it’s… a deleted memfd region? Woagh! When did that happen?

All the way back in the entry function, we sort of glossed over it a little bit:

void processEntry entry(void)

{
  FUN_00402b10();
  FUN_00832740(FUN_004010b0,*(undefined4 *)register0x00000020,
               (undefined4 *)((long)register0x00000020 + 8),FUN_00401000,FUN_0083484b,0);
  return;
}

FUN_00832740 just looks like a stripped __libc_start_main call, and this whole time we’ve been analyzing FUN_004010b0, so it seemed to check out (Actually, this is the musl implementation of libc_start_main_stage2!). Ghidra is being a little silly with the decompilation, but what’s important is this function call at the end of it:

void FUN_00832740(code *param_1,undefined8 *param_2,undefined8 *param_3)
{
  undefined8 uVar1;
  undefined8 uVar2;
  undefined8 uVar3;
  uint uVar4;
  undefined8 *extraout_RDX;
  undefined8 *puVar5;
  long unaff_RBX;
...
    FUN_00401010();
    param_3 = extraout_RDX;
    register0x00000020 = (BADSPACEBASE *)((long)register0x00000020 + -0x28);
    unaff_RBP = puVar5;
    unaff_R12 = pcVar7;
  } while( true );
}

That call, FUN_00401010, is where our memfd nonsense occurs:

void FUN_00401010(undefined4 param_1)
{
  int __fd;
  
  FUN_008327f0();
  FUN_00832800();
  FUN_008327f0();
  FUN_008347e0(param_1);
  DAT_00837248 = mmap((void *)0x0,0x1000,3,0x21,-1,0);
  __fd = memfd_create("",0);
  mmap((void *)0x0,0x1000,3,2,__fd,0);
  close(__fd);
  return;
}

This is really cool. Our SIGBUS memory region triggers a SIGBUS rather than a SIGSEGV because after creating the memfd region, we do mmap to it but we don’t make a ftruncate call or anything to actually give the memfd region any physical memory. This means there’s no backing memory to the mmap virtual mapping!

The most important aspect of this is that the memfd address stored in r15 does not get touched, i.e. r15 retains it and (it seems like) every instruction in the child process that tries to R/W to the address stored in r15 will trigger a SIGBUS. This is huge.

Sifting

Actually, so far we’ve established that we’ve got the whole SIGBUS handoff and the weird rt_sigreturn business that the parent appears to establish. We still haven’t really encountered the flagchecking logic…? We continue to stall and waste time anyways by giving the other functions the child calls a quick glance before we setup some Proper Debugging that will let us meaningfully continue with our analysis.

There’s two more functions we see the child call which are repeated, those being FUN_00403730 and FUN_00403750. Both of these functions are almost identical, with the difference lying in the SIGBUS triggering instruction:

    undefined FUN_00403730()
    FUN_00403730          

    ENDBR64
    MOV        RAX,qword ptr [RSI]
    CMP        RAX,R14
    JZ         LAB_0040374e
    MOV        RDX,R14
    NOP

    LAB_00403740                                    

    MOVZX      R8D,word ptr [R15 + 0x25]
    MOV        RAX,qword ptr [RAX]
    CMP        RAX,RDX
    JNZ        LAB_00403740

    LAB_0040374e                                    

    RET

The above SIGBUS trigger has instruction bytes 45 0f b7 44 67 25. Recalling our parent handler logic from earlier in FUN_00402c00, this will load 0x403770 into rax and subsequently this process’ rip.

    undefined FUN_00403750()
    FUN_00403750                   

    ENDBR64
    MOV        RAX,qword ptr [RSI]
    CMP        RAX,R14
    JZ         LAB_0040376d
    MOV        RDX,R14
    NOP

    LAB_00403760

    XOR        BP,word ptr [R15 + 0x27]
    MOV        RAX,qword ptr [RAX]
    CMP        RAX,RDX
    JNZ        LAB_00403760

    LAB_0040376d

    RET

The above SIGBUS trigger has instruction bytes 66 43 33 6f 27. Once again, tracing the parent logic, this will drop us into the other short function, 0x4037a0, when we return execution.

At this point, we’re really starved for some debugging. It’d be nice to see what exactly happens to our child after this SIGBUS back and forth. So how can we overcome the restrictions that nanomites has placed on us?

Quick Emulator Ain’t Not Quick!

Okay so like instead of debugging the process what if we just debugged an entire system. This may seem like a sledgehammer approach to our problem but actually… yeah, no. This is a sledgehammer.

To continue with our debugging adventures, we grab ourselves an Arch Linux ISO, duh.

➜ wget https://geo.mirror.pkgbuild.com/iso/latest/archlinux-x86_64.iso
➜ qemu-system-x86_64 \
    -m 2048 \
    -cdrom archlinux-x86_64.iso \
    -drive file=fat:rw:./share,format=raw \
    -boot d \
    -s -S

Running QEMU with the -s flag will make it listen for a gdb connection on port 1234 and -S will stall QEMU until we’ve attached and tell gdb to continue execution.

Fig: I use Arch btw

Once Arch boots, we copy over the challenge files, and now we’re free to set breakpoints in gdb that will… just work!

Fig: Copying over the challenge

Back in gdb we set a breakpoint for the first 0x4036f0 call in the child process, and we can see that we can successfully step over it without a hitch!

───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0xffffffff92df958f → ret 
[#1] 0xffffffff92dfb689 → nop 
[#2] 0xffffffff92dfb918 → call 0xffffffff92dfa820
[#3] 0xffffffff91dd4818 → jmp 0xffffffff91dd47c2
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  b *0x040138b
Breakpoint 1 at 0x40138b
(remote) gef➤  
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x401381                  mov    rdi, QWORD PTR [rbp-0x260]
     0x401388                  mov    r8, rax
●    0x40138b                  call   0x4036f0
●→   0x401390                  movzx  esi, BYTE PTR [r12+0x1]
     0x401396                  mov    rdi, QWORD PTR [rbp-0x278]
     0x40139d                  call   0x4036f0
     0x4013a2                  movzx  esi, BYTE PTR [r12+0x2]
     0x4013a8                  mov    rdi, QWORD PTR [rbp-0x1b0]
     0x4013af                  call   0x4036f0
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x401390 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x401390 → movzx esi, BYTE PTR [r12+0x1]
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  

That’s wonderful. We can now test our assumptions concerning the funny rt_sigreturn business finally by placing a breakpoint on FUN_00403770 and seeing that indeed, FUN_00833cd0 is a malloc call, and what this function is doing is… creating a linked list?

This part of the decompilation elucidates that:

  puVar2 = (undefined8 *)thunk_FUN_00833cd0(8);
  *puVar2 = *param_1;
  *param_1 = puVar2;

Knowing now that puVar2 is the result of malloc(8), we can see that what it does is store the address of the previous malloc into the new malloc’d memory, then update *param_1 to point to this new malloc’d memory.

Also, the rt_sigreturn call does actually behave as expected, returning execution to the calculated incremented rip from the parent’s side

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x40378e                  mov    QWORD PTR [rbx], rax
     0x403791                  mov    rsp, r13
     0x403794                  mov    eax, 0xf
 →   0x403799                  syscall 
     0x40379b                  ud2
     0x40379d                  nop    DWORD PTR [rax]
     0x4037a0                  endbr64
     0x4037a4                  mov    rax, QWORD PTR [rdi]
     0x4037a7                  mov    rcx, QWORD PTR [rax]
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x403799 in ?? (), reason: SINGLE STEP
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x403799 → syscall 
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  x/30gx $rsp
0x7ffff7ff6008:	0x0000000000000000	0x0000000000000000
0x7ffff7ff6018:	0x0000000000000000	0x0000000000000000
0x7ffff7ff6028:	0x0000000000000000	0x00007ffff7ff4280
0x7ffff7ff6038:	0x0000000000000000	0x00007ffff7ff440c
0x7ffff7ff6048:	0x00000000000001ec	0x00007fffffffe58e
0x7ffff7ff6058:	0x00007ffff7ff40c0	0xffffffffffffffff
0x7ffff7ff6068:	0x00007ffff7ff5000	0x00007ffff7ff4040
0x7ffff7ff6078:	0x0000000000000062	0x00007fffffffe280
0x7ffff7ff6088:	0x00007ffff7ff40b0	0x0000000000000004
0x7ffff7ff6098:	0x0000000000000061	0x0000000000000080
0x7ffff7ff60a8:	0x00007fffffffdfc8	0x0000000000403724
0x7ffff7ff60b8:	0x0000000000010246	0x0000000000000033
0x7ffff7ff60c8:	0x0000000000000000	0x0000000000000000
0x7ffff7ff60d8:	0x0000000000000000	0x0000000000000000
0x7ffff7ff60e8:	0x0000000000000000	0x0000000000000000
(remote) gef➤  

In this example, our SIGBUS was triggered on the instruction 0x403718, and now we are restoring rip to be 0x403724, an increment of 0x6. So actually, this whole time what we thought was a register struct construction on DAT_00837248 for PTRACE_SETREGS on the parent’s side was actually an SROP frame!

Now that we know triggering FUN_00403770 on our SIGBUS will push a linked list node, it seems like FUN_004036f0 will create a linked list of length c for each character c of the input. Writing a simple gdb function to help us calculate the linked list length confirms this:

(remote) gef➤  define get_len
Type commands for definition of "get_len".
End with a line saying just "end".
>set $curr = (unsigned long long)$arg0
>set $count = 0
>while $curr != 0xffffffffffffffff
 >set $count = $count + 1
 >set $curr = *(unsigned long long *)$curr
 >end
>set $count = $count - 1
>printf "Linked list length: %d\n", $count
>end
(remote) gef➤  
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x40137c                  movzx  esi, BYTE PTR [r12]
     0x401381                  mov    rdi, QWORD PTR [rbp-0x260]
     0x401388                  mov    r8, rax
●→   0x40138b                  call   0x4036f0
   ↳    0x4036f0                  endbr64
        0x4036f4                  lea    eax, [rsi-0x1]
        0x4036f7                  test   sil, sil
        0x4036fa                  je     0x40372b
        0x4036fc                  test   sil, 0x1
        0x403700                  je     0x403718
─────────────────────────────────────────────────────────────────────────── arguments (guessed)
 ────
0x4036f0 (
   $rdi = 0x00007ffff7ff4040 → 0xffffffffffffffff,
   $rsi = 0x0000000000000062,
   $rdx = 0x0000000000000004,
   $rcx = 0x0000000000000080,
   $r8 = 0x00007ffff7ff4280 → 0xffffffffffffffff
)
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x40138b in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x40138b → call 0x4036f0
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  

Here we break on the first call to 0x4036f0 in the child process. Take note of rdi, which is the malloc’d region from earlier, and rsi, which is the character that we’re loading (rsi = 0x62 = 'b').

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x401381                  mov    rdi, QWORD PTR [rbp-0x260]
     0x401388                  mov    r8, rax
●    0x40138b                  call   0x4036f0
●→   0x401390                  movzx  esi, BYTE PTR [r12+0x1]
     0x401396                  mov    rdi, QWORD PTR [rbp-0x278]
     0x40139d                  call   0x4036f0
     0x4013a2                  movzx  esi, BYTE PTR [r12+0x2]
     0x4013a8                  mov    rdi, QWORD PTR [rbp-0x1b0]
     0x4013af                  call   0x4036f0
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x401390 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x401390 → movzx esi, BYTE PTR [r12+0x1]
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 98
(remote) gef➤  

Stepping over the call, we now see that rdi has a linked list of length 98 = 0x62!

Fill in the Blank

Now that we have better debugging in place and some functions elucidated, we can summarise some of the features so far and start filling in some blanks:

We should probably look into FUN_004037a0 real quick.

Free Bird

We know that in the child, we have those repeated calls to FUN_00403730 or FUN_00403750, with the latter being known to trigger FUN_004037a0 based on the SIGBUS trigger instruction bytes. So, let’s just set a breakpoint on FUN_004037a0 and observe what it does:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x403799                  syscall 
     0x40379b                  ud2    
     0x40379d                  nop    DWORD PTR [rax]
●→   0x4037a0                  endbr64 
     0x4037a4                  mov    rax, QWORD PTR [rdi]
     0x4037a7                  mov    rcx, QWORD PTR [rax]
     0x4037aa                  mov    QWORD PTR [rdi], rcx
     0x4037ad                  mov    r13, r12
     0x4037b0                  mov    rdi, rax
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x4037a0 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x4037a0 → endbr64 
[#1] 0x401623 → mov rdi, r11
[#2] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 118
(remote) gef➤  

Even looking at the function statically, we can make an educated guess as to what it does. We know that FUN_00403770 is a linked list push operation done with a malloc, and the pseudocode for FUN_004037a0 looks remarkably like a pop operation:

  puVar1 = (undefined8 *)*param_1;
  *param_1 = *puVar1;
  thunk_FUN_00832d30(puVar1);

We can see a dereference to go one node down the linked list, then an update to the pointer. In fact, skimming through FUN_00832d30, we do find a munmap call in FUN_008342c0:

void FUN_008342c0(void *param_1,size_t param_2)
{
  int iVar1;
  
  FUN_00834180();
  iVar1 = munmap(param_1,param_2);
  FUN_00834800(iVar1);
  return;
}

So FUN_00832d30 is probably the free to our FUN_00833cd0’s malloc! Allowing ourselves to step through the function, we do see that our linked list does get shortened by one node:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x4037a4                  mov    rax, QWORD PTR [rdi]
     0x4037a7                  mov    rcx, QWORD PTR [rax]
     0x4037aa                  mov    QWORD PTR [rdi], rcx
 →   0x4037ad                  mov    r13, r12
     0x4037b0                  mov    rdi, rax
     0x4037b3                  and    rsp, 0xfffffffffffffff0
     0x4037b7                  call   0x832840
     0x4037bc                  mov    rsp, r13
     0x4037bf                  mov    eax, 0xf
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x4037ad in ?? (), reason: SINGLE STEP
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x4037ad → mov r13, r12
[#1] 0x401623 → mov rdi, r11
[#2] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 117
(remote) gef➤  

Nice. Now that we know that FUN_004037a0 is a linked list pop operation and FUN_00403770 is a linked list push operation, we can fill out what FUN_00403730 and FUN_00403750 do.

Guess We’re Doing O(N) Arithmetic

Fig: picrel

Both of these functions take in two arguments, with them both being some linked list, i.e. some character values from the flag. Here we’ve broken at the first FUN_00403750 call to take a looky:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x40160d                  mov    rsi, rdx
     0x401610                  mov    QWORD PTR [rbp-0x220], r9
     0x401617                  mov    QWORD PTR [rbp-0x1e8], rdx
●→   0x40161e                  call   0x403750
   ↳    0x403750                  endbr64
        0x403754                  mov    rax, QWORD PTR [rsi]
        0x403757                  cmp    rax, r14
        0x40375a                  je     0x40376d
        0x40375c                  mov    rdx, r14
        0x40375f                  nop
─────────────────────────────────────────────────────────────────────────── arguments (guessed)
 ────
0x403750 (
   $rdi = 0x00007ffff7ff41e0 → 0x00007ffff7fe73d0 → 0x00007ffff7fe73c0 → 0x00007ffff7fe73b0 → 0
x00007ffff7fe73a0 → 0x00007ffff7fe7390 → 0x00007ffff7fe7380 → 0x00007ffff7fe7370,
   $rsi = 0x00007ffff7ff4110 → 0x00007ffff7fee2d0 → 0x00007ffff7fee2c0 → 0x00007ffff7fee2b0 → 0
x00007ffff7fee2a0 → 0x00007ffff7fee290 → 0x00007ffff7fee280 → 0x00007ffff7fee270
)
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x40161e in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x40161e → call 0x403750
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 118
(remote) gef➤  get_len $rsi
Linked list length: 105
(remote) gef➤  p $r14
$2 = 0xffffffffffffffff
(remote) gef➤  

We can see here that with our dummy input bctf{abcdefghijklmnopqrstuvwxyzABCD}, these arguments are flag[26] and flag[13]. Stepping through the assembly, we continually dereference a copy of rsi stored in rax until we hit cmp rax, r14, which is just a check against 0xffffffffffffffff (which is stored in r14):

    undefined FUN_00403750()
    FUN_00403750                   

    ENDBR64
    MOV        RAX,qword ptr [RSI] ; Load rsi into rax
    CMP        RAX,R14 ; Check if we have hit 0xffffffffffffffff
    JZ         LAB_0040376d ; Terminate if so
    MOV        RDX,R14 ; Load r14 into rdx
    NOP

    LAB_00403760

    XOR        BP,word ptr [R15 + 0x27] ; Trigger SIGBUS, call linked list pop function FUN_004037a0
    MOV        RAX,qword ptr [RAX] ; Dereference rax by one, traversing down the linked list
    CMP        RAX,RDX ; Check if we have hit 0xffffffffffffffff
    JNZ        LAB_00403760 ; Loop otherwise

    LAB_0040376d

    RET

From our understanding of the structs, this means we’ve essentially entered a for loop of the length of rsi, and in this for loop, we are calling FUN_004037a0 on rdi that number of times. In other words… this is an $O(N)$ subtraction function, performing rdi -= rsi. We can confirm this by inspecting the values of rdi and rsi right after the function call:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x401610                  mov    QWORD PTR [rbp-0x220], r9
     0x401617                  mov    QWORD PTR [rbp-0x1e8], rdx
●    0x40161e                  call   0x403750
●→   0x401623                  mov    rdi, r11
     0x401626                  mov    QWORD PTR [rbp-0x1f8], r11
     0x40162d                  call   0x403810
     0x401632                  mov    r12, QWORD PTR [rbp-0x2b0]
     0x401639                  mov    rcx, QWORD PTR [rbp-0x1a8]
     0x401640                  mov    rsi, r12
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x401623 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x401623 → mov rdi, r11
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 13
(remote) gef➤  get_len $rsi
Linked list length: 105
(remote) gef➤  

Yeah, last I checked 118 - 105 = 13, so this looks correct.

Similarly, the sister function FUN_00403730 triggers push operations, which means it implements $O(N)$ addition, serving as a rdi += rsi function. Verification of this has been left as an exercise to the reader.

Clarity

As it stands, we can also start to think about the flagchecker logic a bit. We now know that this program is just incrementing and decrementing the values of our flag characters. What causes it to fail?

Recall that in the parent’s logic, it will fall to the Wrong block if we hit a SIGSEGV. And what would trigger a SIGSEGV? If we try to call free on 0xffffffffffffffff, or in other words, if we try to decrement a character into a negative number! So it seems like for us to win, we need to ensure that throughout all these additions and subtractions and increments and decrements that we never go negative.

We can quickly simulate this by passing in a dummy flag like bctf{zzzzzzzzzzzzzzzzzzzzzAzzzzzzzz}, which should cause the subtraction we looked at earlier to go negative. We place a breakpoint on 0x4028f7, which is the signal check against SIGSEGV in the parent, and a breakpoint for right after the first subtraction, at 0x401623:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x4028ef                  movzx  edx, dh
     0x4028f2                  cmp    edx, 0x7
     0x4028f5                  je     0x402915
●→   0x4028f7                  cmp    edx, 0xb
     0x4028fa                  je     0x402abd
     0x402900                  movsxd r10, edx
     0x402903                  mov    edi, 0x7
     0x402908                  mov    rax, r9
     0x40290b                  mov    rsi, rbx
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x4028f7 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x4028f7 → cmp edx, 0xb
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  p $edx
$3 = 0xb
(remote) gef➤  

We can see the parent breakpoint hits with a SIGSEGV, and our child’s breakpoint never hit, implying our crash occured within the subtraction function. This may seem non-rigorous, but it’s enough information for us to carry on.

Actually Solving the Challenge

Okay, we now have all the puzzle pieces. We know we’ve got linked list arithmetic, we know our fail condition, all that’s left are the really big functions in the child which we know are also triggering SIGBUS left and right. If we can quickly, grossly generalise the behaviour of these functions, we can then perhaps write a parser to determine what each function is doing, get our constraints and win.

I mean, we only have addition and subtraction to work with, so the functions can’t be that complicated, right?

Actually, I am right.

Generalising the Remaining Child Functions

Let’s do a little eyeballing.

One of the big functions we haven’t analysed yet is FUN_00403810, and the disassembly looks a little like this:

    undefined FUN_00403810()
    FUN_00403810                                    
    MOV        word ptr [R15 + 0x36],CX
    BSF        R10D,dword ptr [R15 + 0x1d]
    SUB        R14,qword ptr [R15 + 0x20]
    MOVZX      RBP,word ptr [R15 + 0x8]
    MOVSX      EBP,word ptr [R15 + 0x42]
    MOVSX      BP,byte ptr [R15 + 0x63]
    MOVZX      RCX,word ptr [R15 + 0x48]
    BSR        R12D,dword ptr [R15 + 0xd]
    MOVZX      R9,byte ptr [R15 + 0x13]
    OR         R8,qword ptr [R15 + 0x3b]
    AND        R10B,byte ptr [R15 + 0x3b]
    MOVSX      EAX,word ptr [R15 + 0x2d]
    ...

Wow… SIGBUS galore… scrolling through the disassembly, every function seems to use r15 as some address, so I’m just going to Assume and Generalise that every single instruction triggers a SIGBUS. And that’s called rigor.

On our end, when we are constructing a parser, we just need to copy the logic from the parent to switch between addition and subtraction based on the instruction bytes. For a function like this, once done it is equivalent to rdi += c for some integer constant c, and we can compute that c easily on our end.

Now it’d be great if all the other functions looked like this, but unfortunately there’s another class of functions to generalise.

Take FUN_00734180 for example:

undefined FUN_00734180()
FUN_00734180

PUSH       RBX
PUSH       R12
PUSH       RBP
MOV        RBX,RDI
PUSH       -0x1
MOV        R12,RSP
MOV        RDI,R12
MOV        RBP,RBX

LAB_00734192                                    

MOV        RBP,qword ptr [RBP]
CMP        RBP,R14
JZ         LAB_007341a3
MOVSX      RSP,byte ptr [R15 + 0x77]
JMP        LAB_00734192

LAB_007341a3                                    

MOV        RBP,RBX

LAB_007341a6                                    

MOV        RBP,qword ptr [RBP]
CMP        RBP,R14
JZ         LAB_007341b7
MOV        word ptr [R15 + 0x41],BP
JMP        LAB_007341a6

...

Okay, I know loops are a bit hard to see with flat disassembly output, but I’m not pulling out radare2 to pretty print this. Just by using our eyes, we can trace rdi going into rbx, then rbx getting loaded into rbp before these loops. We can make an educated guess that we’re looking at some sort of multiplication function, where the number of SIGBUS loop blocks are our multiplier.

How do we differentiate between the two types of functions? Well, it seems like all the multiplication blocks start with push rbx, so we just scan the first byte of the function for 0x53 and call it a day, duh.

Parsing Function Arguments

Now that we have an idea of how we want to stereotype these functions, we can actually get down to parsing them, then eyeballing whether our parser produces the correct results.

We don’t have to write a good parser, we just need to write one that works.

The first course of action is to map our stack variables to the input flag values. Because the author is Evil (jk, love you) the stack variables are not consecutive and we need to keep track of the various displacements the flag characters are being assigned to.

We start writing our capstone script to parse bytes after 0x40137c to skip the malloc calls. Thankfully, for this part of the assembly, even if there is an indirect assignment (e.g. mov rdx, qword ptr [rbp-0x1f0] followed by mov rdi, rdx later), there should only be one rbp displacement mov before each call to FUN_004036f0, so we just keep track of that:

from capstone import *
from capstone.x86 import *

f = open("chal", "rb")
f.seek(0x137c)

md = Cs(CS_ARCH_X86, CS_MODE_64)
md.detail = True
b = f.read(5374)

inp_mapping = {}
inp_mapping[7] = 7 # rbx
inp_mapping[8] = 8 # r13

for ins in md.disasm(b, 0x40137c):
    i = ins.operands
    if ins.address < 0x40160a:
        if ins.mnemonic == 'mov':
            if i[1].type == X86_OP_MEM:
                if ins.reg_name(i[1].mem.base) == 'rbp':
                    inp_mapping[i[1].value.mem.disp-8] = n
                    n += 1
                    if n == 7: n = 9

Here we map the rbp displacement value to the corresponding flag input index. Just to spite me specifically, there is no rbp displacement assigned for flag characters 0x7 and 0x8 and instead the values from rbx and r13 are used, so we just put some dummy values in their place and skip past them in our parser. We parse up to 0x40160a, since we know that is when the calls end.

The next part is keeping track of the arguments going into the various SIGBUS triggering functions. We just need to keep track of the rdi and rsi values, mapping them to the rbp displacements that we have accumulated.

There’s something weird going on with some of these assignments:

...
    MOV        RSI,RCX
    MOV        RDI,R8
    MOV        qword ptr [RBP + local_218],RCX
    MOV        qword ptr [RBP + local_220],R8
    CALL       FUN_00403750
...

Notice how rcx and r8 are loaded into rsi and rdi first, then… loaded into an rbp displacement? During debugging, those displacements are actually the same values that rcx and r8 already possess, and if we back up on the assembly a bit, we can see that those are the prior assignments:

...
    MOV        RCX,qword ptr [RBP + local_218] ; note the assignment of [rbp-0x218] to rcx here
    MOV        RDI,qword ptr [RBP + local_1c0]
    MOV        RSI,RCX
    CALL       FUN_00403730
    MOV        R8,qword ptr [RBP + local_220] ; note the assignment of [rbp-0x220] to r8 here
    MOV        RSI,qword ptr [RBP + local_210]
    MOV        RDI,R8
    CALL       FUN_00403730
    MOV        RSI,qword ptr [RBP + local_2b0]
    MOV        RDI,RBX
    CALL       FUN_00403730
    MOV        RSI,RCX ; rsi = rcx = [rbp-0x218]
    MOV        RDI,R8  ; rdi = r8 = [rbp-0x220]
    MOV        qword ptr [RBP + local_218],RCX ; what the fuck is with this assignment
    MOV        qword ptr [RBP + local_220],R8  ; like, why???
    CALL       FUN_00403750
...

This happens on basically all the invariant register moves. I, uh, have no good theories on why this is done. When parsing, as long as we maintain our register state diligently, this is a non-issue as the displacements are tracked properly. I’m not sure what this would break, it just made me really confused while solving.

Either way, as stated, we can just track our register states and go along our merry way:

...

regs = {}
regs['rbx'] = 7
regs['r13'] = 8

for ins in md.disasm(b, 0x40137c):
    if ins.mnemonic == 'mov':
        i = ins.operands
        if len(i) == 2:
            if i[0].type == X86_OP_REG and i[1].type == X86_OP_MEM:
                if ins.reg_name(i[1].mem.base) == 'rbp':
                    disp = i[1].value.mem.disp - 8
                    # we put our initial mapping stuff here
                    if ins.address < 0x40160a:
                        if disp not in inp_mapping:
                            inp_mapping[disp] = n
                            n += 1
                            if n == 7: n = 9
                    # track displacements stored into registers
                    regs[ins.reg_name(i[0].value.reg)] = disp
            
            # r to r movement
            elif i[0].type == X86_OP_REG and i[1].type == X86_OP_REG:
                dst = ins.reg_name(i[0].value.reg)
                src = ins.reg_name(i[1].value.reg)
                if src in regs:
                    regs[dst] = regs[src]
                    
    elif ins.mnemonic == 'call' and ins.address >= 0x40160a:
        if len(ins.operands) > 0 and ins.operands[0].type == X86_OP_IMM:
            print(f"# {hex(ins.address)}")
            imm = ins.operands[0].value.imm
            # imm has the function that we're calling

Okay nice, now we just need to deal with function emulation.

Emulating Functions

We’ve already determined FUN_00403730 to be an addition function and FUN_004037a0 to be a subtraction function, so we can fill that in quickly:

...
    if imm == 0x403750: # subtraction
        a = regs.get('rdi')
        b = regs.get('rsi')
        i0 = inp_mapping.get(a)
        i1 = inp_mapping.get(b)
        print(f"flag[{i0}] -= flag[{i1}]")
                
    elif imm == 0x403730: # addition
        a = regs.get('rdi')
        b = regs.get('rsi')
        i0 = inp_mapping.get(a)
        i1 = inp_mapping.get(b)
        print(f"flag[{i0}] += flag[{i1}]")
...

Then for the constant addition or constant multiplication functions, we do up a quick parser for them:

...

def big_func(addr, idx):
    f.seek(addr-0x400000)
    b_func = f.read(0x40000)
    mul = False
    n_cnt = 0
    if b_func[0] == 0x53: mul = True
    for ins in md.disasm(b_func, addr):
        if mul:
            if "r15" in ins.op_str: n_cnt += 1 # by observation :)
        else:
            if "r15" in ins.op_str: # replicate the parent logic block
                eax = ins.bytes[0]
                esi = len(ins.bytes)
                rdx = eax % esi
                eax = ins.bytes[rdx]
                if eax & 4 == 0:
                    n_cnt -= 1
                else:
                    n_cnt += 1
        if ins.mnemonic == 'ret':
            if mul: 
                print(f"flag[{idx}] *= {n_cnt-1}")
            else:
                print(f"flag[{idx}] += {n_cnt-1}")
            return

...

As mentioned earlier, we just use the first byte of the function as our indicator whether we’re looking at constant addition or multiplication. Additionally, we just be lazy with it and just spot for any instructions with r15 in it and call it a day for calculating the multiplier. We can see that it does check out against the debugger!

For example, we calculate FUN_00734180 to be a multiplier of 231 based on the parser, and we can see that it checks out in debugging:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x401699                  mov    rdi, r13
     0x40169c                  call   0x403750
     0x4016a1                  mov    rdi, QWORD PTR [rbp-0x238]
●→   0x4016a8                  call   0x734180
[!] Cannot disassemble from $PC
─────────────────────────────────────────────────────────────────────────── arguments (guessed)
 ────
0x734180 (
   $rdi = 0x00007ffff7ff41a0 → 0x00007ffff7fed430 → 0x00007ffff7fed420 → 0x00007ffff7fed410 → 0
x00007ffff7fed400 → 0x00007ffff7fed3e0 → 0x00007ffff7fed3d0 → 0x00007ffff7fed3c0,
   $rsi = 0x00007ffff7ff4180 → 0x00007ffff7feeb40 → 0x00007ffff7feeb30 → 0x00007ffff7feeb20 → 0
x00007ffff7feeb10 → 0x00007ffff7feeb00 → 0x00007ffff7feeaf0 → 0x00007ffff7feeae0,
   $rdx = 0xffffffffffffffff,
   $rcx = 0x00007ffff7ff4200 → 0xffffffffffffffff,
   $r8 = 0x00007ffff7ff4280 → 0x00007ffff7fea020 → 0x00007ffff7fea010 → 0x00007ffff7fea000 → 0x
00007ffff7fe9ff0 → 0x00007ffff7fe9fe0 → 0x00007ffff7fe9fd0 → 0x00007ffff7fe9fc0
)
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x4016a8 in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x4016a8 → call 0x734180
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 65
(remote) gef➤  

Here we break right before the call, and right after:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64
 ────
     0x40169c                  call   0x403750
     0x4016a1                  mov    rdi, QWORD PTR [rbp-0x238]
●    0x4016a8                  call   0x734180
●→   0x4016ad                  mov    rdi, QWORD PTR [rbp-0x1b8]
     0x4016b4                  call   0x73dd80
     0x4016b9                  mov    rsi, QWORD PTR [rbp-0x1a8]
     0x4016c0                  mov    rdi, QWORD PTR [rbp-0x280]
     0x4016c7                  call   0x403730
     0x4016cc                  mov    rdi, QWORD PTR [rbp-0x1b0]
─────────────────────────────────────────────────────────────────────────────────────── threads
 ────
[#0] Id 1, stopped 0x4016ad in ?? (), reason: BREAKPOINT
───────────────────────────────────────────────────────────────────────────────────────── trace
 ────
[#0] 0x4016ad → mov rdi, QWORD PTR [rbp-0x1b8]
[#1] 0x83272f → mov edi, eax
───────────────────────────────────────────────────────────────────────────────────────────────
─────
(remote) gef➤  get_len $rdi
Linked list length: 15015
(remote) gef➤  

Yup, 65 * 231 = 15015 as expected. Why is there an off-by-one error in my code? Probably because my code is garbage.

Anyways, with this we can dump all the constraints of the challenge, or rather, the equations along the way:

    # 0x40161e
    flag[26] -= flag[13]
    # 0x40162d
    flag[30] += 218
    # 0x401646
    flag[28] -= flag[20]
    # 0x40165c
    flag[29] += flag[28]
    # 0x40166f
    flag[14] -= flag[17]
    # 0x40167e
    flag[16] += flag[8]
    # 0x401691
    flag[26] += flag[24]
    # 0x40169c
    flag[8] -= flag[20]
...
    # 0x40283d
    flag[32] += -24709
    # 0x402849
    flag[33] += -32229
    # 0x402855
    flag[34] += -451
    # 0x402861
    flag[35] += -49449

At the very end of all the constraints, we have a bunch of flag[i] -= c equations, so we probably just need it that those values at the end are equal to zero. By our understanding, they actually just have to be non-zero? But it’s probably better if they just landed on zero.

Constraint Solving

Now the thing is we’re not dealing with simple constraints here. You might want to just chuck this into z3, but the problem lies in the fact that once you turn this into a symbolic expression, the values are constantly changing with each operation.

As such, we need to make it a little fugly? Recall that our win condition is that on every equation, the value never drops below zero. We will first create symbolic variables for our input flag:

import z3

orig_flag = [z3.Int(f"f{i}") for i in range(36)]
s = z3.Solver()

for f in orig_flag:
    s.add(f >= 32)
    s.add(f <= 126)

s.add(orig_flag[0] == ord('b'))
s.add(orig_flag[1] == ord('c'))
s.add(orig_flag[2] == ord('t'))
s.add(orig_flag[3] == ord('f'))
s.add(orig_flag[4] == ord('{'))
s.add(orig_flag[-1] == ord('}'))

But for all the assignment/reassignment/constant addition/constant multiplication that we’re doing down the line, we’re going to create a copy of the variables so that we can have the original untouched characters but also the rapidly changing formulae that we’re generating. We accomplish this with a helper function:

flag = list(orig_flag)
def assign(flag, idx, expr, solver):
    flag[idx] = expr
    solver.add(expr >= 0) 

# 0x40161e
assign(flag, 26, flag[26] - flag[13], s)
# 0x40162d
assign(flag, 30, flag[30] + 218, s)
# 0x401646
assign(flag, 28, flag[28] - flag[20], s)
# 0x40165c
assign(flag, 29, flag[29] + flag[28], s)
# 0x40166f
assign(flag, 14, flag[14] - flag[17], s)
...

So when we dump the constraints from our parser, we have it dump out assign function calls, which will help us update the flag expression at each step and, most importantly, adds the constraint that each running expression is >= 0.

Then for our last few constraints, we instead fix them to equal to zero:

def assign2(flag, idx, expr, solver):
    flag[idx] = expr
    solver.add(expr == 0) 

# 0x4026c9
assign2(flag, 0, flag[0] + -19947, s)
# 0x4026d5
assign2(flag, 1, flag[1] + -36338, s)
# 0x4026e1
assign2(flag, 2, flag[2] + -15136, s)
# 0x4026ed
assign2(flag, 3, flag[3] + -17102, s)
# 0x4026f9
assign2(flag, 4, flag[4] + -48722, s)
# 0x402705
assign2(flag, 5, flag[5] + -6273, s)
# 0x402711
assign2(flag, 6, flag[6] + -1602, s)
# 0x402719
assign2(flag, 7, flag[7] + -34918, s)
# 0x402721
assign2(flag, 8, flag[8] + -431, s)
# 0x40272d
assign2(flag, 9, flag[9] + -44585, s)
...
res = s.check()
print(res)

if res == z3.sat:
    model = s.model()
    print("satisfiable")
    result = ''.join([chr(model.eval(c).as_long()) for c in orig_flag])
    print(result)

Yes, I named it assign2 because I was too fucking lazy to batch rename. Anyways, this is enough! We run the z3 script and get… this:

➜  piano python solve.py              
sat
satisfiable
bctf{b#fqQZ*OEQAZKTDAYyl6VPRomVPRom}

Um, a bit of a yucky flag. Regardless, we pass it into the binary, and it clears!

➜  piano ./chal "bctf{b#fqQZ*OEQAZKTDAYyl6VPRomVPRom}"
Correct!

Flag: bctf{b#fqQZ*OEQAZKTDAYyl6VPRomVPRom}.

This is the final parser script and this is the final constraint script.

Conclusion

I really like this challenge. To summarise all the tricks we saw in this one challenge:

I wouldn’t want to speculate what the solve counts would look like in a pre-Codex pre-Clod pre-Gemini world but I’m glad I tackled this challenge the old-school way: by staring at structs for really long and making nonsensical assumptions.

Hopefully you had fun reading this writeup and a big thanks to the Purdue b01lers team for yet another stellar competition!