A Demonstration of Stagefright-like Mistakes

Recent weeks have seen a huge amount of media attention on Stagefright, a C++-based component of the Android operating system that is responsible for playing various multimedia files. The Stagefright exploit (itself discussed in more detail in this article) was fundamentally a result of integer overflows and underflows. Our previous article covered this in detail, but since it was written, a weakness has emerged in the patch made to fix Stagefright. Yes, that is the patch currently being rolled out to devices, which has already hit Nexus devices in the last few days. So now we need another one.

There’s (some) good news though – the new patch is only a single line change to the source code. You can see the actual patch on AOSP gerrit, which adds yet another bounds-check to an integer variable, to prevent an overflow.

The words overflow and underflow keep coming up though, and this article aims to demonstrate, using simple examples, what these issues actually are and how they work. I’m doing this on Linux, using the GCC C compiler (version 5.2.0 to be precise). You can follow along at home if you like, and the behaviour should be reproducible on most systems, although the exact values you see will differ. If it isn’t, or you don’t fancy booting a Linux VM or live CD, feel free to just read along – I’ll show you what code is being compiled, and what happens when it runs.

It’s helpful if you know a little bit of C, but it’s not essential – I’ll explain the basics.

Let’s Allocate Some Memory

Fundamental to almost any piece of software written in C or C++ (the latter is effectively an extension of the former, for our purposes here) is the allocation of memory. To store something, we need memory. To do this, we declare a variable. Here’s an example to declare a variable for an integer (a whole number):

int a = 5;

We are telling the compiler here that we’re going to create an integer. That tells it how much memory we require (i.e. enough to hold a standard-sized integer). We also give it a name (a), so we can refer to it again later. Finally, we set a value into it (5) – this last step is optional. If we don’t do it, we would end up with an uninitialised integer. That’s one whose value is not predictable, as it’s just whatever was last in that memory. Let’s try that out with a quick program:

#include<stdio.h>
int main()
{
    int herpDerp[255];
    int myArray [10];
    for (int i=0; i < 10; i++)
    {
        printf("%d\n", myArray[i]);
    }
    return 0;
}

Here, we’re making a couple of arrays. An array is just a sequence of values of one type. So in this case, an array of integers is just several integers, all in a row, with memory allocated contiguously. The first one (called herpDerp) is just being used to take up some space to make the result easier to see. The second array is called myArray, and contains 10 integers. If we were to look in memory at the address of the first integer of myArray, we could move forward one element (the size of one integer) at a time, and see the next value within the array. In fact, that’s what the line starting “printf” does – it prints the contents of a variable to the screen. Here, we print the value of each integer in the array to the screen.

After compiling this program (using the command gcc -o uninitialised uninitialised.c) and running the resulting executable, we see some garbage.

$ ./uninitialised
-1010740764
32765
0
0
2130837504
0
1642289244
32535
1642288456
32535

What’s interesting though is what happens if we run this again. We’re running the exact same executable here, in the exact same way.

$ ./uninitialised
355598324
32767
0
0
-1090387968
0
-980453284
32522
-980454072
32522

“Look, look, garbage again,” I hear you cry. And you’d be correct – it is indeed more garbage. If you take a closer look though, you’ll see that this garbage is in fact different garbage to what we had the first time. The values are different. This is because the memory was uninitialised, as we never set a value for the integers in myArray. We therefore saw what was previously in that memory, which is meaningless. Each time our program runs, it’s getting a slightly different area of memory, so we see different old contents on each subsequent run of the program.

Let’s Overflow Something…

So far, we’ve just shown that C doesn’t initialise variables by default. Nothing spectacular here. But the principle is important. Now let’s imagine a slightly more complicated (and useful) program! Let’s imagine we are writing code to handle some kind of number that we’ve asked the user to enter. For simplicity, we will hardcode these numbers into our code, but it is not difficult to have these numbers be entered by the user. Since this is not a C lesson, though, I won’t go into that – if this interests you, have a look at the argc and argv parameters available in the main function, and have fun!

Let’s make a program that has been given just 1 thing to do – take a given number of values from one array, and put them into another. In real life, we would be doing something more interesting with these values, but for now we’ll use creative license, and just copy them around. (Note for any eagle-eyed readers – this is not how you would do this in real life. This is very bad, stupid coding practice. There are safe C constructs to use for copying data between arrays. I know that. Please don’t reply and tell me that! I am deliberately using unsafe methods here, to show people how vulnerabilities appear in code, and I want to do that in the simplest way possible, by writing the most vulnerable code we can).

#include<stdio.h>

int main()
{
    int lengthOfData = 5;
    int ourData [5];
    int a = 100;
    for (int i=0; i < lengthOfData+6; i++)
    {
        ourData[i] = i;
    }
    // array is now filled with some content

    for (int i=0; i < lengthOfData; i++)
    {
        printf("%d\n", ourData[i]);
    }
    }

    printf("a = %d\n", a);

    return 0;
}

When we run this code, we would expect to see the numbers 0 through 4 appear, then a line saying “a = 100”. Notice, though, that when we set the value of ourData[i], we do so within a loop that goes up to lengthOfData+6. That means we attempt to write as far as index 10 of ourData, but ourData only has 5 “slots” of memory allocated (indices 0 to 4). What will happen now? Only one way to find out – let’s run it!

$ ./oflow
0
1
2
3
4
5
6
7
8
a = 8
[1]    8144 segmentation fault (core dumped)  ./oflow

Uh oh… That doesn’t look good! We expected to only see the numbers 0 through 4 (5 numbers in total) appear, as our loop to print the values only goes from 0 to lengthOfData (5). Yet we ended up with 9 values… Huh?

Here, we overflowed an array. When we declared our array ourData, we only allowed for enough space to hold 5 sequential integers, yet we then went and wrote in 11 integers (see the line containing lengthOfData+6). We therefore wrote over memory we shouldn’t have touched! That’s bad!

Here, we ended up overwriting two important variables. Firstly, we overwrote the value of “a”, which was set to 100, but came out as 8… Secondly, we overwrote the variable lengthOfData, since we saw too many values displayed. To be precise, the printing loop went on for 9 iterations, and as 9 is the number after 8, it stands to reason that lengthOfData was being held in the block of memory directly after the variable a (where ourData[9] would be) and was overwritten with 9. We’ll prove this in the next section.

In the end, our program also crashed with a segmentation fault (or segfault), which usually indicates to security experts that there’s been an overflow or some other unexpected condition in the code. Our simple overflow has led to the contents of unrelated memory being changed. Random corruption isn’t really interesting though – sure, it crashes, but it’s more useful if we can actually change something meaningfully!

Changing Meaningful Data

We’re now going to make a small change to the last example code we used. By changing the code that sets the contents of the array as follows, we will be able to put a value of our own choosing into the variable “a”.

if (i == 8)
{
    ourData[i] = 0;
}
else
{
    ourData[i] = i;
}

This just means that when i is 8 (the 9th pass through the loop), the value “0” will be set. This is going to change the contents of the memory at the address of ourData, +8. That’s 4 elements beyond the last “valid” array entry, ourData[4]. And, as we’re about to see, it is also the home of the variable “a”.

When we run this program, we now see:

$ ./oflow
0
1
2
3
4
5
6
7
0
a = 0
[1] 8607 segmentation fault (core dumped) ./oflow

And there you have it – we’ve changed the value of the variable “a” by overflowing the array ourData, putting more data into it than it has room for.

Similarly, we can alter the lengthOfData value indirectly, by adjusting the value of ourData[9]. Like before, we alter the code to deal with the special case of item 9 in the array (5 beyond the last legitimate entry):

else if (i == 9)
{
    ourData[i] = 40;
}

If our theory above is correct, and ourData[9] occupies the same memory as lengthOfData, we should now see 40 values output to the screen. I’ll leave reproducing this as an exercise for the interested reader, but can confirm I indeed get 40 values printed out. Interestingly as well, those values are numbered up to 40, because we changed the value of lengthOfData within the loop, so the original loop runs for longer, and adjusts more (unallocated) memory. This is bad, because that memory could be given to another section of the program to use for something else, and we would have the ability to write over it using this code.

How does this relate to Stagefright?

In Stagefright, the issue was that of integer overflow and underflow. That’s a bit different to array overflow and underflow. But to briefly demonstrate it:

#include<stdio.h>
int main()
{
    unsigned int a = 5;
    int b = 10;
    a = a - b;
    printf("%u\n", a);
    return 0;
}

Here we’re simply doing some basic maths, just like Stagefright does. To work out quantities of memory needed to process videos or other multimedia, Stagefright takes in metadata and does maths on it, subtracting the parts that don’t need to be stored so that only the necessary parts are kept. Above is a simple subtraction, where we do 5 – 10. Imagine that we’d managed to put the number 10 here by specially crafting an invalid MP4 or 3GP video file. Or perhaps imagine we’d used an array overflow (from earlier) to change the value of this variable. When we take the unsigned (i.e. non-negative only) integer 5, and subtract 10 from it, we end up with a problem. C tells me the result is 4294967291 (that is 2^32 – 5, as an unsigned int is 32 bits wide here). Intuitively we expect a negative answer, but if we work in unsigned numbers, we end up “wrapping around” and having a very high positive number. The same happens in reverse if you add a number to an already-large positive number: you’ll end up with a very small number (and possibly not allocate enough memory). What we just saw here is an integer underflow.

If we modify the above code to read as follows:

unsigned int a = 4294967291;
int b = 10;
a = a + b;

then we end up with the result of 5 (same numbers as last time, just the other way around). If we were expecting a huge amount of data, as the numbers above suggest, and then allocated memory based on this result, we’d end up allocating only 5 units of memory, which is nowhere near enough. This is an integer overflow.

Pulling it back together

We’ve shown here how an array can be overflowed, and how an integer can be overflowed or underflowed. While these are fairly abstract concepts, they are the fundamentals of how many security vulnerabilities work. The Stagefright vulnerabilities were quite similar to these.

The best way to prevent these kinds of attacks is either to use a higher level language, which manages memory for you (albeit with less performance), or to be very, very, very, very careful when coding. More careful than the entirety of the Android security team, for sure. It’s really difficult to get low-level memory management code right. While these examples are somewhat simpler than those you’ll find in real code, the premise is the same. What I showed you here are real security issues in real C code. If you’re interested, go away and try running the examples, and see what you can see. The internet is full of resources about secure coding, and memory overflows etc, so if you are interested in it, there is plenty to read on these topics.

Hopefully this article (relatively brief, in the grand scheme of the topic of secure coding) has given you a taste of just how big a challenge it is to get security right in software. One of the most important ways to get better security is to make code more transparent, by open sourcing it, to allow others to spot (and fix) issues, or to learn from how you worked around issues in your software. Ultimately, the mobile software market depends on consumer confidence, and if we see further breaches and attacks like these, perhaps users will be less willing to buy and install software? Fewer breaches means less bad press, and better sales and downloads of apps. And for anyone thinking “but apps are written in Java, which manages memory for me”, you’d be correct. Except that many non-trivial apps themselves use a native library (often written in C or C++) using the Android NDK (Native Development Kit), either for better performance, or more direct access to hardware and memory. You’re probably running several apps on your phone already which use natively compiled libraries. How many of them have been properly made available for audit and peer review? And how many of them have an exploitable error within them?

Stagefright has resulted in a lot of focus on rapid security patches for Android, which is a good thing, although this latest bout of fixes for the fixes has shown the difficulty in getting security right, even for a huge company like Google. And that means it’s very likely most developers using native libraries are making these kinds of mistakes too… Scary, isn’t it?

Finally, remember to use your skills for good, and not for evil. Put your knowledge to responsible use, and let’s make software safer and more secure for everyone, rather than more dangerous. These last few weeks have seen loads of responsible people find and fix holes in Android. There are, however, most likely many hundreds more like this, just waiting to be found and fixed. Maybe you’ll find the next one?