Dangerous Documentation Part 1–Copying Strings

The standard C runtime library (CRT for short) contains a lot of dangerous functions. Functions such as strcpy can easily lead to buffer overruns and security exploits. Functions such as strncpy can easily end up not null-terminating the destination buffer and this unexpected state can lead to bugs, including buffer overruns and security exploits.

The Microsoft documentation for these CRT functions emphasizes their danger but it does it in such a vague and generic way that is often misleading. In many cases it fails to point out the most serious dangers, and I don’t think it is successful in leading developers to effective solutions to these real problems.

In this post I dissect some documentation, explain what I think the documentation should say, and then demonstrate the one true way to safely copy strings.

MSDN contains multiple copies of the CRT documentation, for versions of Visual Studio from 2003 to 2010. Searching for Microsoft’s CRT documentation (even within MSDN) may lead you to ancient documentation, which may have additional errors. All of my documentation excerpts are from the VS 2010 documentation.

strcpy

The MSDN documentation for many of of these functions has a Security Note. The Security Note for strcpy says:

“Because strcpy does not check for sufficient space in strDestination before copying strSource, it is a potential cause of buffer overruns.“

This security warning is not bad. I think it would be better to say “strcpy cannot check…”, because the lack of a destination size parameter is really the problem, but that’s just splitting hairs. Don’t use this function.

strcpy documentation: A

strncpy

The Security Note for strncpy says:

“strncpy does not check for sufficient space in strDest; it is therefore a potential cause of buffer overruns. Keep in mind that count limits the number of characters copied; it is not a limit on the size of strDest.”

This security warning is well intentioned, but confused. The advantage of strncpy over strcpy is precisely that it does check for sufficient space in the destination. It will write no more characters than the count that you specify. I think that the documentation author got hung up on some imaginary difference between buffer size and count, but to a caller of this function or of strcpy_s they are both just size_t parameters that restrict how many bytes will be copied. As far as preventing buffer overflows goes strncpy and strcpy_s are identical, with one key difference that the security note fails to mention.

Elsewhere in the documentation it says:

“If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string.”

This easy to miss point is the crux of the matter. Strncpy was designed for inserting text into the middle of a string, where null-termination is not desirable. When you use it to copy a string you are risking ending up with a non null-terminated buffer. This piece of information should be in the security note. In fact it should be the security note, after rewriting for clarity:

If the source string does not fit in the destination buffer (if the source buffer including its null terminator are more than ‘count’ characters) then the destination buffer will not be null-terminated.

strncpy documentation: C-, for having a confused security note that misses the one issue that actually matters

Dealing with str(n)cpy – invisible template magic

The problem with strcpy is that it doesn’t constrain the destination size. The problem with strncpy is that it does not necessarily null-terminate the destination buffer. When trying to clean up a code-base that uses strcpy and strncpy your goal should be to fix these two problems in the lowest risk manner.

Microsoft’s CRT offers template overloads of strcpy and strncpy that can infer the destination size reliably. To use these overloads (note – not recommended) just set these two defines to 1:

_CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES
_CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES_COUNT

Again, I don’t recommend this. Now your code looks like standard C/C++ code, calling standard C/C++ functions, but the behavior may be completely changed. Consider this code fragment:

char buffer[10];
strncpy(buffer, “A long string”, ARRAYSIZE(buffer));
buffer[ARRAYSIZE(buffer) – 1] = 0;

This code is verbose, and fragile, but it is legal, correct, and safe code. The long string will be truncated, then null-terminated, and execution will continue. However if the template overloads are enabled then the default safer-CRT policy of truncation-equals-termination is in play. The CRT’s invalid parameter handler will be called when the CRT realizes that ‘buffer’ is not long enough, and your program will be aborted. Killed. Terminated with extreme prejudice. That may not be what you wanted…

And then, consider this code:

char buffer2[10];
char* p2 = buffer2;
strcpy(p2, “A long string”);
strncpy(p2, “Another long string”, ARRAYSIZE(buffer2));

This code shows incorrect usage of strcpy and strncpy. The call to strcpy will overrun the buffer, leading to mayhem. The call to strncpy will fail to null-terminate the buffer and that may lead to additional problems. And none of this behavior is affected by the template overrides.

The template overrides cannot infer the buffer size in the second example because a pointer is used instead of an array. That is why this code is untouched by the secure overload defines. Since the source code looks quite similar in both cases the only way to identify the dangerous code is to look for deprecation warnings (which most people turn off), and there is no obvious sign that some calls to strncpy can now lead to program termination.

In this simple example we find that enabling the secure overloads makes the correct code crash and fails to protect against the dangerous case. This is not a strategy that helps me sleep soundly.

Dealing with str(n)cpy – explicit template magic

The first step towards fixing this mess is to decide what behavior you want. Clearly buffer overruns should be avoided, and buffers that are not null-terminated should also be prevented. But this leaves the policy decision of what should happen when the destination buffer is too small. The safer CRT chooses termination as its policy, and if you want that then you can make your code far more robust by doing a global search/replace of strcpy with strcpy_s. strcpy_s has two signatures – one that infers the destination buffer size through template magic, and one that requires an explicit specification of the buffer size. If you do a global search/replace of strcpy for strcpy_s then you will find that most of your code will still compile. The strcpy_s calls that compile are safe (although they will abort your program if there is insufficient space) and the ones that are not safe will not compile, so you will have to manually fix them by specifying the buffer size explicitly. It’s template magic, but visible in the source code.

Aborting when the buffer size is too small is reasonable in some domains, but in many cases the better bet is truncation. If truncation is the policy that you want, then try this code:

// stringstuff.h
void strcpy_trunc(char* dst,
const char* src, size_t charCount);

template <size_t elemCount>
void strcpy_safe(char (&dst)[elemCount],
_In_z_ const char* src)
{
    strcpy_trunc(dst, src, elemCount);
}

// stringstuff.cpp
void strcpy_trunc(char* dst, const char* src, size_t charCount)
{
    #pragma warning(suppress : 4996) // Suppress deprecation warning
    strncpy(dst, src, charCount);
    dst[charCount-1] = 0;
}

The heavy lifting is done by strcpy_trunc, which is just like strncpy except that it promises to null-terminate the buffer. You can make most code safer with a global search/replace of strncpy for strcpy_trunc.

The real magic is in strcpy_safe. This function uses template magic and an array reference (just like strcpy_s) to infer the destination array size, and then calls strcpy_trunc to do the actual work. Because the programmer does not specify the destination array size it is impossible to get it wrong. If you pass a pointer instead of an array as the destination then the code will not compile. Because the calling signature is identical to strcpy you can switch to this function by doing a search/replace of strcpy to strcpy_safe. Any code that compiles is guaranteed safe. Any places where you are passing a pointer will result in build breaks, and you can fix those by calling strcpy_trunc and manually specifying the size. Unlike the CRT’s automatic template overrides you can see in your code how safe it is.

I think that strcpy_safe is so awesome that it’s even worth the effort to try replacing calls to strncpy with calls to strcpy_safe. It’s a bit more work, because of the parameter count mismatch, but manually specified buffer sizes are wrong often enough to make it worth considering. Using strcpy_safe instead of calling strcpy_trunc explicitly should, for any respectable optimizing compiler, generate identical code.

Strcpy_safe means less typing, great performance, no code bloat, and is guaranteed safe – what’s not to like?

Typical usage of strcpy_safe looks like this:

void Teststrcpy_safe()
{
    char buf[10];
    strcpy_safe(buf, “A long string”);
}

A slightly better version of the implementation is shown here, with annotations to help /analyze find bugs. Recommended.

// stringstuff.h
void strcpy_trunc(_Out_z_cap_(charCount) char* dst,
_In_z_ const char* src, size_t charCount);

template <size_t elemCount>
void strcpy_safe(_Deref_post_z_ char (&dst)[elemCount],
_In_z_ const char* src)
{
    strcpy_trunc(dst, src, elemCount);
}

// stringstuff.cpp
void strcpy_trunc(char* dst, const char* src, size_t charCount)
{
    #pragma warning(suppress : 4996) // Suppress deprecation warning
    strncpy(dst, src, charCount);
    dst[charCount-1] = 0;
}

lstrcpyn

Another option you might consider is lstrcpyn. This function appears to be well designed in order to guarantee truncation, and for a while the only clue of the danger was that the documentation claimed – without explanation – that this function was unsafe and did not guarantee null-termination. It looked like a documentation error, because any reasonable tests would show the destination buffer being null-terminated.

Rather than regurgitate the long lstrcpyn security note I will just encourage you to read it, and marvel at this code below.

void TestlstrcpynA()
{
    char buf[10];
    lstrcpynA(buf, “A long string”, ARRAYSIZE(buf));
    memset(buf, 1, sizeof(buf));
    lstrcpynA(buf, (const char*)0x7, ARRAYSIZE(buf));
}

The first call to lstrcpyn works brilliantly and truncates the string to nine characters plus a null-terminator. It’s perfect! Let’s use it everywhere!

The second call to lstrcpyn intentionally passes an invalid source pointer to demonstrate that when things go wrong, lstrcpyn keeps going. If lstrcpyn hits an access violation then it catches it, and then returns, so that the application can keep going. Instead of crashing when it hits a serious bug it sweeps it under the rug and pretends that all is well! In the old days this ‘technique’ was considered by some to be a good way to make software more robust. It is not. If you are using this technique, rather than finding and fixing bugs, please let me know so that I can avoid using your software.

Don’t use lstrcpyn. It’s not worth it.

Even the lstrcpyn function prototype is suboptimal. Its third parameter is called iMaxLength. This name is sufficiently ambiguous that you cannot tell from the function signature whether you should pass sizeof(dest) or ARRAYSIZE(dest), which is a crucial difference if you end up using the wide-character version of this function. The choice of parameter names is very important in avoiding buffer overruns.

lstrcpyn documentation: A+. It has some flaws, but its brutal honesty in explaining why this function should never be used is refreshing.

Summary

I’m a big fan of Microsoft’s safer CRT (strcpy_s and friends). However I think it has been let down by poor documentation and an overly pessimistic truncation policy.

The safer CRT documentation does a poor job of guiding people towards the simplest and safest fixes. The documentation does a poor job of explaining the weaknesses of strncpy and it does a poor job of explaining the importance of the template overrides for getting truly robust code.

The policy of truncation-equals-termination makes switching to the safer CRT problematic in many domains, since changing the policy requires passing an extra parameter (the _TRUNCATE #define).

Of course, in most domains you shouldn’t be using raw strings in the first place. std::string solves these problems quite nicely. However the techniques demonstrated here – templates and array references to infer buffer sizes – are applicable in many more areas.

About brucedawson

I'm a programmer, working for Google, focusing on optimization and reliability. Nothing's more fun than making code run 10x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Code Reliability, Documentation, Programming, Visual Studio. Bookmark the permalink.

8 Responses to Dangerous Documentation Part 1–Copying Strings

  1. Pingback: Dangerous Documentation Part 2–Printing Strings | Random ASCII

  2. I think you don’t understand the intended usage of strncpy as it wasn’t supposed to receive destination buffer length. Consider the example:

    char dst[] = “blah-blah-blah”;
    strncpy(dst, “Hello world!”, 5);

    Source string will snugly fit in the destination buffer, but we want only the first word. As 5 < strlen("Hello world!"), function will not automatically add NULL terminator. Hence “If the source string does not fit in the destination buffer" is wrong.

  3. brucedawson says:

    Thank you for the comment.

    I do understand the intended usage of strncpy — I wrote “Strncpy was designed for inserting text into the middle of a string”.

    What I was considering was, what do you need to worry about if you try to use strncpy in a common (but originally unintended) manner, as a safer strcpy. If you pass in the size of your buffer, despite the documentation’s dire warnings, buffer overrun is not a problem. A lack of null-termination is a problem. My complaint about the documentation is that it reversed the priority of those two issues.

    When I wrote “If the source string does not fit in the destination buffer” I was assuming the common (if not originally intended) usage of strncpy as a safer strcpy, so I was assuming that _countof(destbuffer) would be passed.

    I think that, pragmatically, most users of strncpy interpret the cound parameter as a buffer size, and there is nothing wrong with that. As long as they remember to null-terminate.

    But strcpy_trunc and strcpy_safe are *so* much better.

  4. Jilles Tjoelker says:

    Another reason why strncpy is bad is that it fills up the remainder of the buffer with ” bytes, while a single ” byte would suffice. If the buffer is usually much larger than necessary (common), this is needlessly inefficient.

  5. RichR says:

    Very well written and researched piece.
    When working with a legacy C application, templates don’t appear to be an possible; is there a sensible solution to get the goodness of strcpy_safe into a legacy app?

    • brucedawson says:

      Upgrade to C++? I don’t know of any other options.

      Even if you don’t use classes, there are many features of C++ that make it safer and easier to use.

      If upgrading to C++ is impossible then one possibility, suggested at GoingNative 2012, is to polish up your resume, and go somewhere where C++ is possible. A bit dramatic perhaps, but worth considering.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s