Stop using strncpy already!

I keep running into code that uses strcpy, sprintf, strncpy, _snprintf (Microsoft only), wcsncpy, swprintf, and morally equivalent functions. Please stop. There are alternatives which are far safer, and they actually require less typing.

The focus of this post is on fixed-size string buffers, but the technique applies to any type of fixed-length buffer. If you don’t use fixed-size buffers then this post is not relevant to you. Some people do use them, sometimes for valid reasons, and this post is for them.

The dangers of strcpy and sprintf should require little explanation I hope. Neither function lets you specify the size of the output buffer so buffer overruns are often a risk. Using strcpy to copy data from network packets or to copy a large array into a smaller one are particularly dangerous, but even when you’re certain that the string will fit, it’s really not worth the risk.

‘n’ functions considered dangerous

The dangers of strncpy, _snprintf, and wcsncpy should be equally well known, but apparently they are not. These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer. In order to use these functions correctly you have to do this sort of nonsense.

char buffer[5];
strncpy(buffer, “Thisisalongstring”, sizeof(buffer));
buffer[sizeof(buffer)-1] = 0;

A non-terminated string in C/C++ is a time-bomb just waiting to destroy your code. My understanding is that strncpy was designed for inserting text into the middle of strings, and then was repurposed for ‘secure’ coding even though it is a terrible fit. Meanwhile _snprintf followed the strncpy pattern, but snprintf did not. That is, snprintf guarantees null-termination, but strncpy and _snprintf do not. Is it any wonder that developers get confused? Is it any wonder that developers often do this:

// Make snprintf available on Windows:
// Don’t ever do this! These two functions are different!
#define snprintf _snprintf

strlcpy and lstrcpy

strlcpy is designed to solve the null-termination problems – it always null-terminates. It’s certainly an improvement over strncpy, however it isn’t natively available in VC++.

lstrcpy is a similarly named Microsoft design defect that appears to behave like strlcpy but is actually a security bug. It uses structured exception handling to catch access violations and then return, so in some situations it will cover up crashes and give you a non-terminated buffer. Awesome.

Wide characters worse?

swprintf is a function that defies prediction. It lacks an ‘n’ in its name but it takes a character count, however it doesn’t guarantee null-termination. It’s enough to make one’s head explode.

Where’s the pattern?

If you find the list below obvious or easy to remember then you may be a prodigy, or a liar:

    • May overrun the buffer: strcpy, sprintf
    • Sometimes null-terminates: strncpy, _snprintf, swprintf, wcsncpy, lstrcpy
    • Always null-terminates: snprintf, strlcpy

The documentation for these functions (man pages, MSDN) is typically pretty weak. I want bold letters at the top telling me whether it will null-terminate, but it typically takes a lot of very careful reading to be sure. It’s usually faster to write a test program.

It’s also worth emphasizing that of the seven functions listed above only one is even plausibly safe to use. And it’s not great either.

More typing means more errors

But wait, it’s actually worse. Because it turns out that programmers are imperfect human beings, and therefore programmers sometimes pass the wrong buffer size. Not often – probably not much more than one percent of the time – but these mistakes definitely happen, and ‘being careful’ doesn’t actually help. I’ve seen developers pass hard-coded constants (the wrong ones), pass named constants (the wrong ones), used sizeof(the wrong buffer), or use sizeof on a wchar_t array (thus getting a byte count instead of character count). I even saw one piece of code where the address of the string was passed instead of the size, and due to a mixture of templates and casting it actually compiled! Passing sizeof() to a function that expects a character count is the most common failure, but they all happen – even snprintf and strlcpy are misused. Using annotations and /analyze can help catch these problems, but we can do so much better.

The solution is…

We are programmers, are we not? If the functions we are given to deal with strings are difficult to use correctly then we should write new ones. And it turns out it is easy. Here I present to you the safest way to copy a string to an array:

template <size_t charCount>
void strcpy_safe(char (&output)[charCount], const char* pSrc)
{
    YourCopyNFunction(output, pSrc, charCount);
    // Copy the string — don’t copy too many bytes.
    //strncpy(output, pSrc, charCount);
    // Ensure null-termination.
    //output[charCount - 1] = 0;
}

// Call it like this:
char buffer[5];
strcpy_safe(buffer, “Thisisalongstring”);

The syntax is a bit weird since it combines templating on an integral value (instead of a type) with passing an array by reference, both of which are unfamiliar to many programmers. For more information on passing arrays by reference see this stack overflow article. Or, you can use the template magic quite effectively without worrying about the details of how it works.

<note>Commenters correctly pointed out that strncpy followed by null-termination is not the ideal implementation of strcpy_safe because it is inefficient (strncpy will zero all bytes to the end of the buffer) and maybe cut UTF-8 characters in half. Fixing this is beyond the scope of this post, which was supposed to be focused on automatically inferring the buffer size through template magic. So, don’t forget to implement YourCopyNFunction – and maybe I’ll post a version next time.</note>

I challenge you to use this function incorrectly. You can make it crash by passing an invalid source pointer, but in many years of using this technique I have never seen a case where the buffer size was not inferred correctly. If you pass a pointer as the destination then, because the size cannot be inferred, the code will fail to compile. It only works with a static string buffer as destination (no std::string, or std::vector) but different overloads can be made for those destination types.

I think that strcpy_safe is (ahem) a perfect function. It is either used correctly, or it fails to compile. It. Is Perfect. And it’s only six lines. Five if you indent like K&R.

Because strcpy_safe is so tiny – it just calls strncpy and then stores a zero – it will inline automatically in VC++ and gcc in optimized builds. If you want to reduce code size further you could write a non-inline helper function (strlcpy?) that would do the null-termination and have strcpy_safe call this function. It’s up to you.

One could certainly debate the name – maybe you would rather call it acme_strcpy, or acme_strncpy_safe. I really don’t care. You could even call it strcpy and let template overloading magically improve the safety of your code.

Unicode

String truncation causes problems with UTF-8 encoding. If you want to truncate at a character (or is that code-point – I can’t remember) boundaries then need to add some extra code to scan backwards to a character boundary. It’s not too complicated, but it’s outside the scope of this post which was focused on using templates to infer array sizes.

Extrapolation

Similar wrappers can obviously be made for all of the string functions that you use. You can even invent new ones, like sprintf_cat_safe. In fact, when I write a member function that takes a pointer and a size I usually make it private and write a template wrapper to handle the size. It’s a versatile technique that you should get used to using. Templates aren’t just for writing unreadable meta-code.

String classes

Yes, to be clear, I am aware of the existence of std::string. For better or for worse most game developers try to avoid dynamically allocated memory and std::string generally implies just that. There are valid reasons to use character arrays (fewer allocations, better cache locality), even if those valid reasons are just that you’ve been handed a million lines of legacy code with security and reliability problems in all directions. strcpy_safe and friends are unique in that they let you improve code security and reliability with a simple s/strcpy/strcpy_safe/.

As I say at the top, if you don’t need to use fixed-length buffers then congratulations, this post does not apply to you.

Summary

We started with this code – two lines of error prone verbosity:

strncpy(buffer, “Thisisalongstring”, sizeof(buffer));
buffer[sizeof(buffer)-1] = 0;

We ended with this code – simpler, and impossible to get wrong:

strcpy_safe(buffer, “Thisisalongstring”);

You should be using the second option, or explain to me why the heck not. It is constructive laziness of the highest order. Doing less typing in order to create code that is less fragile seems like it just might be a good idea.

One could certainly argue that silently truncating is bad – if that is your policy then your wrapper function can assert, log, or abort(). Those are implementation specific details. However we can all agree that avoiding buffer overruns is progress.

The reddit discussions of this post are here and here.

About these ads

About brucedawson

I'm a programmer, working for Valve (http://www.valvesoftware.com/), focusing on optimization and reliability. Nothing's more fun than making code run 5x faster. Unless it's eliminating large numbers of bugs. I also unicycle. And play (ice) hockey. And juggle.
This entry was posted in Code Reliability, Visual Studio and tagged , , , , . Bookmark the permalink.

50 Responses to Stop using strncpy already!

  1. MSpike says:

    Well, your strcpy_safe is very nice, I see only two issues here:
    – strncpy and friends usually used in c programs not in c++, so if someone still uses strncpy in c++… well, they must not be c++ programmers, but c programmer.
    – the second thing, which I’m not sure about, that using template means that it will “generate” one function for every different kind of calls, isn’t it ? So if used heavily the code base can grow significantly ?!

    What do you think ?

    I like the article, in fact ! Pretty good one.

    • brucedawson says:

      C++ programmers still have valid uses for string arrays and they need to use something to fill them — and the standard library doesn’t have great options. What would you have them use?

      Code bloat shouldn’t be an issue — the template function will inline and be no more expensive than doing the strncpy/manual-termination dance. If the template function called strlcpy or some equivalent then it would be perfectly efficient. I updated the post to address that issue — thanks for mentioning it.

      • MSpike says:

        Well, I’m still not convinced about the second thing (you may validate the usage of these, I would try to avoid it anyway, if possible in c++)
        but for code bloat:

        (gdb) disassemble /m
        Dump of assembler code for function main():
        13 {
        0×00401170 : push %ebp
        0×00401171 : mov %esp,%ebp
        0×00401173 : and $0xfffffff0,%esp
        0×00401176 : sub $0×20,%esp
        0×00401179 : call 0×401220

        14 // Call it like this:
        15 char buffer[5];
        16 strcpy_safe(buffer, “Thisisalongstring”);
        => 0x0040117e : movl $0×402080,0×4(%esp)
        0×00401186 : lea 0x1b(%esp),%eax
        0x0040118a : mov %eax,(%esp)
        0x0040118d : call 0×401740 <strcpy_safe(char (&) [5u], char const*)>

        17 char buffer2[6];
        18 strcpy_safe(buffer2, “Thisisalongstring”);
        0×00401192 : movl $0×402080,0×4(%esp)
        0x0040119a : lea 0×15(%esp),%eax
        0x0040119e : mov %eax,(%esp)
        0x004011a1 : call 0x40176c <strcpy_safe(char (&) [6u], char const*)>

        19 return 0;
        0x004011a6 : mov $0×0,%eax

        20 }
        0x004011ab : leave
        0x004011ac : ret

        End of assembler dump.

        strcpy_safe and strcpy_safe seems to be two different function for me. I’m just wondering if you can write something without templates, to workaround the size question… Funny that sometimes “trivial” problems are’t trivial. :)

        • brucedawson says:

          Comment deleted — gcc inlines fine with optimizations enabled, although it complicates testing (of course) because code is optimized away.

        • Paul Khuong says:

          Recompile with optimisations. Unoptimised C++ code is really different from even minimally optimised one; the trick of declaring a public copy constructor without defining it, and just depend on (N)RVO, tends to break, for instance. On both clang++ 4.0 and g++ 4.2, plain -O suffices to inline the template functions, and replace them with a single-pass initialisation from literals. In fact, the functions are recognized as pure and the issue is to prevent the buffers from being completely elided. (passing them to a dummy extern function works).

          • Paul Khuong says:

            Interesting change: -O2 is necessary for the same result on g++ 4.7. Regarding code bloat, however, the instantiated functions are marked as weak, and any copy would be merged at link-time.

          • brucedawson says:

            Blah. You’re right of course. I passed the wrong switch to gcc. gcc inlines fine. Subverting the optimizations enough to get the desired behavior is always an interesting challenge — dummy extern functions FTW (just don’t use LTO).

  2. Jonathan says:

    strlcpy?

    • brucedawson says:

      I forgot about strlcpy. It does guarantee null-termination, but it can still be misused by passing the wrong buffer size. On platforms that have strlcpy you should get strcpy_safe to call it. I’ve updated the post. Thanks.

      (edited to fix critical error — I wrote “doesn’t” when I meant “does”)

      • John says:

        I’m not sure what strlcpy() version you used before, but all good implementations, especially on *BSD’s, guarantee null-termination (http://www.manpagez.com/man/3/strlcpy/).

        • John says:

          ….and your code doesn’t work for pointers.

          • brucedawson says:

            That’s correct. That is a feature, not a bug. Automatic size inference won’t work for pointers so the code fails to compile so the developer has to manually solve that situation. Make the easy stuff easy, and save the effort for the trickier stuff.

            Also, if you have a pointer you presumably have allocated memory, so use std::string.

  3. Anonymous says:

    ehm.. empty array?

  4. gsg says:

    strcpy_safe is also badly designed because it truncates but gives no way to detect truncation. Truncation is a *bug*, and any string routine that truncates without informing the caller is simply broken. Truncating and informing is (barely) acceptable, although awfully primitive.

    If dynamic allocation is a concern, I would prefer a string library that backs strings with user-sizable static buffers, falling back on dynamic allocation only when necessary. Hopefully the tunable buffers would allow dynamic allocation to be avoided in enough cases that performance would be acceptable.

    But if you really must resort throwing away part of the information the caller wanted, for God’s sake don’t do it silently.

    Anonymous: zero sized arrays are illegal in C++. It might be wise to guard against that case anyway though (ie, gcc allows it as an extension).

    • brucedawson says:

      Fair point. Microsoft’s strcpy_s defaults to terminating your program on truncation, so they certainly agree that truncation is a bug.

      I absolutely agree that dynamic allocation is the correct solution in many cases, but sometimes it’s just not practical.

      It’s also worth noting that in many cases truncation *should* never happen and using these _safe functions is for protection against malicious data.

  5. ratatask says:

    char buf[1024];
    strncpy(buf, “Hello”, sizeof buf);

    Will copy “Hello” (including the nul terminator) to buf[0] through buf[5]
    sets buf[6] through buf[1023] to 0. I.e. quite a lot of unneeded overhead.

    • brucedawson says:

      That is correct. The zeroing of the extra bytes can be a problem. You could always implement strcpy_safe without using strncpy in order to avoid this problem.

  6. Why can’t this simply be shortened to “use snprintf” (and if you’re stuck on Visual studio) use something like: #define snprintf(a, b, c, …) _snprintf_s(a, b, _TRUNCATE, c, __VA_ARGS__)? Use the standard, Luke.

  7. Piotr says:

    Setting aside truncation,

    buf[0] = 0;
    strncat(buf, source, sizeof(buf) – 1);

    • brucedawson says:

      I guess terminating and then using strncat is another way of implementing strcpy_safe, but it is certainly to inconvenient for using in leaf code. Any string copying solution that requires two lines of code and manual specification of the buffer size is dead-on-arrival.

  8. Sevki says:

    uhm, stop using c strings or std::strings in your case is more like it mate, use bstring (http://bstring.sourceforge.net/) or facebook string(https://github.com/facebook/folly/blob/master/folly/docs/FBString.md#follyfbstringh), write your own string lib or what ever… but stop using null terminated.
    Please do “#define byte char” as a gesture of what you intend on using char for. and strart thinking about chars in that term… c strings in most plain terms are not good for unicode, hey even something like wchar.h would make the situation better, so you don’t get internationalization problems like turkish keyboard people crashing your apps… so stop using std::string is more like it

    • brucedawson says:

      Why “#define byte char”? The preprocessor should be avoided as much as possible, and even last century C programmers can use a typedef for that.

      • brucedawson says:

        Good point about UNICODE — adding proper UTF-8 truncation support to strcpy_safe would be a good idea. I’ve decided not to do it because it would complicate the article but I’ve added a note pointing out that issue.

    • Instead of using static size buffers (which can silently truncate input), why not use something like the GNU asprintf extension? It can easily be implemented in standard C using va_copy.


      char *asprintfv (char *format, va_list args) {
      va_list copy_args;
      va_copy (copy_args, args);
      int size = vsprintf (NULL, format, args);
      char *buffer = malloc (size);
      if (!buffer)
      return NULL;
      vsprintf (buffer, format, copy_args);
      return size;
      }

      Let’s hope this code sample comes through WordPress alright.

    • brucedawson says:

      strcpy_s has two potential disadvantages. One is the lack of portability, and the other is that strcpy_s’s policy is to terminate your application if the string won’t fit. To get truncation you need to use strncpy_s with the _TRUNCATE flag, and I think that truncation needs to be easier than that.

      On the other hand, strcpy_s is what taught me the template array-size-inference trick.

  9. Alan says:

    I always wondered why there were so many versions of the function. Little did I know I was walking on thin ice, even when using the functions that I thought were safe. Thanks for sharing this!

  10. Peter Ferrie says:

    “lstrcpy is a similarly named Microsoft design defect that appears to behave like strlcpy but is actually a security bug”. Yes, but it comes from 16-bit Windows (i.e. 1992 or even earlier), and its behaviour is retained for compatibility, so that’s a bit unfair.

    • brucedawson says:

      Well, the lstrcpy design was broken when it was first written, so they have to take some responsibility for it. When they moved to 32-bit they could have removed it or fixed the design without fear of compatibility problems, but they didn’t. When they moved to 64-bit they could have removed it or fixed the design without fear of compatibility problems, but they didn’t.

      Anytime that developers have to recompile their code you should take that as an opportunity to fix broken designs, so I don’t think it’s totally unfair.

      And, fair or not, it is true that it is a security bug function, and developers need to know that. Mincing words would do nobody any good.

  11. bill says:

    why not this :

    template
    void strcpy_safe(char (&output)[charCount], const char* src) {

    for (size_t i = 0; i < charCount; ++i) {
    output[i] = src[i];
    if (src[i] == 0) break;
    }

    }

    • brucedawson says:

      updated:
      I used strncpy just to keep the implementation minimal, but your code is conceptually reasonable, but it doesn’t appear to guarantee null-termination, and it also fails to deal with UTF-8. Strings are hard.

    • nikbackm says:

      Will that nul-terminate the buffer if the src string is too long?

      • brucedawson says:

        It’s hard to be sure since some code got trimmed out in the conversion to HTML, but it looks like ‘no’.

        • bill says:

          yeah there is a bug :) sorry
          the real limitation is that it can’t handle UTF-8 strings..
          (but strncpy can do that?)

          for (size_t i = 0; i < charCount; ++i) {
          output[i] = src[i];
          if (src[i] == 0 || i == (charCount-1)) {
          output[i] = 0;
          break;
          }
          }

  12. Gigi says:

    Funny how you go on criticizing all the Microsoft variants, yet you fail to mention anything about strcpy_s, especially since you say your solution was inspired by it.

    • brucedawson says:

      I should have mentioned strcpy_s — I have in the past — but I was trying to keep the article as short as possible. Also, I find that a lot of developers don’t like the default termination-on-overflow behavior of strcpy_s, and invoking truncation takes away much of the elegance. And it’s not portable, regrettably.

      I don’t think I singled out Microsoft for criticsm particularly — swprintf is a POSIX variant I believe and it is particularly odd.

  13. Martin says:

    nice post. Actually struggling with some legacy code with that problem. It’s in pure C though, so can unfortunately not use your template trick. Will however create a strcpy_safe function to make sure it’s at least prettier.

    • brucedawson says:

      I prefer reserving the _safe function for functions that infer the buffer size. Perhaps better to use some other suffix or prefix? We’re considering going with _ptr for when you have a pointer (or otherwise have to specify the size) and _safe for when the size is inferred.

      And of course both variants will always null-terminate.

  14. Mihnea Balta says:

    If I recall correctly, the reason why strncpy() is so stupid isn’t that it was meant for inserting in the middle of strings. That wouldn’t explain the other dumb thing it does, i.e. fill the remainder of the destination with zeroes. No, it’s much more horrible than that. Back in the stone age, there was this filesystem with fixed-size file names. The open() function expected a 12-character buffer (or so), and it compared all the characters in that buffer against the file name on disk (no null terminator, no name length field). So, in order to expose variable-length file names to the user, they used the convention that buffers had to be padded with 0, and they made a little function to help you do this: strncpy(). When you wanted to open a file, you did this:

    strncpy(buffer, fileName, 11);
    int fd = open(buffer);

    I think it’s insulting that someone took this hack and put it into the standard library. Then again, the standard library contains strcpy() and sprintf(), so…

    Also, I think people should stop mentioning the _s functions that Microsoft inflicted upon the world. None of the programmers who told me to use them actually knew what they do, because nobody bothers clicking on the inconspicuous “Parameter Validation” link in the documentation. Almost everybody thinks they are just the old functions with an added size parameter, but as you said, they aren’t. I’ve seen plenty of people use them on user input, expecting truncation and saying how they avoided a crash (without installing an invalid parameter handler, of course, because THEY DON’T KNOW ABOUT IT). It’s funny to see the feeling of betrayal which they get when you make them actually test the code with an oversize buffer, and it crashes (granted, a “secure” crash, not an exploitable buffer overflow, but still a denial of service).

    • brucedawson says:

      Whatever you do, don’t install an invalid parameter handler. That seems like a really bad way to use the _s functions.

      When I was at Microsoft I tried to convince them that they needed to make the truncation policy more accessible. Having to pass an extra parameter makes it too inconvenient, it should just be different function names — _st suffix instead of _s?

  15. Matt says:

    > The documentation for these functions (man pages, MSDN) is typically pretty weak. I want bold letters at the top telling me whether it will null-terminate, but it typically takes a lot of very careful reading to be sure.

    Try “Secure Coding in C and C++, 2nd Edition.”
    [PDF] http://ptgmedia.pearsoncmg.com/images/9780321822130/samplepages/0321822137.pdf
    In particular:
    Table 2.5 String Copy Functions
    Table 2.6 String Concatenation Functions
    Table 2.7 Truncating Copy Functions
    Table 2.8 Truncating Concatenation Functions

    HTH!

  16. Pingback: Monitoring strcpy with BPHook in the Immunity Debugger - 4bytes.co.uk

  17. Pingback: Two Years (and Thousands of Bugs) of Static Analysis | Random ASCII

  18. So C11 (9899:2011) gives specifications for most or all of the ‘_s’ variants of the string functions, they are part of the official C11 standard but only as the set of optional extensions represented by the conditionally defined macro __STDC_LIB_EXT1__ … though the ‘_s’ versions of the string functions are not part of C++11 (14882:2011). I guess that would mean anyone wanting to know the exact behavior of the ‘_s’ version of a string function could reference the C11 specification, though there is no guarantee the Microsoft version of the function will behave the same as the C11 standard version (since the MSVC compiler is not C11 compliant).

    • brucedawson says:

      I checked the C++14 draft and couldn’t find references to the ‘_s’ functions — where are they specified?

      I think it would be good to have them widely available. They aren’t perfect, but being able to write “strcpy_s(buffer, pEvil);” and be safe from buffer overruns, or “strncpy_s(buffer, pEvil, _TRUNCATE);” to be safe with truncation would be great. Until then I’ll stick with my own template versions, which are at least simple enough to write.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s