I keep running into code that uses strcpy, sprintf, strncpy, _snprintf (Microsoft only), wcsncpy, swprintf, and morally equivalent functions. Please stop. There are alternatives which are far safer, and they actually require less typing.
The focus of this post is on fixed-size string buffers, but the technique applies to any type of fixed-length buffer. If you don’t use fixed-size buffers then this post is not relevant to you. Some people do use them, sometimes for valid reasons, and this post is for them.
The dangers of strcpy and sprintf should require little explanation I hope. Neither function lets you specify the size of the output buffer so buffer overruns are often a risk. Using strcpy to copy data from network packets or to copy a large array into a smaller one are particularly dangerous, but even when you’re certain that the string will fit, it’s really not worth the risk.
‘n’ functions considered dangerous
The dangers of strncpy, _snprintf, and wcsncpy should be equally well known, but apparently they are not. These functions let you specify the size of the buffer but – and this is really important – they do not guarantee null-termination. If you ask these functions to write more characters than will fill the buffer then they will stop – thus avoiding the buffer overrun – but they will not null-terminate the buffer. In order to use these functions correctly you have to do this sort of nonsense.
strncpy(buffer, “Thisisalongstring”, sizeof(buffer));
buffer[sizeof(buffer)-1] = 0;
A non-terminated string in C/C++ is a time-bomb just waiting to destroy your code. My understanding is that strncpy was designed for inserting text into the middle of strings, and then was repurposed for ‘secure’ coding even though it is a terrible fit. Meanwhile _snprintf followed the strncpy pattern, but snprintf did not. That is, snprintf guarantees null-termination, but strncpy and _snprintf do not. Is it any wonder that developers get confused? Is it any wonder that developers often do this:
// Make snprintf available on Windows:
// Don’t ever do this! These two functions are different!
#define snprintf _snprintf
strlcpy and lstrcpy
strlcpy is designed to solve the null-termination problems – it always null-terminates. It’s certainly an improvement over strncpy, however it isn’t natively available in VC++.
lstrcpy is a similarly named Microsoft design defect that appears to behave like strlcpy but is actually a security bug. It uses structured exception handling to catch access violations and then return, so in some situations it will cover up crashes and give you a non-terminated buffer. Awesome.
Wide characters worse?
swprintf is a function that defies prediction. It lacks an ‘n’ in its name but it takes a character count, however it doesn’t guarantee null-termination. It’s enough to make one’s head explode.
Where’s the pattern?
If you find the list below obvious or easy to remember then you may be a prodigy, or a liar:
- May overrun the buffer: strcpy, sprintf
- Sometimes null-terminates: strncpy, _snprintf, swprintf, wcsncpy, lstrcpy
- Always null-terminates: snprintf, strlcpy
The documentation for these functions (man pages, MSDN) is typically pretty weak. I want bold letters at the top telling me whether it will null-terminate, but it typically takes a lot of very careful reading to be sure. It’s usually faster to write a test program.
It’s also worth emphasizing that of the seven functions listed above only one is even plausibly safe to use. And it’s not great either.
More typing means more errors
But wait, it’s actually worse. Because it turns out that programmers are imperfect human beings, and therefore programmers sometimes pass the wrong buffer size. Not often – probably not much more than one percent of the time – but these mistakes definitely happen, and ‘being careful’ doesn’t actually help. I’ve seen developers pass hard-coded constants (the wrong ones), pass named constants (the wrong ones), used sizeof(the wrong buffer), or use sizeof on a wchar_t array (thus getting a byte count instead of character count). I even saw one piece of code where the address of the string was passed instead of the size, and due to a mixture of templates and casting it actually compiled! Passing sizeof() to a function that expects a character count is the most common failure, but they all happen – even snprintf and strlcpy are misused. Using annotations and /analyze can help catch these problems, but we can do so much better.
The solution is…
We are programmers, are we not? If the functions we are given to deal with strings are difficult to use correctly then we should write new ones. And it turns out it is easy. Here I present to you the safest way to copy a string to an array:
template <size_t charCount>
void strcpy_safe(char (&output)[charCount], const char* pSrc)
YourCopyNFunction(output, pSrc, charCount);
// Copy the string — don’t copy too many bytes.
//strncpy(output, pSrc, charCount);
// Ensure null-termination.
//output[charCount - 1] = 0;
// Call it like this:
<note>Commenters correctly pointed out that strncpy followed by null-termination is not the ideal implementation of strcpy_safe because it is inefficient (strncpy will zero all bytes to the end of the buffer) and maybe cut UTF-8 characters in half. Fixing this is beyond the scope of this post, which was supposed to be focused on automatically inferring the buffer size through template magic. So, don’t forget to implement YourCopyNFunction – and maybe I’ll post a version next time.</note>
I challenge you to use this function incorrectly. You can make it crash by passing an invalid source pointer, but in many years of using this technique I have never seen a case where the buffer size was not inferred correctly. If you pass a pointer as the destination then, because the size cannot be inferred, the code will fail to compile.
I think that strcpy_safe is (ahem) a perfect function. It is either used correctly, or it fails to compile. It. Is Perfect. And it’s only six lines. Five if you indent like K&R.
Because strcpy_safe is so tiny – it just calls strncpy and then stores a zero – it will inline automatically in VC++ and gcc in optimized builds. If you want to reduce code size further you could write a non-inline helper function (strlcpy?) that would do the null-termination and have strcpy_safe call this function. It’s up to you.
One could certainly debate the name – maybe you would rather call it acme_strcpy, or acme_strncpy_safe. I really don’t care. You could even call it strcpy and let template overloading magically improve the safety of your code.
String truncation causes problems with UTF-8 encoding. If you want to truncate at a character (or is that code-point – I can’t remember) boundaries then need to add some extra code to scan backwards to a character boundary. It’s not too complicated, but it’s outside the scope of this post which was focused on using templates to infer array sizes.
Similar wrappers can obviously be made for all of the string functions that you use. You can even invent new ones, like sprintf_cat_safe. In fact, when I write a member function that takes a pointer and a size I usually make it private and write a template wrapper to handle the size. It’s a versatile technique that you should get used to using. Templates aren’t just for writing unreadable meta-code.
Yes, to be clear, I am aware of the existence of std::string. For better or for worse most game developers try to avoid dynamically allocated memory and std::string generally implies just that. There are valid reasons to use character arrays (fewer allocations, better cache locality), even if those valid reasons are just that you’ve been handed a million lines of legacy code with security and reliability problems in all directions. strcpy_safe and friends are unique in that they let you improve code security and reliability with a simple s/strcpy/strcpy_safe/.
As I say at the top, if you don’t need to use fixed-length buffers then congratulations, this post does not apply to you.
We started with this code – two lines of error prone verbosity:
strncpy(buffer, “Thisisalongstring”, sizeof(buffer));
buffer[sizeof(buffer)-1] = 0;
We ended with this code – simpler, and impossible to get wrong:
You should be using the second option, or explain to me why the heck not. It is constructive laziness of the highest order. Doing less typing in order to create code that is less fragile seems like it just might be a good idea.
One could certainly argue that silently truncating is bad – if that is your policy then your wrapper function can assert, log, or abort(). Those are implementation specific details. However we can all agree that avoiding buffer overruns is progress.