BLOG | OFFICE OF THE CTO

Beyond the C

Published February 06, 2023

For decades, C and its immediate descendants have been the programming languages of choice for systems programming. In the early 1970s, when C was developed, it was an important step forward from assembly language. Fifty years on, we can do better.

Computer security wasn’t on most people’s radar in 1970, and for many years thereafter. The first major security incident on the internet came many years later, in 1988, when the internet worm exploited a buffer overflow: an array in memory was indexed outside its bounds.

In the C language family, which includes C++, arrays are not bounds-checked. It is up to the programmer to ensure the array is accessed correctly. Buffer overflow errors are therefore common. Worse, it is easy to deliberately cause a buffer overflow and thereby access memory that one should not.
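
To make the contrast concrete, here is a minimal sketch of our own (the buffer contents and the out-of-bounds index are purely illustrative). In C, the equivalent indexing expression would silently read or write past the end of the array; in a bounds-checked language such as Rust, the same access is stopped before it can touch adjacent memory.

    fn main() {
        let buf = [1u8, 2, 3, 4]; // a four-element buffer
        let i = 7;                // an out-of-bounds index, e.g. derived from untrusted input

        // In C, buf[i] would quietly access memory past the end of the array.
        // In Rust the same expression is bounds-checked, so it aborts with a
        // panic instead of exposing adjacent memory:
        // println!("{}", buf[i]); // panics: index out of bounds

        // The idiomatic way to handle a possibly invalid index is `get`,
        // which returns None rather than touching out-of-bounds memory.
        match buf.get(i) {
            Some(byte) => println!("byte = {byte}"),
            None => println!("index {i} is out of bounds"),
        }
    }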

Buffer overflow is just one example of memory unsafety. Other related examples include out-of-bounds pointer arithmetic and dangling pointers (also known as use-after-free bugs). Today we have many programming languages that employ a variety of techniques to guarantee that any program written in them will be free of memory-safety issues. The C family of languages makes no such guarantee; memory safety was never a design goal of these languages.
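
As a minimal illustration (our own sketch, with illustrative names), here is how a memory-safe language such as Rust rejects a use-after-free at compile time rather than letting it become a runtime vulnerability:

    fn main() {
        let data = vec![10, 20, 30]; // heap allocation owned by `data`
        let first = &data[0];        // a reference borrowed from `data`

        // drop(data);               // freeing the allocation here...
        // println!("{first}");      // ...would leave `first` dangling.
        //
        // Uncommenting those two lines does not compile: the borrow checker
        // rejects the program because `data` is still borrowed. The same
        // pattern in C (free() followed by a read through a stale pointer)
        // compiles cleanly and becomes a use-after-free bug.

        println!("first element: {first}");
    }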

About 70% of security vulnerabilities are due to violations of memory safety. This claim is backed by overwhelming data. Memory-safety violations account for:

  • 67% of exploited vulnerabilities detected in the wild (per a 2021 Google Project Zero estimate).
  • 90% of Android vulnerabilities (per Google). While this number has been coming down recently thanks to the use of Rust, 89% of remotely exploitable vulnerabilities remain tied to memory safety.
  • 70% of Microsoft vulnerabilities (per Microsoft).
  • The majority of recent Apple vulnerability fixes (across Catalina, Big Sur, Monterey, Safari, iOS, tvOS, watchOS).

In July 2022, five of the six vulnerabilities fixed in Chrome 103.0.5060.134 were memory-safety issues. Clearly, eliminating memory-safety bugs would be enormously helpful, and the industry has known how to do that for many years: use memory-safe programming languages. Historically, the obstacle has always been the associated performance cost. Because memory safety matters so much, people have been working on new strategies to achieve it without the traditional overhead, and today we know how to eliminate memory-safety errors with little to no runtime cost. Languages such as Rust use innovations in type system design to guarantee memory safety without costly runtime support.
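
To give a flavor of what safety via the type system means in practice, here is a small sketch of our own (the function and names are illustrative). Ownership and borrowing are verified entirely at compile time, and the point of deallocation is known statically, so no garbage collector or other runtime machinery is needed:

    // The callee borrows the buffer; the caller keeps ownership. The borrow
    // is checked by the compiler and costs nothing at run time.
    fn checksum(bytes: &[u8]) -> u32 {
        bytes.iter().map(|&b| b as u32).sum()
    }

    fn main() {
        let payload: Vec<u8> = vec![0xde, 0xad, 0xbe, 0xef]; // owned heap buffer
        let sum = checksum(&payload);                        // lend it out, keep ownership
        println!("checksum = {sum}");
    } // `payload` goes out of scope here; the compiler inserts the single
      // deallocation at this statically known point, with no garbage collector involved.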

This leads to an inescapable conclusion: stop writing new systems code in C/C++ (or, more generally, in memory-unsafe languages).

This is not a call for massive, indiscriminate rewrites of existing code. Replacing existing software is costly and not without risks. However, the industry must stop aggravating the problem by adding more memory-unsafe code to existing codebases. For existing code, prioritize rewriting the most sensitive components: those that are responsible for validating or consuming untrusted user inputs, those running in a privileged context, those running outside of a sandbox, etc.

This position, though widely held, is still controversial to some. Here are some of the common arguments made in favor of the status quo, and our responses to them.

  • Bugs are not specific to programming languages.  
    • Most vulnerabilities result from memory safety bugs. You cannot have memory safety bugs in memory-safe languages such as Rust. Therefore, using a memory-safe language will prevent most vulnerabilities from being created in the first place.
  • Code reviews will catch bugs.
    • Studies show that while code reviews may improve things, they often miss many problems; they certainly don't find everything.
  • Unsafe modules make the whole point moot.
    • Unsafe code is explicitly delineated in small, clearly marked sections, so review and validation effort can be focused there (see the sketch after this list).
  • A memory-safe language implementation might be buggy.
    • Indeed, but the probability is much lower. Compilers are relatively small and rigorously tested, and fixing a compiler bug automatically fixes every program compiled with it, whereas fixing an application bug helps only that one program.
  • Only novice or incompetent programmers make these mistakes.
    • On the contrary, the data above is taken from major systems projects such as the Chrome browser and the Windows and macOS operating systems. These projects are staffed with some of the most competent and highly experienced developers in the industry, and yet they still exhibit problems with memory safety.
  • These problems can be avoided by following best practices.
    • See above. The teams mentioned follow very rigorous practices and use the best tooling available.
  • It is impractical to rewrite all code.
    • No one is arguing for a complete rewrite of all code. The recommendation is to write new code in a memory-safe language and to prioritize rewriting key components: those with high privilege, a large attack surface, or a central role in security guarantees.
  • Performance is inadequate.
    • Rust performance is roughly on par with C/C++. Minor differences do not justify security risks. There are also situations where Rust programs can be faster.
  • Memory-safe systems programming languages like Rust are too new.
    • Rust has had stable releases since 2015 and is already used in production at scale, including in Firefox, Android, and the Linux kernel.
  • There are other solutions, such as static analysis, fuzzing, and sandboxing.
    • These are valuable as defense in depth, but they catch or contain only a fraction of memory-safety bugs; none offers the guarantee that a memory-safe language provides.
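
To illustrate the point above about unsafe code being delineated, here is a small sketch of our own (the function is illustrative): in Rust, operations the compiler cannot prove safe, such as dereferencing a raw pointer, must appear inside an explicitly marked unsafe block, so auditing effort can be concentrated on those few lines.

    /// Reads the first byte of a buffer through a raw pointer.
    /// The raw-pointer dereference is only allowed inside an `unsafe` block,
    /// so reviewers and tooling know exactly where to look.
    fn first_byte(buf: &[u8]) -> Option<u8> {
        if buf.is_empty() {
            return None; // safe code establishes the invariant the unsafe block relies on
        }
        let p: *const u8 = buf.as_ptr();
        // SAFETY: `buf` is non-empty, so `p` points to at least one valid byte.
        Some(unsafe { *p })
    }

    fn main() {
        println!("{:?}", first_byte(b"hello")); // Some(104)
        println!("{:?}", first_byte(b""));      // None
    }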

See Quantifying Memory Unsafety and Reactions to It for a detailed discussion of the above points.

Despite the objections, momentum for the needed changes is growing. Large investments are being made in the Open Source Security Foundation (OpenSSF, backed by the Linux Foundation), for example. Memory safety has been discussed in US Senate reports and by the NSA. Consumer Reports also works to identify the incentives that could accelerate this movement, offering a series of recommendations for companies and state agencies.

To summarize:

The importance of computing to society has grown enormously over the past fifty years, and the threat landscape facing our computing infrastructure has changed radically over recent decades. Yet the programming languages we use to build our computing systems have not changed accordingly. Lack of memory safety is the single largest source of security vulnerabilities in software. This is not peculiar to any particular type of software: wherever memory-unsafe languages are in use, memory-safety issues abound, and by the numbers, no other class of vulnerability comes close.

We now have the means to address this growing problem and it is crucial that we as an industry do so. Fortunately, recognition of the situation is spreading, but the problem is urgent and there is no time to lose.