Let's get rusty! Fast and safe systems programming with Rust

Rust – what is it? Rust is a systems programming language as fast as C and C++, but without the horrible capacity for shooting yourself in the foot.

Imagine: freedom from the horrors of out-of-bounds access, use-after-free, data races – without sacrificing speed!

Not only does Rust guarantee memory safety, but it brings with it the joy of namespaces, trait-based generics, pattern matching and sensible build systems.

For motivation, I present this excerpt from Ian Barland's "Why C and C++ are Awful Programming Languages":

Imagine you are a construction worker, and your boss tells you to connect the gas pipe in the basement to the street's gas main. You go downstairs, and find that there's a glitch; this house doesn't have a basement. Perhaps you decide to do nothing, or perhaps you decide to whimsically interpret your instruction by attaching the gas main to some other nearby fixture, perhaps the neighbor's air intake. Either way, suppose you report back to your boss that you're done.

KWABOOM! When the dust settles from the explosion, you'd be guilty of criminal negligence.

Yet this is exactly what happens in many computer languages. In C/C++, the programmer (boss) can write "house"[-1] * 37. It's not clear what was intended, but clearly some mistake has been made. It would certainly be possible for the language (the worker) to report it, but what does C/C++ do?

  • It finds some non-intuitive interpretation of "house"[-1] (one which may vary each time the program runs!, and which can't be predicted by the programmer), then it grabs a series of bits from some place dictated by the wacky interpretation,
  • it blithely assumes that these bits are meant to be a number (not even a character),
  • it multiplies that practically-random number by 37, and then reports the result, all without any hint of a problem.

In a world where programs control credit-card databases, car brakes, my personal finances, airplanes, and x-ray machines, it is criminal negligence to use a language with the flaws of C/C++. Even for games, browsers, and spreadsheets, the use of C/C++ needlessly helps inflict buggy software on the everyday user.

The writing here is provocative, but for good reason. C has some very real issues: undefined behaviour, weakly enforced typing, and a notorious lack of memory safety. C++ provides a lot of new features, but is still unsafe, and inherits many of C's problems. Many famous exploits are due to vulnerabilities made possible by these languages. Stagefright was due to integer overflow in C++; Heartbleed was a buffer over-read made possible by C's lack of bounds-checked arrays.

So, why are C and C++ so popular?

I think C and C++ are popular for a variety of reasons. I want to focus on the high level of control they provide, and their runtime efficiency, which are strong motivators for many. For a long time, it has seemed like we have to make a choice between control and safety. Languages like Java use garbage collection to ensure memory safety, but at the cost of control and speed. C and C++ run fast and offer control, but at the expense of safety guarantees.

With Rust, we no longer have to choose between safety and control.

What is Rust?

Rust is an exciting new language. Started in 2010 and sponsored by Mozilla, it had its first stable release in 2015, last year! It's designed to be safe, concurrent and practical. It is useful for the same kind of applications as C and C++ due to its speed and high level of control.

Rust isn't simply a memory safe version of C++. Rust is its own language with its own design goals, and many features borrowed from functional programming.

To understand Rust and its design, let's move onto Rust's core principles.

Memory safety without garbage collection

Briefly, let's discuss what garbage collection is and why it is slow. Managing resources in a program is about freeing up memory being used by unreachable objects. For example, a variable that has gone out of scope is unreachable.

Garbage collection achieves this management in a variety of ways, often by periodically checking memory to find unused objects, releasing their associated resources and memory.

Garbage collection eliminates double frees, dangling pointers, and most memory leaks. Important achievements! But GC does this at the expense of resources, performance and predictability.

So, how does Rust achieve memory safety without this overhead?

Ownership

A large part of how Rust achieves memory safety is through the concept of ownership. This is an exciting feature, a vital part of Rust that allows it to be both memory-safe and efficient. Every value has an owner at all times, which is tracked by the compiler. The owner can give out references to other users, with some restrictions, or it can transfer ownership of the value. Crucially, there can only be one owner. The Rust compiler ensures that there is exactly one binding to any given value at a time.
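To make the idea of a single owner concrete, here is a minimal sketch. The `Resource` type, its `log` field and the `consume` and `demo` functions are hypothetical names invented for illustration; the only standard-library pieces are `Drop`, `Rc` and `RefCell`. The sketch records when each value is cleaned up, to show that clean-up happens at a predictable point: when the owner goes out of scope.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type for illustration: it writes to a shared log when dropped.
struct Resource {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

// Takes ownership of `r`; the value is dropped when this function returns.
fn consume(r: Resource) {
    let _ = r.name;
}

fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let a = Resource { name: "a", log: Rc::clone(&log) };
        let _b = Resource { name: "b", log: Rc::clone(&log) };
        consume(a); // ownership of `a` moves into `consume`
        log.borrow_mut().push("after consume".to_string());
    } // `_b` is still owned by this scope, so it is dropped here
    let events = log.borrow().clone();
    events
}

fn main() {
    for event in demo() {
        println!("{}", event);
    }
}
```

The value moved into `consume` is released inside that function, and the other is released at the closing brace of its scope. No collector has to go looking for them.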

Ownership in practice

Consider this short code snippet from the Rust docs:

    let v = vec![1, 2, 3];
    let mut v2 = v;
    v2.truncate(2);

The first line allocates memory for the vector object v on the stack, and allocates memory on the heap for the data, [1, 2, 3]. Rust copies the address of this heap allocation to an internal pointer, which is part of the vector object.

We have a vector object in one place (the stack), and its data in another (the heap). The two parts must agree at all times, e.g. with regard to length.

Now let's consider the second line of code. Here, we move v to v2. A shallow copy is performed: Rust does a bitwise copy of the vector object v into the stack allocation represented by v2. This shallow copy does not create a copy of the heap allocation containing the vector's actual data.

If v and v2 both pointed to the same data, what would happen if we changed that data, as we do in the third line? The vector object v would have out-of-date information about its data (it would still record a length of three), and become invalid. Reading through v could then access elements that no longer exist.

To prevent this occurring, Rust enforces its rule of each value having exactly one binding at any given time. Once we move the vector to v2, v is no longer accessible. Trying to use it will result in a compiler error:

    let v = vec![1, 2, 3];
    let mut v2 = v;
    println!("{:?}", v);

    error[E0382]: use of moved value: `v`
     --> <anon>:4:22
      |
    2 |     let mut v2 = v;
      |         ------ value moved here
    3 |     println!("{:?}", v);
      |                      ^ value used here after move

Preventing memory problems like the one described above is important: at best, we can get a segmentation fault; at worst, we could allow an unauthorised user to read memory they shouldn't have access to.
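If we really do want two independent vectors, or want the original binding back after modifying the data, there are ways to say so explicitly. The sketch below is one possible illustration, not the only approach; the function names `clone_then_truncate` and `take_and_return` are made up for this example.

```rust
// Option 1: make an explicit deep copy, so each binding owns its own heap data.
fn clone_then_truncate() -> (Vec<i32>, Vec<i32>) {
    let v = vec![1, 2, 3];
    let mut v2 = v.clone(); // new heap allocation; `v` stays valid
    v2.truncate(2);
    (v, v2)
}

// Option 2: move the vector into a function and move it back out again.
fn take_and_return(mut v: Vec<i32>) -> Vec<i32> {
    v.truncate(2);
    v // ownership is handed back to the caller
}

fn main() {
    let (original, shortened) = clone_then_truncate();
    println!("{:?} {:?}", original, shortened);

    let v = vec![1, 2, 3];
    let v = take_and_return(v); // rebinding `v` to the returned value
    println!("{:?}", v);
}
```

Cloning costs an extra allocation, and passing ownership back and forth quickly gets clumsy, which is part of the motivation for borrowing.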

Borrowing

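Alone, this ownership model is very restrictive, but Rust also provides a mechanism for borrowing, or referencing. Borrowing must follow these rules:

  • a reference may never outlive the owner of the value it refers to,
  • at any given time, a value may have either any number of immutable references (&T) or exactly one mutable reference (&mut T), but not both.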

With these restrictions, Rust makes it impossible to have a data race. A data race occurs when two or more pointers access the same memory location at the same time, at least one of them is writing, and the operations aren't synchronised.

We can have as many immutable references as we like without causing a data race, as none of them are writing to the memory location. Or, we can have exactly one write to the memory location, which will never be racing against any other operations.
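A small sketch of these two modes of borrowing follows. The helper functions `sum`, `push_twice` and `demo` are hypothetical names chosen for this example.

```rust
// Reads through a shared (immutable) borrow.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Writes through an exclusive (mutable) borrow.
fn push_twice(values: &mut Vec<i32>, x: i32) {
    values.push(x);
    values.push(x);
}

fn demo() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3];

    // Any number of immutable references may coexist.
    let a = &v;
    let b = &v;
    let total = sum(a) + sum(b);

    // `a` and `b` are not used again, so one mutable reference is now allowed.
    push_twice(&mut v, 4);

    // By contrast, the compiler rejects the following, because `first`
    // would still be in use while `v` is mutated:
    //
    //     let first = &v[0];
    //     v.push(5);
    //     println!("{}", first);

    (total, v)
}

fn main() {
    let (total, v) = demo();
    println!("total = {}, v = {:?}", total, v);
}
```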

Memory safety at zero run-time cost

As we have seen, Rust's ownership and borrowing model completely prevents data races. These compile-time checks, along with runtime checks for array bounds etc., ensure complete memory safety. That means freedom from buffer overflows, dangling pointers and use-after-free. Rust's key innovation is enforcing most of these checks at compile time, meaning they do not present any run-time cost^1^. We get memory safety, without overhead. This completely overturns the old trade-off between safety and speed/control.
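To illustrate the runtime side of this, here is a rough sketch contrasting indexed access, which is bounds-checked, with iteration, which needs no per-element index check. The function names `sum_indexed` and `sum_iter` are made up for this example; `get` and `iter` are standard slice methods.

```rust
// Indexing: each `data[i]` is checked against the length of the slice
// (the optimiser may be able to remove the check, but that is not guaranteed).
fn sum_indexed(data: &[u32]) -> u32 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

// Iteration: the iterator only ever yields valid elements,
// so there is no index to check.
fn sum_iter(data: &[u32]) -> u32 {
    data.iter().sum()
}

fn main() {
    let data = [10, 20, 30];
    println!("{} {}", sum_indexed(&data), sum_iter(&data));

    // Out-of-bounds access is caught rather than reading arbitrary memory:
    // `get` returns None, and `data[7]` would panic instead.
    assert_eq!(data.get(7), None);
}
```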

We have now covered Rust's first core principle; what about the others?

Concurrency without data races

Concurrency has always presented a lot of challenges to programmers, sometimes with devastating consequences. Most of us will know of the famous Therac-25 disaster, where concurrent programming errors (most notably a race condition) resulted in radiotherapy patients being given massive radiation overdoses, several of which were fatal.

We have already seen how Rust makes data races impossible, paving the way for fearless concurrency. Concurrent programming in Rust is not something I have much experience in, but I think it's promising that reliable concurrency is a core design principle of Rust.
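As a minimal sketch of what this looks like in practice, the example below shares a counter between threads using the standard library's `Arc` and `Mutex`. The function name `parallel_count` is hypothetical, and this is just one simple way to share state, not a recommendation for every situation.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawns `threads` threads that each add 1 to a shared counter `increments` times.
fn parallel_count(threads: usize, increments: usize) -> usize {
    // Arc gives shared ownership across threads; Mutex synchronises the writes.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1000));
}
```

Trying to share a plain `&mut usize` between the threads instead would not compile, because that would create several writers to the same memory at once.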

Abstraction without overhead

Abstraction without overhead is a design principle shared with C++. As C++'s originator put it:

What you don't use, you don't pay for. And further: What you do use, you couldn't hand code any better.

Traits are a large part of how Rust achieves this. "The trait system gives Rust the ergonomic, expressive feel of high-level languages while retaining low-level control over code execution and data representation."

So, what is a trait?

Traits are interfaces: they specify the expectations that one piece of code has on another, allowing each to be switched out independently. Hooray for modularity and flexibility! Traits specify this interface through methods:

    trait Hash {
        fn hash(&self) -> u64;
    }


    impl Hash for bool {
        fn hash(&self) -> u64 {
            if *self {0} else {1}
        }
    }

    impl Hash for i64 {
        fn hash(&self) -> u64 {
            *self as u64
        }
    }

Here, we define a trait called Hash, and say that any type implementing this trait must have the method hash. Later, we implement this trait for the types bool and i64, by defining this method for this particular type.

Traits allow for generic programming:

    fn print_hash<T: Hash>(t: &T) {
        println!("The hash is {}", t.hash())
    }

This function print_hash is generic over type T. It can take any type that implements the Hash trait! As with C++ templates, the compiler generates a separate copy of print_hash for each concrete type it is called with. The call to t.hash() is therefore resolved at compile time (static dispatch), so the abstraction adds no cost at runtime. Abstraction, without overhead!
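Putting the pieces together, here is a complete program built from the trait and implementations above. The helper `hash_message` is an addition for this sketch (it returns the text instead of printing it, which makes it easier to check); everything else follows the snippets already shown.

```rust
trait Hash {
    fn hash(&self) -> u64;
}

impl Hash for bool {
    fn hash(&self) -> u64 {
        if *self { 0 } else { 1 }
    }
}

impl Hash for i64 {
    fn hash(&self) -> u64 {
        *self as u64
    }
}

// Hypothetical helper: builds the message for any type implementing Hash.
fn hash_message<T: Hash>(t: &T) -> String {
    format!("The hash is {}", t.hash())
}

fn print_hash<T: Hash>(t: &T) {
    println!("{}", hash_message(t))
}

fn main() {
    print_hash(&true);  // uses the copy generated for bool
    print_hash(&42i64); // uses the copy generated for i64
}
```

Calling print_hash with a type that does not implement Hash, such as a string, is rejected at compile time with an error pointing at the missing trait implementation.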

One of the notable differences between this and C++ templates is that clients of traits are fully type-checked in advance, once. In C++, the code is checked repeatedly when applied to concrete types. Rust's way means clearer and earlier compilation errors, an area where C++ template meta-programming is notoriously weak.

The Rust community and tools

I hope I've now given you a good feel for what Rust is like as a language. I think the community and the development tools it has created are also worth shouting about, and they give me hope that Rust will continue to grow both as a language and in popularity.

The Rust community tries hard to foster inclusion and accessibility. Mentoring is strongly encouraged, and facilitated on GitHub, where easy-to-implement features in Rust's core are tagged to allow new people to come in and learn with the help of someone more experienced.

Rust's steep learning curve is one of the biggest hurdles for people learning the language. The ownership model is new for a lot of people, and manual lifetime annotation (something I didn't get onto in this talk) is still something I'm struggling with (I just do what the compiler tells me!). Rust has amazing documentation, and incredibly helpful compilation error messages. The community is working hard to make these even better. The Rust Roadmap for 2017 states that making learning easier is one of their main goals.

Cargo

Cargo, Rust's package management tool, is wonderful. In one simple file, I list the dependencies of my project. Then I run $ cargo build, and the project is built. $ cargo run will run the project's main. $ cargo test will run the tests. It's that simple! No faffing around with Makefiles! Here's an example:

    [package]
    name = "kmeans"
    version = "0.1.0"
    authors = ["Hannah McLaughlin <h@mcla.ug>"]

    [dependencies]
    csv = "0.14"
    rustc-serialize = "0.3"
    generic-array = "0.5.1"

Importantly, Cargo is the one tool used for this. C and C++ suffer from having a wide range of tools for this one task. It's so much easier and cleaner to have just one tool that works brilliantly, and it also makes it simpler for newcomers to figure out how to get set up.
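To give a feel for what $ cargo test actually runs, here is a small sketch of a library source file with unit tests. The `mean` function and the test names are hypothetical and are not taken from the kmeans project above; the `#[cfg(test)]` and `#[test]` attributes are the standard way of marking tests.

```rust
// Contents of a hypothetical src/lib.rs.

/// Arithmetic mean of a slice, or None if the slice is empty.
pub fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

// This module is only compiled when running `cargo test`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mean_of_three_values() {
        assert_eq!(mean(&[1.0, 2.0, 3.0]), Some(2.0));
    }

    #[test]
    fn mean_of_empty_slice_is_none() {
        assert_eq!(mean(&[]), None);
    }
}
```

Cargo finds every function marked `#[test]`, runs them all, and reports which passed and which failed.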

Closing thoughts

I'm really excited about Rust. Writing it is a joy. The guaranteed memory safety is amazing. I can't wait to see what's in Rust's future, and I hope you will consider trying it out if you haven't already.

^1^A short footnote for the pedants: While some memory safety checks have to be performed at runtime, Rust code can usually be written to avoid any of these checks being needed. For example, array indexing such as array[i] needs to be runtime checked for bounds, but Rust provides iterators that often allow this to be avoided. The result is that most of the time, there is zero run-time cost for the memory safety Rust provides.