1-introduction.tex

\chapter{Introduction}

\section{The Rust Ownership and Borrow System}

\subsection{Ownership}


In most programming languages, one can freely create and delete resources. For example, in C the programmer allocates memory on the heap with \lstinline{malloc} and frees the memory when they don’t need the memory anymore using \lstinline{free}.

\begin{lstlisting}[language=C, caption={Memory management in C}]
    // Allocate space for an `int`.
    int *ptr = (int*) malloc(sizeof(int));

    // ...

    // Free the memory.
    free(ptr);
\end{lstlisting}

Even in languages without manual memory management, the programmer still must manage other resources. For example, to work with a file in Python one could write:

\begin{lstlisting}[language=Python, caption={Manual resource management in Python}]
    # Open the file and store the file handle in a variable.
    file = open("path/to/some.file", "r")

# ...

# Close the file
file.close()
\end{lstlisting}

\paragraph{Resource Handling Strategies}

This manual resource management comes with some difficulties. The programmer has to watch out to always free the resources after using them, because otherwise the program could leak memory or unneccessarily inhibit other processes from accessing the resources. On the other hand, one may also not free the resources too early, which would lead to a use after free, and also not free resources multiple times. This can be very tricky in complex programs where resources are shared between threads or over \acs{API} boundaries.

That is why several programming languages have ways to make handling resources easier. Python has context managers that close resources automatically.

\begin{lstlisting}[language=Python, caption={Context managers in Python}]
with open("path/to/some.file", "r") as file:
    #...

# At this point `file` is closed.
\end{lstlisting}

This code is equivalent to the previous one, but even better: It also closes the file in case of an exception. Programmers can also define custom context managers for their own resources.

Other languages have different strategies. Go supports the \lstinline{defer} statement which takes an operation that is not to be run immediately but at the end of the scope.

\begin{lstlisting}[language=Go, caption={The \lstinline{defer} statement in Go}]
{ // Block starts here.
    file, err := os.Open("path/to/some.file")
    defer file.Close()

    // ...

} // Block ends here; run the deferred close file
\end{lstlisting}

These prevent double frees and use after free errors, but they can still be forgotten. A programmer who wants to use a resource must be aware that it has to be cleaned up
in the end and changes in one line of the code have to be mirrored in the other. It is not as bad as in manual resource management because the programmer has to change only one additional line and that line is usually close.

\paragraph{\acs{RAII} and Drop Responsibilities}

A third variant is \acfi{RAII}, also called \acfi{SBRM}, used by C++ and Rust. This ensures that the destructor of an object is run when it goes out of scope, which will clean up automatically. An example is the \lstinline{std::unique_ptr} in C++. The pointer \emph{owns} the memory it points to, meaning when it is created, it automatically allocates memory on the heap and when it goes out of scope, it frees the memory.

\begin{lstlisting}[language=C++, caption={\ac{RAII} in C++}]
{ // Block starts here.
    std::unique_ptr<int> ptr = std::make_unique<int>(42);

    // ...

} // Block ends here; run the destructor and free the memory
\end{lstlisting}

\ac{RAII} ensures that resources are always freed when they are not live (can’t be accessed) anymore. The programmer can’t forget to free the resources nor cause use after free or double free. While in C++ only objects conform to \ac{RAII}, in Rust every value is owned by a variable. For example, the Rust analog to an owning pointer is \lstinline{Box}.

\begin{lstlisting}[language=Rust,caption={Heap allocation in Rust}]
{ // Block starts here.
    let ptr = Box::new(42)

    // ...

} // Block ends here; drop value and free memory
\end{lstlisting}

As soon as a variable falls out of scope, it is \emph{dropped}, meaning all
associated resources are freed. It is said that a variable has a \emph{drop responsibility}. A programmer can define custom drop behavior for their own resources by implementing the \lstinline{Drop} trait for their type, but normally the compiler does it for us.

\paragraph{Move Semantics}

\ac{RAII} comes with a few disadvantages, though. Since a value is dropped when its owner goes out of scope, ther must always be exactly one owner for every value. This means that the compiler will transfer ownership sometimes, like in the following example.

\begin{lstlisting}[language=Rust,caption={Ownership transfer},label={lst:ownership-transfer}]
fn print_value(ptr: Box<i32>) {
    println!("{}", *ptr);
}

fn main() {
    let mut ptr = Box::new(42);

    print_value(ptr);

    *ptr += 1;
}
\end{lstlisting}

We create a pointer and use it as an argument to a function that prints the pontee’s value. After that we increment the value. But if we try to compile this, we get an error:

\begin{lstlisting}
error[E0382]: use of moved value: `*ptr`
\end{lstlisting}

Because we supplied \lstinline{ptr} as an argument to \lstinline{print_value}, it took ownership of that value, and now has the drop responsibility. As soon as the call to \lstinline{print_value} is finished, the memory is freed and accessing it afterwards is a use after free. We say that the value was \emph{moved} out of \lstinline{ptr} and into \lstinline{print_value}.

\paragraph{Clone}

We can still make it work by \emph{cloning} \lstinline{ptr} and supply the clone as an argument. \lstinline{ptr.clone()} creates a new \lstinline{Box}, allocates new memory, and initializes it with the same value as \lstinline{ptr}’s:

\begin{lstlisting}[language=Rust, caption={Usage of \lstinline{Clone}}]
fn print_value(ptr: Box<i32>) {
        println!("{}", *ptr);
    }

fn main() {
    let ptr = Box::new(42);

    print_value(ptr.clone());

    *ptr += 1;
}
\end{lstlisting}

Now everything works but keep in mind that the clone is completely independent from the original value. If we changed the value of the clone in \lstinline{print_value}, it would not be visible to the outside.

\paragraph{Copy}

We previously noted that ownership, and therefore move semantics, apply to every value in Rust. But if we change the code to use an \lstinline{i32} directly instead of putting it on the heap, we don’t need to clone anything:

\begin{lstlisting}[language=Rust, caption={Number types are \lstinline{Copy}}]
fn print_value(number: i32) {
        println!("{}", number);
    }

fn main() {
    let mut number = 42;

    print_value(number);

    number += 1;
}
\end{lstlisting}

This compiles and works. That is not because \lstinline{i32} breaks move semantics but because the type implements the \lstinline{Copy} trait. Normally, if a value is moved, the physical bits making up that value are moved to a new location, e. g. the new stack frame, and are not available at the previous position anymore. If a type implements \lstinline{Copy}, the bit pattern is instead copied over and retained. Because \lstinline{i32} doesn’t allocate any heap or handles any resources, such a copy is valid. A \lstinline{Box} is not \lstinline{Copy}, because then two owners for the same resource would exist which would violate drop responsibility. If we want to duplicate a \lstinline{Box}, we need to allocate new memory on the heap and initialize it properly, which is what \lstinline{clone} does.

\paragraph{ManuallyDrop} Sometimes we don’t want \ac{RAII} to happen, we want to free resources ourselves. If that is the case, we can wrap a value in a \lstinline{ManuallyDrop} which tells the compiler to not drop it for us. This is a general trend in Rust: The correct way should be the easiest to do, and all potentially unsafe constructs are opt-in. More information on \lstinline{ManuallyDrop} can be found in the documentation for the type.\footnote{\url{https://doc.rust-lang.org/stable/std/mem/struct.ManuallyDrop.html}}


\subsection{Borrowing}

As we have seen, ownership and move semantics ensure memory safety. But they are not easy to use, \lstinline{clone}ing data has a runtime overhead, and many correct programs are rejected. For example, \autoref{lst:ownership-transfer} describes a program that would work if it were not for move semantics.

That is where references and borrowing come in. Instead of taking ownership of a value, a function can only borrow it through a reference. Then the drop responsibility stays with the caller. References, of course, can not be used for everything, but for our case it is sufficcient. We mark the argument to \lstinline{print_value} as a reference using \lstinline{&}, and creating a reference from a value works the same.

\begin{lstlisting}[language=Rust, caption={Borrow through references to prevent a move}, label={lst:borrow}]
fn print_value(ptr: &Box<i32>) {
    println!("{}", *ptr);
}

fn main() {
    let mut ptr = Box::new(42);

    print_value(&ptr);

    *ptr += 1;
}
\end{lstlisting}

\autoref{lst:borrow} compiles without an error and does what we would expect. There are two kinds of references. We just looked at \emph{shared} or \emph{immutable references}. The other kind is \emph{unique} or \emph{mutable references}, and they are denoted with \lstinline[language=Rust]{&mut}. The different kinds of references have a different semantics attached to them. While both are used to access values without taking ownership, there are specific rules for the creation and guarantees associated with each:

\begin{itemize}
    \item A shared reference to a value can always be created, as long as there is no unique reference to the same value. It is not possible to mutate the pontee through a shared reference (which is why it is also called immutable).
    \item A unique reference can only be created if there is no other reference to the value at all and the referenced value is declared mutable. A unique reference can do everything the owner can, except dropping.
\end{itemize}

The part of the compiler that enforces these rules is the \emph{borrow checker}. The reasoning behind this is again resource safety, namely preventing \emph{unguarded mutable aliasing}. Having multiple readers of the same data doesn’t cause issues, as long as the data cannot be mutated. Mutating data is fine, as long as no one else can read and/or mutate the data at the same time. Using these two kinds of references enforces, at compile time, that we will always stay on the happy path. But there are some programs that are correct even though they violate these rules. We will be concerned with extending the borrow checker to accept more correct programs in later sections.

\paragraph{The Owner of a Borrowed Value}

References have to agree to these rules, but the owner has to as well. While shared references exist, the owner may not mutate, e. g. \lstinline{let mut x = 42; let xref = &x; x += 1} is forbidden. Similarly, the owner can’t access a value at all as long as there is a unique reference around.

\todo[inline]{
    \textbf{This doesn’t not work, I have to think about this some more. I want something to throe Error E0507.}
    
    In particular, a value may not be moved, as long as there is a reference to it because that would create a dangling pointer. This means that \lstinline{let y = &x; let z = *y;} doesn’t work.
    
    %\begin{lstlisting}[language=Rust, caption={Moving out of borrowed value}]
    % let x = 42;
    % let xref = &x;
    % let y = x;
    %\end{lstlisting}
    
}

\paragraph{Lexical Lifetimes}

Before we can discuss extensions to the borrow checker, let’s discuss naïve references. References are values like every other and therefore have an owner themselves. That means if we create a reference to a value, the owner can only access it, after the reference goes out of scope.

\begin{lstlisting}[language=Rust, caption={Lexical Lifetimes Prevent Usage}, label={lst:lexical-lifetimes}]
let mut x = 0;
{
    let xref = &x; // Create a reference to `x`
    x += 1; // Error: Can’t mutate while a reference exists
}
x += 1; // OK, no references exist anymore
\end{lstlisting}

The program in \autoref{lst:lexical-lifetimes} is rejected by our naïve borrow checker since it registers a mutation to \lstinline{x} while a reference to it still exists. Only after the block ends, the borrow is returned to the owner and it can use it freely again. Similarly, we would not be able to create a unique reference while a shared one exists and vice versa. But we can see that the program is correct since \lstinline{xref} is never accessed and therefore no aliasing happens. This code could be saved by wrapping the line in an extra pair of braces but we would like to avoid it. This can be achieved with \acfi{NLL}, which are part of Rust since version 1.31.0.

\paragraph{Access Guards}\todo{Put this in an info box?}

Sometimes we need access to a resource from multiple places at the same time, for example when sharing data between threads. For this, Rust provides the \lstinline{Mutex} container type. References to a mutex can be shared freely, but to change the value in the container, one has to acquire a lock, therefore making the access \emph{guarded}. While the mutex is locked, no other thread can access the data at all, a mutex lock is therefore similar to a \lstinline[language=Rust]{&mut} but its guarantees are enforced at runtime. The \lstinline{RwLock} type can give out both read and write locks which have behavior analogous to \lstinline[language=Rust]{&} and \lstinline[language=Rust]{&mut}, respectively. There are more constructs for similar use cases in the standard library, like \lstinline{Arc} and \lstinline{Cow}.

These data structures are implemented using so-called \emph{raw pointers}. Raw pointers, like pointers in C, are not borrow checked and can therefore alias. The programmer has to check manually that no rules are violated and should provide a safe interface that hides the raw pointers from the users. \todo{Links to Strict Provenance, Miri, …}

\begin{enumerate}
    \item moving out of a borrowed value is illegal (e. g. \lstinline{let y = &x; let z = *y;})
    \item lifetime polymorphism and lifetime elision
          \todo[inline]{borrow-through}
\end{enumerate}
\subsection{Non-Lexical Lifetimes}
\begin{enumerate}
    \item Example: Correct but rejected program
    \item Solution: Early drop of reference/non-lexical lifetimes
\end{enumerate}
\subsection{Reborrows, the Borrow Stack and Two-Phase Borrowing}
\begin{enumerate}
    \item What is a reborrow?
    \item Example: Reborrows are correct but rejected
    \item Solution: Allow for Reborrows
    \item Using shared references after creating a unique one is correct but rejected
    \item Example
    \item Solution: Two-Phase Borrowing
    \item Note: TPB is not yet implemented
          \todo[inline]{Fit in the Borrow Stack}
\end{enumerate}
\subsection{Loans and Regions}
\begin{enumerate}
    \item What are loans and regions?
    \item regions as an alternative to lifetimes
    \item regions are more powerful
    \item Piggyback on the talk by \href{https://www.youtube.com/watch?v=_agDeiWek8w}{Niko Matsakis}
    \item Note: Polonius will be a new Borrowchecker using regions but it is unstable
    \item Note: concrete memory regions à la Cyclone vs abstract regions à la Polonius
\end{enumerate}

\section{Contracts and Refinement Types}
\subsection{Contracts}
\begin{enumerate}
    \item Preconditions, Postconditions, Invariants
    \item Hoare Logic
    \item Contracts are a Runtime Check
    \item Are there first-class contracts?
    \item Higher-order contracts (\href{https://www2.ccs.neu.edu/racket/pubs/icfp2002-ff.pdf}{Findler, Felleisen})
\end{enumerate}
\subsection{Refinement Types}
\begin{enumerate}
    \item Independent Refinement Types
    \item Dependent Refinement Types/Refined Function Types
    \item Refinement Types are “first-class”
\end{enumerate}
\subsection{Liquid Types SMT Solving}
\begin{enumerate}
    \item Refinement Types are not decidable
    \item SAT is decidable
    \item SMT is decidable
    \item Liquid Types = $\{ T : \text{RefinementType } | \text{ decidable } T \}$
\end{enumerate}
\subsection{Hybrid Type Checking}
\begin{enumerate}
    \item Idea: Combine Contracts and Liquid Types
    \item Sprinkle in casts everywhere an assertion is made
    \item If the compiler can prove that this is ok or not ok, remove the cast (and potentially throw a compiler error)
    \item All other casts are dynamic and lead to a panic if they fail.
    \item In Rust, \lstinline{Result} and \lstinline{?} could be used instead. Problem: The error must be handled somewhere.
\end{enumerate}

\section{Other Extensions to a Type System}
\subsection{Totality and Termination Metrics}
\begin{enumerate}
    \item Some functions can panic or loop indefinitely
    \item In Rust, \lstinline{!} as return type helps (such a function does definitely not terminate)
    \item There is no “may not terminate”, though, so you have to rely on the documentation
    \item You can prove that a loop/recursion finishes by termination metrics
    \item Panics can technically occur in every function (every function call needs memory space for a stack frame)
    \item But you can prove that there are no “high-level” panics in any execution path within a function
\end{enumerate}

\subsection{(Other) Substructural Types}
\begin{enumerate}
    \item What are structural rules? Exchange, Weakening, Contraction
    \item Move semantics are affine types
    \item Clone and Copy is weakening
    \item \lstinline{#[must_use]} and relevant types
    \item Linear types
    \item The borrow stack is ordered types
    \item Uniqueness types:  There is only one reference (in contrast to linear types: no more references can be created)
    \todo[inline]{Proof objects? Compile-time only linear values}
\end{enumerate}

\subsection{Purity and (algebraic) Effects}
\begin{enumerate}
    \item Effects
    \item Handling (Example: Exceptions)
    \item Algebraic Effects
    \item Pure functions don’t have unhandled effects
    \item What counts as an effect (IO, allocation, computation)
    \item non-totality is an effect, but do termination metrics fit in?
    \item Weakening and Contraction are effects, but what about borrowing?
    \item Refinements $\cap$ Effects
\end{enumerate}