Newer
Older
\chapter{Introduction}
\section{The Rust Ownership and Borrow System}
\subsection{Ownership}
In most programming languages, one can freely create and delete resources. For example, in C the programmer allocates memory on the heap with \lstinline{malloc} and frees the memory when they don’t need the memory anymore using \lstinline{free}.
\begin{lstlisting}[language=C, caption={Memory management in C}]
// Allocate space for an `int`.
int *ptr = (int*) malloc(sizeof(int));
Even in languages without manual memory management, the programmer still must manage other resources. For example, to work with a file in Python one could write:
\begin{lstlisting}[language=Python, caption={Manual resource management in Python}]
# Open the file and store the file handle in a variable.
file = open("path/to/some.file", "r")
# ...
# Close the file
file.close()
\end{lstlisting}
\paragraph{Resource Handling Strategies}
This manual resource management comes with some difficulties. The programmer has to watch out to always free the resources after using them, because otherwise the program could leak memory or unneccessarily inhibit other processes from accessing the resources. On the other hand, one may also not free the resources too early, which would lead to a use after free, and also not free resources multiple times. This can be very tricky in complex programs where resources are shared between threads or over \acs{API} boundaries.
That is why several programming languages have ways to make handling resources easier. Python has context managers that close resources automatically.
\begin{lstlisting}[language=Python, caption={Context managers in Python}]
with open("path/to/some.file", "r") as file:
#...
# At this point `file` is closed.
\end{lstlisting}
This code is equivalent to the previous one, but even better: It also closes the file in case of an exception. Programmers can also define custom context managers for their own resources.
Other languages have different strategies. Go supports the \lstinline{defer} statement which takes an operation that is not to be run immediately but at the end of the scope.
\begin{lstlisting}[language=Go, caption={The \lstinline{defer} statement in Go}]
file, err := os.Open("path/to/some.file")
defer file.Close()
// ...
} // Block ends here; run the deferred close file
\end{lstlisting}
These prevent double frees and use after free errors, but they can still be forgotten. A programmer who wants to use a resource must be aware that it has to be cleaned up
in the end and changes in one line of the code have to be mirrored in the other. It is not as bad as in manual resource management because the programmer has to change only one additional line and that line is usually close.
\paragraph{\acs{RAII} and Drop Responsibilities}
A third variant is \acfi{RAII}, also called \acfi{SBRM}, used by C++ and Rust. This ensures that the destructor of an object is run when it goes out of scope, which will clean up automatically. An example is the \lstinline{std::unique_ptr} in C++. The pointer \emph{owns} the memory it points to, meaning when it is created, it automatically allocates memory on the heap and when it goes out of scope, it frees the memory.
\begin{lstlisting}[language=C++, caption={\ac{RAII} in C++}]
{ // Block starts here.
std::unique_ptr<int> ptr = std::make_unique<int>(42);
// ...
} // Block ends here; run the destructor and free the memory
\end{lstlisting}
\ac{RAII} ensures that resources are always freed when they are not live (can’t be accessed) anymore. The programmer can’t forget to free the resources nor cause use after free or double free. While in C++ only objects conform to \ac{RAII}, in Rust every value is owned by a variable. For example, the Rust analog to an owning pointer is \lstinline{Box}.
\begin{lstlisting}[language=Rust,caption={Heap allocation in Rust}]
{ // Block starts here.
let ptr = Box::new(42)
// ...
} // Block ends here; drop value and free memory
\end{lstlisting}
As soon as a variable falls out of scope, it is \emph{dropped}, meaning all
associated resources are freed. It is said that a variable has a \emph{drop responsibility}. A programmer can define custom drop behavior for their own resources by implementing the \lstinline{Drop} trait for their type, but normally the compiler does it for us.
\paragraph{Move Semantics}
\ac{RAII} comes with a few disadvantages, though. Since a value is dropped when its owner goes out of scope, ther must always be exactly one owner for every value. This means that the compiler will transfer ownership sometimes, like in the following example.
\begin{lstlisting}[language=Rust,caption={Ownership transfer}]
fn print_value(ptr: Box<i32>) {
println!("{}", *ptr);
}
fn main() {
let ptr = Box::new(42);
print_value(ptr);
*ptr += 1;
}
\end{lstlisting}
We create a pointer and use it as an argument to a function that prints the pontee’s value. After that we increment the value. But if we try to compile this, we get an error:
error[E0382]: use of moved value: `*ptr`
\end{lstlisting}
Because we supplied \lstinline{ptr} as an argument to \lstinline{print_value}, it took ownership of that value, and now has the drop responsibility. As soon as the call to \lstinline{print_value} is finished, the memory is freed and accessing it afterwards is a use after free. We say that the value was \emph{moved} out of \lstinline{ptr} and into \lstinline{print_value}.
\paragraph{Clone}
We can still make it work by \emph{cloning} \lstinline{ptr} and supply the clone as an argument. \lstinline{ptr.clone()} creates a new \lstinline{Box}, allocates new memory, and initializes it with the same value as \lstinline{ptr}’s:
\begin{lstlisting}[language=Rust, caption={Usage of \lstinline{Clone}}]
fn print_value(ptr: Box<i32>) {
println!("{}", *ptr);
}
fn main() {
let ptr = Box::new(42);
// v
print_value(ptr.clone());
*ptr += 1;
\end{lstlisting}
Now everything works but keep in mind that the clone is completely independent from the original value. If we changed the value of the clone in \lstinline{print_value}, it would not be visible to the outside.
\paragraph{Copy}
We previously noted that ownership, and therefore move semantics, apply to every value in Rust. But if we change the code to use an \lstinline{i32} directly instead of putting it on the heap, we don’t need to clone anything:
\begin{lstlisting}[language=Rust, caption={Number types are \lstinline{Copy}}]
fn print_value(number: i32) {
println!("{}", number);
}
print_value(number);
number += 1;
This compiles and works. That is not because \lstinline{i32} breaks move semantics but because the type implements the \lstinline{Copy} trait. Normally, if a value is moved, the physical bits making up that value are moved to a new location, e. g. the new stack frame, and are not available at the previous position anymore. If a type implements \lstinline{Copy}, the bit pattern is instead copied over and retained. Because \lstinline{i32} doesn’t allocate any heap or handles any resources, such a copy is valid. A \lstinline{Box} is not \lstinline{Copy}, because then two owners for the same resource would exist which would violate drop responsibility. If we want to duplicate a \lstinline{Box}, we need to allocate new memory on the heap and initialize it properly, which is what \lstinline{clone} does.
\paragraph{Manually Drop} Sometimes we don’t want \ac{RAII} to happen, we want to free resources ourselves. If that is the case, we can wrap a value in a \lstinline{ManuallyDrop} which tells the compiler to not drop it for us. More information can be found in the documentation for the type.\footnote{\url{https://doc.rust-lang.org/stable/std/mem/struct.ManuallyDrop.html}}
\subsection{Borrowing}
\begin{enumerate}
\item unguarded aliasing + mutability = race conditions
\item The borrow checker prevents this
\item two kinds of references
\item returning a borrow to the owner (lexical lifetimes)
\item moving out of a borrowed value is illegal (e. g. \lstinline{let y = &x; let z = *y;})
\item lifetime polymorphism and lifetime elision
\end{enumerate}
\subsection{Non-Lexical Lifetimes}
\begin{enumerate}
\item Example: Correct but rejected program
\item Solution: Early drop of reference/non-lexical lifetimes
\end{enumerate}
\subsection{Reborrows, the Borrow Stack and Two-Phase Borrowing}
\begin{enumerate}
\item What is a reborrow?
\item Example: Reborrows are correct but rejected
\item Solution: Allow for Reborrows
\item Using shared references after creating a unique one is correct but rejected
\item Example
\item Solution: Two-Phase Borrowing
\item Note: TPB is not yet implemented
\todo[inline]{Fit in the Borrow Stack}
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
\end{enumerate}
\subsection{Loans and Regions}
\begin{enumerate}
\item What are loans and regions?
\item regions as an alternative to lifetimes
\item regions are more powerful
\item Piggyback on the talk by \href{https://www.youtube.com/watch?v=_agDeiWek8w}{Niko Matsakis}
\item Note: Polonius will be a new Borrowchecker using regions but it is unstable
\item Note: concrete memory regions à la Cyclone vs abstract regions à la Polonius
\end{enumerate}
\section{Contracts and Refinement Types}
\subsection{Contracts}
\begin{enumerate}
\item Preconditions, Postconditions, Invariants
\item Hoare Logic
\item Contracts are a Runtime Check
\item Are there first-class contracts?
\item Higher-order contracts (\href{https://www2.ccs.neu.edu/racket/pubs/icfp2002-ff.pdf}{Findler, Felleisen})
\end{enumerate}
\subsection{Refinement Types}
\begin{enumerate}
\item Independent Refinement Types
\item Dependent Refinement Types/Refined Function Types
\item Refinement Types are “first-class”
\end{enumerate}
\subsection{Liquid Types SMT Solving}
\begin{enumerate}
\item Refinement Types are not decidable
\item SAT is decidable
\item SMT is decidable
\item Liquid Types = $\{ T : \text{RefinementType } | \text{ decidable } T \}$
\end{enumerate}
\subsection{Hybrid Type Checking}
\begin{enumerate}
\item Idea: Combine Contracts and Liquid Types
\item Sprinkle in casts everywhere an assertion is made
\item If the compiler can prove that this is ok or not ok, remove the cast (and potentially throw a compiler error)
\item All other casts are dynamic and lead to a panic if they fail.
\item In Rust, \lstinline{Result} and \lstinline{?} could be used instead. Problem: The error must be handled somewhere.
\end{enumerate}
\section{Other Extensions to a Type System}
\subsection{Totality and Termination Metrics}
\begin{enumerate}
\item Some functions can panic or loop indefinitely
\item In Rust, \lstinline{!} as return type helps (such a function does definitely not terminate)
\item There is no “may not terminate”, though, so you have to rely on the documentation
\item You can prove that a loop/recursion finishes by termination metrics
\item Panics can technically occur in every function (every function call needs memory space for a stack frame)
\item But you can prove that there are no “high-level” panics in any execution path within a function
\end{enumerate}
\subsection{(Other) Substructural Types}
\begin{enumerate}
\item Move semantics are affine types
\item Clone and Copy is weakening
\item \lstinline{#[must_use]} and relevant types
\item Linear types
\item Ordered types
\item Uniqueness types
\end{enumerate}
\subsection{Purity and (algebraic) Effects}
\begin{enumerate}
\item Effects
\item Handling (Example: Exceptions)
\item Algebraic Effects
\item Pure functions don’t have unhandled effects
\item What counts as an effect (IO, allocation, computation)
\item non-totality is an effect, but do termination metrics fit in?
\item Weakening and Contraction are effects, but what about borrowing?
\item Refinements $\cap$ Effects