morekeywords={as,async,await,break,const,continue,crate,dyn,else,enum,extern,false,fn,for,if,impl,in,let,loop,match,mod,move,mut,pub,ref,return,Self,self,static,struct,super,trait,true,type,union,unsafe,use,where,while},% Current keywords
morekeywords={as,async,await,break,const,continue,crate,dyn,else,enum,extern,false,fn,for,if,impl,in,let,loop,match,mod,move,mut,pub,ref,return,Self,self,static,struct,super,trait,true,type,union,unsafe,use,where,while},% Current keywords
@@ -29,7 +29,7 @@ Even in languages without manual memory management, the programmer still must ma
file.close()
\end{lstlisting}
\paragraph{Resource Handling Strategies}
\subsubsection{Resource Handling Strategies}
This manual resource management comes with some difficulties. The programmer must always free resources after using them, because otherwise the program could leak memory or unnecessarily prevent other processes from accessing the resources. On the other hand, resources must not be freed too early, which would lead to a use after free, nor may they be freed more than once. This can be very tricky in complex programs where resources are shared between threads or across \acs{API} boundaries.
...
...
@@ -59,7 +59,7 @@ Other languages have different strategies. Go supports the \lstinline{defer} sta
These prevent double frees and use after free errors, but they can still be forgotten. A programmer who wants to use a resource must be aware that it has to be cleaned up
in the end, and changes in one line of the code have to be mirrored in the other. This is not as bad as manual resource management because the programmer has to change only one additional line, and that line is usually close by.
\paragraph{\acs{RAII} and Drop Responsibilities}
\subsubsection{\acs{RAII} and Drop Responsibilities}
A third variant is \acfi{RAII}, also called \acfi{SBRM}, used by C++ and Rust. This ensures that the destructor of an object is run when it goes out of scope, which will clean up automatically. An example is the \lstinline{std::unique_ptr} in C++. The pointer \emph{owns} the memory it points to, meaning when it is created, it automatically allocates memory on the heap and when it goes out of scope, it frees the memory.
...
...
@@ -86,7 +86,7 @@ A third variant is \acfi{RAII}, also called \acfi{SBRM}, used by C++ and Rust. T
As soon as a variable falls out of scope, it is \emph{dropped}, meaning all
associated resources are freed. It is said that a variable has a \emph{drop responsibility}. A programmer can define custom drop behavior for their own resources by implementing the \lstinline{Drop} trait for their type, but normally the compiler does it for us.
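As a minimal sketch of custom drop behavior, the following listing implements \lstinline{Drop} for a made-up resource type. The names \lstinline{Connection} and \lstinline{drop_order} are hypothetical and serve only as illustration. The shared log exists solely to make the order of the cleanup observable.
\begin{lstlisting}[language=Rust, caption={Sketch of a custom drop implementation}]
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical resource type, used only for illustration.
struct Connection {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Drop for Connection {
    // Runs automatically when a `Connection` goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("closed {}", self.name));
    }
}

// Returns the messages in the order in which the values were dropped.
fn drop_order() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Connection { name: "a", log: Rc::clone(&log) };
        let _b = Connection { name: "b", log: Rc::clone(&log) };
        // Both values are dropped at the end of this block,
        // in reverse order of their declaration.
    }
    let result = log.borrow().clone();
    result
}
\end{lstlisting}
Calling \lstinline{drop_order()} yields the message for \lstinline{b} before the one for \lstinline{a}, since local variables are dropped in reverse order of declaration.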
\paragraph{Move Semantics}
\subsubsection{Move Semantics}
\ac{RAII} comes with a few disadvantages, though. Since a value is dropped when its owner goes out of scope, there must always be exactly one owner for every value. This means that the compiler will sometimes transfer ownership, as in the following example.
...
...
@@ -112,7 +112,7 @@ error[E0382]: use of moved value: `*ptr`
Because we supplied \lstinline{ptr} as an argument to \lstinline{print_value}, it took ownership of that value, and now has the drop responsibility. As soon as the call to \lstinline{print_value} is finished, the memory is freed and accessing it afterwards is a use after free. We say that the value was \emph{moved} out of \lstinline{ptr} and into \lstinline{print_value}.
\paragraph{Clone}
\subsubsection{Clone}
We can still make it work by \emph{cloning} \lstinline{ptr} and supplying the clone as an argument. \lstinline{ptr.clone()} creates a new \lstinline{Box}, allocates new memory, and initializes it with the same value as \lstinline{ptr}’s:
...
...
@@ -132,7 +132,7 @@ fn main() {
Now everything works, but keep in mind that the clone is completely independent of the original value. If we changed the value of the clone in \lstinline{print_value}, the change would not be visible to the caller.
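The independence of the clone can be checked with a small sketch. The function \lstinline{increment_and_print} is a hypothetical, mutating variant of \lstinline{print_value} and not part of the earlier listings.
\begin{lstlisting}[language=Rust, caption={Sketch: mutating a clone leaves the original untouched}]
// Hypothetical variant of `print_value` that mutates its argument.
fn increment_and_print(mut ptr: Box<i32>) -> i32 {
    *ptr += 1;
    println!("{}", *ptr);
    *ptr
}

// Returns the original value and the value seen by the callee.
fn clone_is_independent() -> (i32, i32) {
    let ptr = Box::new(42);
    // The callee receives its own heap allocation.
    let seen_by_callee = increment_and_print(ptr.clone());
    // `ptr` was never moved, so it is still usable and unchanged.
    (*ptr, seen_by_callee)
}
\end{lstlisting}
Here \lstinline{clone_is_independent()} returns \lstinline{(42, 43)}: the callee incremented only its own copy.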
\paragraph{Copy}
\subsubsection{Copy}
We previously noted that ownership, and therefore move semantics, apply to every value in Rust. But if we change the code to use an \lstinline{i32} directly instead of putting it on the heap, we don’t need to clone anything:
...
...
@@ -152,7 +152,9 @@ fn main() {
This compiles and works. That is not because \lstinline{i32} breaks move semantics but because the type implements the \lstinline{Copy} trait. Normally, if a value is moved, the bits making up that value are copied to a new location, e. g. the new stack frame, and the compiler treats the previous location as no longer valid. If a type implements \lstinline{Copy}, the bit pattern is copied in the same way, but the original stays usable. Because \lstinline{i32} doesn’t allocate heap memory or handle any other resources, such a copy is valid. A \lstinline{Box} is not \lstinline{Copy}, because then two owners for the same resource would exist, which would violate drop responsibility. If we want to duplicate a \lstinline{Box}, we need to allocate new memory on the heap and initialize it properly, which is what \lstinline{clone} does.
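User-defined types can opt into the same behavior as long as all of their fields are \lstinline{Copy} themselves. The following sketch uses a hypothetical \lstinline{Point} type to show this.
\begin{lstlisting}[language=Rust, caption={Sketch: deriving \lstinline{Copy} for a plain data type}]
// Hypothetical plain data type. All fields are `Copy`,
// so the derive is permitted.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Takes its argument by value.
fn consume(p: Point) -> i32 {
    p.x + p.y
}

fn copy_keeps_original() -> (i32, Point) {
    let p = Point { x: 1, y: 2 };
    let sum = consume(p); // `p` is copied here, not moved
    (sum, p) // so it can still be used afterwards
}
\end{lstlisting}
Without the \lstinline{Copy} derive, the second use of \lstinline{p} would be rejected as a use of a moved value.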
\paragraph{ManuallyDrop} Sometimes we don’t want \ac{RAII} to happen because we want to free resources ourselves. In that case, we can wrap a value in a \lstinline{ManuallyDrop}, which tells the compiler not to drop it for us. This is a general trend in Rust: The correct way should be the easiest to do, and all potentially unsafe constructs are opt-in. More information on \lstinline{ManuallyDrop} can be found in the documentation for the type.\footnote{\url{https://doc.rust-lang.org/stable/std/mem/struct.ManuallyDrop.html}}
\subsubsection{ManuallyDrop}
Sometimes we don’t want \ac{RAII} to happen because we want to free resources ourselves. In that case, we can wrap a value in a \lstinline{ManuallyDrop}, which tells the compiler not to drop it for us. This is a general trend in Rust: The correct way should be the easiest to do, and all potentially unsafe constructs are opt-in. More information on \lstinline{ManuallyDrop} can be found in the documentation for the type.\footnote{\url{https://doc.rust-lang.org/stable/std/mem/struct.ManuallyDrop.html}}
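A minimal sketch of how this might look is given below. The function name \lstinline{manual_free} is hypothetical. Note that releasing the value by hand requires an \lstinline{unsafe} block, because the compiler can no longer guarantee that the value is not used afterwards.
\begin{lstlisting}[language=Rust, caption={Sketch: freeing a value by hand with \lstinline{ManuallyDrop}}]
use std::mem::ManuallyDrop;

// Hypothetical helper: builds a vector, frees it manually,
// and returns the length it had before it was freed.
fn manual_free() -> usize {
    // The compiler will not insert a drop for `buffer`.
    let mut buffer = ManuallyDrop::new(vec![1, 2, 3]);
    // The wrapper dereferences to the inner value.
    buffer.push(4);
    let len = buffer.len();
    // SAFETY: `buffer` is not accessed again after this call.
    unsafe {
        ManuallyDrop::drop(&mut buffer);
    }
    len
}
\end{lstlisting}
If the call to \lstinline{ManuallyDrop::drop} were omitted, the vector's heap memory would simply leak, which is safe but usually undesirable.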
\subsection{Borrowing}
...
...
@@ -161,7 +163,7 @@ As we have seen, ownership and move semantics ensure memory safety. But they are
That is where references and borrowing come in. Instead of taking ownership of a value, a function can merely borrow it through a reference. The drop responsibility then stays with the caller. References, of course, cannot be used for everything, but for our case they are sufficient. We mark the argument to \lstinline{print_value} as a reference using \lstinline{&}, and creating a reference from a value uses the same syntax.
\begin{lstlisting}[language=Rust, caption={Borrow through references to prevent a move}, label={lst:borrow}]
\begin{lstlisting}[language=Rust, caption={Borrow using references to prevent a move}, label={lst:borrow}]
fn print_value(ptr: &Box<i32>) {
println!("{}", *ptr);
}
...
...
@@ -184,7 +186,7 @@ fn main() {
The part of the compiler that enforces these rules is the \emph{borrow checker}. The reasoning behind this is again resource safety, namely preventing \emph{unguarded mutable aliasing}. Having multiple readers of the same data doesn’t cause issues, as long as the data cannot be mutated. Mutating data is fine, as long as no one else can read and/or mutate the data at the same time. Using these two kinds of references enforces, at compile time, that we will always stay on the happy path. But there are some programs that are correct even though they violate these rules. We will be concerned with extending the borrow checker to accept more correct programs in later sections.
\paragraph{The Owner of a Borrowed Value}
\subsubsection{The Owner of a Borrowed Value}
References have to obey these rules, but so does the owner. While shared references exist, the owner may not mutate the value, e. g. \lstinline{let mut x = 42; let xref = &x; x += 1} is forbidden. Similarly, the owner can’t access a value at all as long as there is a unique reference around.
...
...
@@ -201,11 +203,11 @@ References have to agree to these rules, but the owner has to as well. While sha
}
\paragraph{Lexical Lifetimes}
\subsubsection{Lexical Lifetimes and Lifetime Analysis}
Before we can discuss extensions to the borrow checker, let’s discuss naïve references. References are values like every other and therefore have an owner themselves. That means if we create a reference to a value, the owner can only access it after the reference goes out of scope.
Before we can discuss extensions to the borrow checker, let’s discuss naïve references. References are values like every other and therefore have an owner themselves. That means if we create a reference to a value, the owner can only access it after the reference has gone out of scope again.
The program in \autoref{lst:lexical-lifetimes} is rejected by our naïve borrow checker since it registers a mutation to \lstinline{x} while a reference to it still exists. Only after the block ends is the borrow returned to the owner, which can then use the value freely again. Similarly, we would not be able to create a unique reference while a shared one exists and vice versa. But we can see that the program is correct since \lstinline{xref} is never accessed and therefore no aliasing happens. This code could be saved by wrapping the line in an extra pair of braces, but we would like to avoid that. This can be achieved with \acfi{NLL}, which have been part of Rust since version 1.31.0.
The program in \autoref{lst:lexical-lifetimes} is rejected by our naïve borrow checker since it registers a mutation to \lstinline{x} while a reference to it still exists. Only after the block ends is the borrow returned to the owner, and the value can be used again. Similarly, we would not be able to create a unique reference while a shared one exists and vice versa.
\todo{Example for Scoped Lifetimes using regions (?)}
But we can see that the program is correct since \lstinline{xref} is never accessed and therefore no aliasing happens. This code could be saved by wrapping the line in an extra pair of braces, but we would like to avoid that. This can be achieved with \acfi{NLL}, which have been part of Rust since version 1.31.0.
\paragraph{Access Guards}\todo{Put this in an info box?}
\subsubsection{Access Guards}\todo{Put this in an info box?}
Sometimes we need access to a resource from multiple places at the same time, for example when sharing data between threads. For this, Rust provides the \lstinline{Mutex} container type. References to a mutex can be shared freely, but to change the value in the container, one has to acquire a lock, therefore making the access \emph{guarded}.\todo{Add example} While the mutex is locked, no other thread can access the data at all. A mutex lock is therefore similar to a \lstinline[language=Rust]{&mut}, but its guarantees are enforced at runtime. The \lstinline{RwLock} type can give out both read and write locks, which behave analogously to \lstinline[language=Rust]{&} and \lstinline[language=Rust]{&mut}, respectively. There are more constructs for similar use cases in the standard library, like \lstinline{Arc} and \lstinline{Cow}.
These data structures are implemented using so-called \emph{raw pointers}. Raw pointers, like pointers in C, are not borrow checked and can therefore alias. The programmer has to check manually that no rules are violated and should provide a safe interface that hides the raw pointers from the users. \todo{Pointers to Strict Provenance, Miri, …?}
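Guarded access through a \lstinline{Mutex} could be sketched as follows. The function \lstinline{count_in_parallel} and its parameters are hypothetical. \lstinline{Arc} is used here only to give every thread shared ownership of the mutex.
\begin{lstlisting}[language=Rust, caption={Sketch: guarded access to a shared counter}]
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: every thread increments a shared
// counter `per_thread` times; the final count is returned.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // `lock` blocks until no other thread holds the guard.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // The guard is dropped at the end of each iteration,
                    // which releases the lock again.
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
\end{lstlisting}
The lock guard is itself an \ac{RAII} value: unlocking happens when the guard is dropped, so the lock cannot be forgotten.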
\paragraph{Returning references and Borrow-through}
\subsubsection{Returning references and Borrow-through}
Functions can receive
Functions can receive references as arguments, but they can also return references. One has to be a bit careful when doing this, though, since all resources created in the scope of a function are freed as soon as the function returns. Consider the following:
\begin{lstlisting}[language=Rust, caption={Try to return a reference}]
fn to_ref(number: i32) -> &i32 {
&number
}
\end{lstlisting}
This fails because \lstinline[language=Rust]{number} is owned by \lstinline[language=Rust]{to_ref} and is dropped as soon as the function returns. The reference would already be invalid when the caller gets access to it. But if the function takes a reference as an argument, it can pass another reference back to the caller. This is called a \emph{borrow-through} and looks like the following:
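A minimal sketch, with the hypothetical function names \lstinline{first_element} and \lstinline{borrow_through}:
\begin{lstlisting}[language=Rust, caption={Sketch: returning a reference derived from an argument}]
// The returned reference borrows from `values`, so it stays valid
// for as long as the caller's slice does.
// Panics if the slice is empty.
fn first_element(values: &[i32]) -> &i32 {
    &values[0]
}

fn borrow_through() -> i32 {
    let numbers = vec![10, 20, 30];
    // `first` borrows from `numbers`, which outlives the call.
    let first = first_element(&numbers);
    *first
}
\end{lstlisting}
The returned reference keeps \lstinline{numbers} borrowed: as long as \lstinline{first} is in use, the vector cannot be mutated or moved.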
\todo[inline]{Add paragraph on \lstinline{Borrow}, \lstinline{AsRef}, …?}
\subsubsection{\lstinline[language=Rust]{Deref}, \lstinline[language=Rust]{AsRef}, and \lstinline[language=Rust]{Borrow}}
Sometimes, we have a reference to one type but need a reference to another, similar type. For example, a vector of some type \lstinline[language=Rust]{Vec<T>} is conceptually the same as a slice of that same type \lstinline[language=Rust]{[T]}, except that a \lstinline[language=Rust]{Vec} can grow and shrink and a slice cannot. This means that all functions which operate on slices should also work with vectors, and in fact there is a library method \lstinline[language=Rust]{Vec::as_slice} that takes a reference to a vector and provides a reference to a slice. Similarly, there is \lstinline[language=Rust]{String::as_str}, which transforms a \lstinline[language=Rust]{&String} into a \lstinline[language=Rust]{&str}.
Generally, there are many types that can act as a substitute for another. The common interface for this behavior is the \lstinline[language=Rust]{AsRef} trait. A type can have many implementations of this trait. For example, \lstinline[language=Rust]{String} can stand in for \lstinline[language=Rust]{str}, \lstinline[language=Rust]{[u8]}, \lstinline[language=Rust]{OsStr} and \lstinline[language=Rust]{Path}. Every time a reference to one of these types is needed, we can use a \lstinline[language=Rust]{String} instead if we first call \lstinline[language=Rust]{as_ref} on it:
\begin{lstlisting}[language=Rust, caption={Use \lstinline{String} in place of \lstinline{[u8]}}]
fn needs_bytes(x: &[u8]) { /*...*/ }
// ...
let s = String::from("Hello Bytes");
needs_bytes(s.as_ref());
\end{lstlisting}
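The conversion can also be moved into the callee by making the function generic over \lstinline[language=Rust]{AsRef}. The following is a sketch with the hypothetical functions \lstinline{byte_len} and \lstinline{as_ref_callers}.
\begin{lstlisting}[language=Rust, caption={Sketch: accepting anything that can act as a byte slice}]
// Accepts any type that can be viewed as a byte slice.
fn byte_len<T: AsRef<[u8]>>(input: T) -> usize {
    input.as_ref().len()
}

fn as_ref_callers() -> (usize, usize, usize) {
    let owned = String::from("Hello Bytes");
    (
        byte_len(&owned),        // a borrowed `String`
        byte_len("abc"),         // a string slice
        byte_len(vec![1u8, 2]),  // an owned vector of bytes
    )
}
\end{lstlisting}
With this signature the caller no longer has to write \lstinline[language=Rust]{as_ref} explicitly, at the cost of the function being compiled once per argument type.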
\todo[inline]{\lstinline[language=Rust]{Deref}, \lstinline[language=Rust]{Borrow}, and \lstinline[language=Rust]{mut} versions}
\subsection{Non-Lexical Lifetimes}
\begin{enumerate}
...
...
@@ -322,3 +358,5 @@ Functions can receive
\item Weakening and Contraction are effects, but what about borrowing?