Commit 193e0e17 authored by Jasper Clemens Gräflich

Add to 1.1.6 Borrow Errors and Classic NLL

Also change unique->exclusive
parent 509ce510
Pipeline #86568 passed
......@@ -177,18 +177,18 @@ fn main() {
}
\end{lstlisting}
\autoref{lst:borrow} compiles without an error and does what we would expect. There are two kinds of references. We just looked at \emph{shared} or \emph{immutable references}. The other kind is \emph{unique} or \emph{mutable references}, and they are denoted with \inline{&mut}. The different kinds of references have a different semantics attached to them. While both are used to access values without taking ownership, there are specific rules for the creation and guarantees associated with each:
\autoref{lst:borrow} compiles without an error and does what we would expect. There are two kinds of references. We just looked at \emph{shared} or \emph{immutable references}. The other kind is \emph{exclusive} or \emph{mutable references}, and they are denoted with \inline{&mut}. The different kinds of references have a different semantics attached to them. While both are used to access values without taking ownership, there are specific rules for the creation and guarantees associated with each:
\begin{itemize}
\item A shared reference to a value can always be created, as long as there is no unique reference to the same value. It is not possible to mutate the pointee through a shared reference (which is why it is also called immutable).
\item A unique reference can only be created if there is no other reference to the value at all and the referenced value is declared mutable. A unique reference can do everything the owner can, except dropping.
\item A shared reference to a value can always be created, as long as there is no exclusive reference to the same value. It is not possible to mutate the pointee through a shared reference (which is why it is also called immutable).
\item An exclusive reference can only be created if there is no other reference to the value at all and the referenced value is declared mutable. An exclusive reference can do everything the owner can, except dropping.
\end{itemize}
The part of the compiler that enforces these rules is the \emph{borrow checker}. The reasoning behind this is again resource safety, namely preventing \emph{unguarded mutable aliasing}. Having multiple readers of the same data doesn’t cause issues, as long as the data cannot be mutated. Mutating data is fine, as long as no one else can read and/or mutate the data at the same time. Using these two kinds of references enforces, at compile time, that we will always stay on the happy path. But there are some programs that are correct even though they violate these rules. We will be concerned with extending the borrow checker to accept more correct programs in later sections.
\subsubsection{The Owner of a Borrowed Value}
References have to agree to these rules, but the owner has to as well. While shared references exist, the owner may not mutate, e. g. \inline{let mut x = 42; let xref = &x; x += 1} is forbidden. Similarly, the owner can’t access a value at all as long as there is a unique reference around.
References have to agree to these rules, but the owner has to as well. While shared references exist, the owner may not mutate, e. g. \inline{let mut x = 42; let xref = &x; x += 1} is forbidden. Similarly, the owner can’t access a value at all as long as there is an exclusive reference around.
\todo[inline]{
\textbf{This doesn’t work, I have to think about this some more. I want something to throw Error E0507.}
......@@ -217,7 +217,7 @@ let mut x = 0;
x += 1; // OK, no references exist anymore
\end{lstlisting}
The program in \autoref{lst:lexical-lifetimes} is rejected by our borrow checker since it registers a mutation to \inline{x} while a reference to it still exists. Only after the block ends, the borrow is returned to the owner and it can be used again. Similarly, we would not be able to create a unique reference while a shared one exists and vice versa.
The program in \autoref{lst:lexical-lifetimes} is rejected by our borrow checker since it registers a mutation to \inline{x} while a reference to it still exists. Only after the block ends, the borrow is returned to the owner and it can be used again. Similarly, we would not be able to create an exclusive reference while a shared one exists and vice versa.
But we can see that the program is correct since \inline{xref} is never accessed after \inline{x} is changed and we can save the code by introducing an additional scope.
......@@ -339,7 +339,7 @@ Sometimes we want to express an even stronger connection between two types. For
\subsubsection{Mutable reference conversions}
All three of the aforementioned traits work for shared references only, but their variants \inline{AsMut}, \inline{DerefMut} and \inline{BorrowMut} all take and provide a unique reference.
All three of the aforementioned traits work for shared references only, but their variants \inline{AsMut}, \inline{DerefMut} and \inline{BorrowMut} all take and provide an exclusive reference.
\subsection{Non-Lexical Lifetimes}\label{subsec:non-lexical-lifetimes}
......@@ -354,7 +354,7 @@ x += 1;
Then the (simplified) \ac{CFG} looks something like \autoref{fig:cfg-non-lexical-lifetimes}. It is quite boring because the control flow is linear. The nodes with thick borders are where \inline{xref} is live. The first node is the one in which the reference is created and the second one is where it is last used. Because \inline{x} is not modified in that section of the \ac{CFG}, the borrow checker doesn’t complain.
\begin{figure}[!h]
\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\begin{scope}[every node/.style={draw, rectangle}]
......@@ -387,7 +387,7 @@ x = 42;
Now the \ac{CFG} (\autoref{fig:cfg-non-lexical-lifetimes-branch}) splits up to accommodate both possible paths the program could take during execution. At the \inline{if}, the graph splits into two, but in the last node, both paths join up again. Here we can see that in the \inline{false} branch no problem occurs, but in the \inline{true} branch we try to modify \inline{x} even though \inline{xref} is used later in the same path. Modifying \inline{x} on the last line does not pose any problems since \inline{xref} is not live anymore on any path that leads to this point.
\begin{figure}[!h]
\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\begin{scope}[every node/.style={draw, rectangle}]
......@@ -413,9 +413,9 @@ Now the \ac{CFG} (\autoref{fig:cfg-non-lexical-lifetimes-branch}) splits up to a
\subsection{Reborrows and Two-Phase Borrowing}
\ac{NLL} give us a lot more freedom when using references, but there are still programs that are clearly correct but don’t pass the borrow checker, especially if unique references are in play. It is not possible to create a new reference of some provenance as long as a unique reference with the same provenance is live. This means the code in \autoref{lst:conflicting-borrows} will not compile.
\ac{NLL} give us a lot more freedom when using references, but there are still programs that are clearly correct but don’t pass the borrow checker, especially if exclusive references are in play. It is not possible to create a new reference of some provenance as long as an exclusive reference with the same provenance is live. This means the code in \autoref{lst:conflicting-borrows} will not compile.
\begin{lstlisting}[language=Rust, caption={Cannot create a shared reference while a unique one exists}, label={lst:conflicting-borrows}]
\begin{lstlisting}[language=Rust, caption={Cannot create a shared reference while an exclusive one exists}, label={lst:conflicting-borrows}]
let mut x = 1;
let ref1 = &mut x;
let ref2 = &x; // Error: Cannot borrow `x`
......@@ -423,7 +423,7 @@ println!("{ref2}");
*ref1 = 2;
\end{lstlisting}
But since \inline{xs} is not live anymore when \inline{xu} is used, aliasing is never occurring, and if we hadn’t used a unique reference but modified the value directly, Rust would have been able to see this. Not being able to create references in this manner is a big thing. Consider the code in \autoref{lst:moved-borrow}. Here, we create a reference to an empty \inline{Vec} and then use it to \inline{push} a value. This moves the reference into the \inline{push} method so that it is dropped when the method returns. It is no longer valid to use it again in the next line since it has already been dropped. But in fact this code compiles because Rust implicitly inserts a \emph{reborrow} for us.
But since \inline{ref2} is not live anymore when \inline{ref1} is used, aliasing never occurs, and if we hadn’t used an exclusive reference but modified the value directly, Rust would have been able to see this. Not being able to create references in this manner is a severe restriction. Consider the code in \autoref{lst:moved-borrow}. Here, we create a reference to an empty \inline{Vec} and then use it to \inline{push} a value. This moves the reference into the \inline{push} method so that it is dropped when the method returns. It should no longer be valid to use it again in the next line since it has already been dropped. But in fact this code compiles because Rust implicitly inserts a \emph{reborrow} for us.
\begin{lstlisting}[language=Rust, caption={Move a borrow into a function}, label={lst:moved-borrow}]
let mut v = Vec::new();
......@@ -436,7 +436,7 @@ vref.last(); // Error: Use of moved value `vref`
Let’s get back to the example in \autoref{lst:conflicting-borrows} for now. The compiler complains because we try to create two conflicting references with the same provenance. But we can tell the compiler to temporarily deactivate a reference by borrowing \emph{through this reference}. This is done in \autoref{lst:simple-reborrow}.
\begin{lstlisting}[language=Rust, caption={Reborrow through a unique reference}, label={lst:simple-reborrow}]
\begin{lstlisting}[language=Rust, caption={Reborrow through an exclusive reference}, label={lst:simple-reborrow}]
let mut x = 1;
let ref1 = &mut x;
let ref2 = & *ref1; // Reborrow
......@@ -457,7 +457,7 @@ let mut v = Vec::new();
v.push(v.len());
\end{lstlisting}
The problem lies in the second line. The first argument to \inline{push} is an \inline{&mut self}, and so the compiler implicitly borrows from \inline{v}. But then another (shared) reference is created for the call to \inline{len}. This is not allowed, and a reborrow doesn’t help either since the \inline{&mut self} is currently used. On the other hand, it is clear that the call to \inline{len} will definitely return before the unique reference is ever accessed. In fact, we can work around this problem if we save the result of \inline{v.len()} in a temporary variable, as shown in \autoref{lst:two-phase-borrow-workaround}.
The problem lies in the second line. The first argument to \inline{push} is an \inline{&mut self}, and so the compiler implicitly borrows from \inline{v}. But then another (shared) reference is created for the call to \inline{len}. This is not allowed, and a reborrow doesn’t help either since the \inline{&mut self} is currently used. On the other hand, it is clear that the call to \inline{len} will definitely return before the exclusive reference is ever accessed. In fact, we can work around this problem if we save the result of \inline{v.len()} in a temporary variable, as shown in \autoref{lst:two-phase-borrow-workaround}.
\begin{lstlisting}[language=Rust, caption={Workaround for \autoref{lst:two-phase-borrow}}, label={lst:two-phase-borrow-workaround}]
let mut v = Vec::new();
......@@ -465,7 +465,7 @@ let vlen = v.len();
v.push(vlen);
\end{lstlisting}
This pattern—calling a method with a unique reference but also using shared references in the arguments—is very common and the workaround is unwieldy. Therefore \acs{RFC} 2025 \todo{Add reference for RFC 2025} introduces the concept of a \ac{TPB}. In it, the lifetime of a unique reference is split up into two phases. During the \emph{reservation phase}, it is treated like a shared reference, meaning more shared references can be created and reads are possible. The \emph{activation} happens as soon as the reference is first used to perform a mutation. From this point on, it is treated as a full unique reference.
This pattern—calling a method with an exclusive reference but also using shared references in the arguments—is very common and the workaround is unwieldy. Therefore \acs{RFC} 2025 \todo{Add reference for RFC 2025} introduces the concept of a \ac{TPB}. In it, the lifetime of an exclusive reference is split up into two phases. During the \emph{reservation phase}, it is treated like a shared reference, meaning more shared references can be created and reads are possible. The \emph{activation} happens as soon as the reference is first used to perform a mutation. From this point on, it is treated as a full exclusive reference.
Right now, two-phase borrowing works only for method calls where the first argument is \inline{&mut self}, and it is not resolved if generalized \ac{TPB} should even be supported. \todo{Point to Issue \#49434 for generalized TPB.}
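As a small illustration of the method-call case described above, the following sketch should be accepted by a current stable compiler, which implements two-phase borrows for autoref'd method receivers. The function name \inline{lengths} is hypothetical and only serves to make the example self-contained.

```rust
/// Hypothetical helper: pushes the current length of the vector twice.
fn lengths() -> Vec<usize> {
    let mut v = Vec::new();
    // The `&mut v` for `push` is only reserved while `v.len()` is evaluated,
    // so the shared borrow for `len` is allowed. It is activated when `push` runs.
    v.push(v.len());
    v.push(v.len());
    v
}

fn main() {
    println!("{:?}", lengths());
}
```

Writing the receiver borrow out by hand, for example \inline{let r = &mut v; r.push(v.len());}, is still rejected, since the two-phase treatment applies only to the implicit borrow.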
......@@ -473,17 +473,68 @@ Right now, two-phase borrowing works only for method calls where the first argum
\subsection{Loans and Regions}
Previously, we looked at the advantages of \acl{NLL} over lexical lifetimes. But lifetimes have a general defect and are supposed to be replaced by an approach using \emph{regions} using the \emph{Polonius borrow checker}, which is currently in an experimental stage. Here, we will explore how \ac{NLL} and regions work under the hood, and what the advantage of using regions is.
Previously, we looked at the advantages of \acl{NLL} over standard lexical ones. But lifetimes have a general defect and are supposed to be replaced by an approach using \emph{regions} employed by the \emph{Polonius borrow checker}, which is currently in an experimental stage. Here, we will explore how \ac{NLL} and regions work under the hood, and what the advantage of using regions is.
% \begin{lstlisting}[language=Rust, caption={Simple \ac{NLL} example}, label={lst:non-lexical-lifetimes}]
% let mut x = 0;
% let xref = &x;
% println!("{xref}");
% x += 1;
% \end{lstlisting}
\subsubsection{Borrow Errors}
A borrow error needs three things: A statement accessing a path, accessing the path violating some loan, and the loan being live at this point.
We need to define some vocabulary first: A \emph{path} is an identifier like \inline{x}, or is built from a path by a field access \inline{x.f}, a dereference \inline{*x}, or an index \inline{x[i]}. Those can be freely combined, so that for example \inline{(*x)[5].f} is a valid path.\footnote{Paths have a rough equivalent in C and C++ lvalues.}
A \emph{loan} is the result of a borrow expression. It consists of the path which is borrowed from and a \emph{mode}, that is shared or exclusive.
What is a loan, what is a path, what is liveness of a loan?
A loan $L$ is \emph{violated} if its path $P$ is accessed in an incompatible way, that is, $P$ is mutated when $L$ is shared, or $P$ is accessed at all when $L$ is exclusive. Note that an access can also be \emph{indirect} if $P$ shows up somewhere in the expression. For example, \inline{(*x)[5].f} accesses \inline{(*x)[5]}, \inline{*x}, and \inline{x} indirectly. Note also that a loan to an index ignores the index value, that is, \inline{x[5]} and \inline{x[4]} produce the same loan to \inline{x[_]}. This is because Rust generally cannot know at compile time whether two index operations alias. It means that it is impossible to obtain two exclusive references to different elements of a data structure by indexing, as in \autoref{lst:two-indices}.
\begin{lstlisting}[language=Rust, caption={Indexing twice into a vector is illegal}, label={lst:two-indices}]
let mut v = vec![1, 2];
two_refs(&mut v[0], &mut v[1]);
\end{lstlisting}
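One common way around this restriction is \inline{split_at_mut} from the standard library, which returns two non-overlapping exclusive slices. The sketch below assumes that \inline{two_refs} takes two \inline{&mut i32}; the text does not define it, so its body here (a swap) is purely illustrative, as is the wrapper \inline{swap_first_two}.

```rust
// Hypothetical helper: the text does not define `two_refs`; here it swaps both values.
fn two_refs(a: &mut i32, b: &mut i32) {
    std::mem::swap(a, b);
}

// Illustrative wrapper; panics if `v` has fewer than two elements.
fn swap_first_two(v: &mut Vec<i32>) {
    // two_refs(&mut v[0], &mut v[1]); // rejected: two exclusive loans of `v[_]`
    let (left, right) = v.split_at_mut(1); // left = v[..1], right = v[1..]
    two_refs(&mut left[0], &mut right[0]);
}

fn main() {
    let mut v = vec![1, 2];
    swap_first_two(&mut v);
    println!("{v:?}");
}
```

The borrow checker accepts this because \inline{left} and \inline{right} are distinct paths; the guarantee that they do not overlap is established inside \inline{split_at_mut} rather than by the loan analysis.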
Now we can define when a borrow error should occur. There are three conditions which all have to be met:
\begin{enumerate}
\item A path $P$ is accessed at some node $N$ of the \ac{CFG},
\item accessing $P$ at $N$ would violate some loan $L$, and
\item $L$ is live at $N$.
\end{enumerate}
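The three conditions can be sketched as a small executable model. This is a deliberate simplification and not how the compiler is implemented: paths are compared by name only (no indirect accesses), liveness is given as an explicit set of \ac{CFG} node numbers, and all type and function names are made up for illustration.

```rust
#[derive(Clone, Copy, PartialEq, Eq)]
enum Mode { Shared, Exclusive }

#[derive(Clone, Copy, PartialEq, Eq)]
enum Access { Read, Write }

/// A loan: the borrowed path, its mode, and the CFG nodes at which it is live.
struct Loan {
    path: &'static str,
    mode: Mode,
    live_at: Vec<usize>,
}

/// Condition 2: would this access to `path` violate `loan`?
/// Simplification: paths are compared by name, indirect accesses are ignored.
fn violates(loan: &Loan, path: &str, access: Access) -> bool {
    loan.path == path && (loan.mode == Mode::Exclusive || access == Access::Write)
}

/// Conditions 1 to 3: `path` is accessed at `node`, the access violates
/// some loan, and that loan is live at `node`.
fn borrow_error(loans: &[Loan], node: usize, path: &str, access: Access) -> bool {
    loans.iter().any(|l| violates(l, path, access) && l.live_at.contains(&node))
}

fn main() {
    // node 0: let mut x = 0;
    // node 1: let xref = &x;        (shared loan of `x` created)
    // node 2: println!("{xref}");   (last use of `xref`)
    // node 3: x += 1;
    let loans = [Loan { path: "x", mode: Mode::Shared, live_at: vec![1, 2] }];
    println!("{}", borrow_error(&loans, 3, "x", Access::Write)); // loan no longer live
    println!("{}", borrow_error(&loans, 2, "x", Access::Write)); // loan still live
}
```

Only the contents of \inline{live_at} depend on the borrow checking approach; the check itself stays the same.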
Different approaches to borrow checking only differ in determining when $L$ is live. For example, with lexical lifetimes a loan is simply live from its creation until the end of the lexical scope. We are now prepared to dive into liveness analysis in \ac{NLL} and Polonius.
\subsubsection{Classic \ac{NLL}}
When creating a reference, it is live at some points. A loan is live if its reference is. Some references outlive others; this leads to a subtyping relationship.
Under \ac{NLL} the liveness of a loan is derived from the lifetime of a reference. As discussed in \autoref{subsec:non-lexical-lifetimes}, a reference is live in a node of the \ac{CFG} if it may be used later. The corresponding loan is live exactly when the reference is live. \todo{lifetime subtyping and inferred lifetimes/constraints} Crucially, if a function returns a reference, it is live for the whole body of the function. \autoref{lst:nll-reject-correct} shows an example. In the \inline{Some} branch, the reference returned by \inline{v.first()} is in turn returned from the function, meaning it must be live at least until the end of the function body. But in the \inline{None} branch, an exclusive reference is needed to push a value to the vector. This should not be a problem since the shared reference from \inline{v.first()} is not used in this branch, and a different reference is returned from the function instead. However, \ac{NLL} can’t accommodate this situation because the reference from \inline{v.first()} \emph{may} be used later, so its loan is considered live in both branches, see \autoref{fig:cfg-nll-reject-correct}.
\begin{lstlisting}[language=Rust, caption={\ac{NLL} rejects a correct program}, label={lst:nll-reject-correct}]
fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
match v.first() {
Some(x) => x,
None => {v.push(1); &v[0]},
}
}
\end{lstlisting}
\begin{figure}[!ht]
\centering
\begin{tikzpicture}
\begin{scope}[every node/.style={draw, rectangle}]
\node {TODO};
% \node (A) at (0,4.5) {\inline{let mut x = 0;}};
% \node[ultra thick] (B) at (0,3) {\inline{let xref = &x;}};
% \node[ultra thick] (C) at (0,1.5) {\inline{println!("\{xref\}");}};
% \node (D) at (0,0) {\inline{x += 1;}};
\end{scope}
% \path [->] (A) edge node[left] {} (B);
% \path [->, ultra thick] (B) edge node[left] {} (C);
% \path [->] (C) edge node[left] {} (D);
\end{tikzpicture}
\caption{Control-flow diagram for \autoref{lst:nll-reject-correct}}
\label{fig:cfg-nll-reject-correct}
\end{figure}
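Until a more precise analysis is available, a function like the one in \autoref{lst:nll-reject-correct} can often be restructured so that no reference obtained before the mutation is returned. The following is one possible sketch, not the only workaround; it replaces the \inline{match} on \inline{v.first()} by an emptiness check.

```rust
fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
    // The shared borrow for `is_empty` ends before `push` needs exclusive access.
    if v.is_empty() {
        v.push(1);
    }
    // The only returned reference is created after all mutations.
    &v[0]
}

fn main() {
    let mut empty = Vec::new();
    println!("{}", first_or_insert(&mut empty));
    let mut filled = vec![7, 8];
    println!("{}", first_or_insert(&mut filled));
}
```

The price is a second lookup (\inline{is_empty} followed by indexing), which the original formulation avoids.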
\subsubsection{Polonius}
......@@ -499,54 +550,54 @@ A Rust library emulating Polonius by using a safe abstraction around raw pointer
Cyclone is a C dialect with region based memory. It uses concrete memory regions whereas Polonius uses abstract ones.
\todo[inline]{flesh out subsection with help of the below}
\subsubsection{Structure of the Talk by \href{https://www.youtube.com/watch?v=_agDeiWek8w}{Niko Matsakis}}
\begin{enumerate}
\item Throughout, Niko uses this example:
\begin{lstlisting}[language=Rust]
\begin{lstlisting}[language=Rust]
/* 0 */ let mut x : u32 = 22;
/* 1 */ let y : &'0 u32 = &'1 x;
/* 2 */ x += 1;
/* 3 */ print(y);
\end{lstlisting}
\item What exactly is a borrow error? The following conditions must be met:
\begin{enumerate}
\item statement $N$ accesses path $P$
\item accessing $P$ would violate the terms of some loan $L$. For shared loans: modifying $P$, for mutable loans: accessing $P$.
\item Note that e. g. mutating a field of a loaned struct is still forbidden; this is an \emph{indirect} mutation.
\item The loan must be live at $N$ (it, or some derived reference, might be used later)
\end{enumerate}
\item A \emph{path} is a local variable on the stack, \inline{x}, a field of a path \inline{x.f}, a dereference \inline{*x}, or an index \inline{x[_]} (the index itself is not interesting to us; indexing accesses the whole of \inline{x}).
\item A \emph{loan} is the result of a borrow expression \inline{&x}: a path (here \inline{x}) and a \emph{mode} (shared or unique).
\begin{enumerate}
\item statement $N$ accesses path $P$
\item accessing $P$ would violate the terms of some loan $L$. For shared loans: modifying $P$, for mutable loans: accessing $P$.
\item Note that e. g. mutating a field of a loaned struct is still forbidden; this is an \emph{indirect} mutation.
\item The loan must be live at $N$ (it, or some derived reference, might be used later)
\end{enumerate}
\item Borrow checking before Polonius:
\begin{enumerate}
\item compute lifetimes of references: the part (set of nodes in the control-flow graph; we use lines as a simplification) of the program where the references might be used
\item Every reference has a fresh inference variable (variable in the algebraic sense), and the compiler has to figure out which set they correspond to.
\item For example \inline{y} has the lifetime \inline{'0} associated with it. Since \inline{y} is live on lines 2 and 3, $'0$ must include them. Therefore $'0 = \{2, 3\}$. $'1$ must outlive $'0$, therefore $'1 = \{2, 3\}$. We receive:
\begin{lstlisting}[language=Rust]
/* 0 */ let mut x : u32 = 22;
/* 1 */ let y : &{2, 3} u32 = &{2, 3} x;
/* 2 */ x += 1;
/* 3 */ print(y);
\end{lstlisting}
\item A loan is then simply live during the lifetime of that reference.
\item So, in our example: The statement on line 2 accesses path \inline{x}, which violates the loan $L$ \inline{(x, shared, {2, 3})} from line 1, and $L$ is live at this point. Therefore, we have an error.
\end{enumerate}
\begin{enumerate}
\item compute lifetimes of references: the part (set of nodes in the control-flow graph; we use lines as a simplification) of the program where the references might be used
\item Every reference has a fresh inference variable (variable in the algebraic sense), and the compiler has to figure out which set they correspond to.
\item For example \inline{y} has the lifetime \inline{'0} associated with it. Since \inline{y} is live on lines 2 and 3, $'0$ must include them. Therefore $'0 = \{2, 3\}$. $'1$ must outlive $'0$, therefore $'1 = \{2, 3\}$. We receive:
\begin{lstlisting}[language=Rust]
/* 0 */ let mut x : u32 = 22;
/* 1 */ let y : &{2, 3} u32 = &{2, 3} x;
/* 2 */ x += 1;
/* 3 */ print(y);
\end{lstlisting}
\item A loan is then simply live during the lifetime of that reference.
\item So, in our example: The statement on line 2 accesses path \inline{x}, which violates the loan $L$ \inline{(x, shared, {2, 3})} from line 1, and $L$ is live at this point. Therefore, we have an error.
\end{enumerate}
\item Borrow checking in Polonius:
\begin{enumerate}
\item Instead of tracking where a reference $R$ might be used, we track where it comes from, its \emph{origin}. The origin of $R$ is a set of loans $R$ might have come from (so we go backwards).
\item $'1$ has the origin $\{L1\}$, it is a singleton since it is a freshly created reference. $'0$ also has origin $\{L1\}$, since there is only one assignment to \inline{y}. We have:
\begin{lstlisting}[language=Rust]
\begin{enumerate}
\item Instead of tracking where a reference $R$ might be used, we track where it comes from, its \emph{origin}. The origin of $R$ is a set of loans $R$ might have come from (so we go backwards).
\item $'1$ has the origin $\{L1\}$, it is a singleton since it is a freshly created reference. $'0$ also has origin $\{L1\}$, since there is only one assignment to \inline{y}. We have:
\begin{lstlisting}[language=Rust]
/* 0 */ let mut x : u32 = 22;
/* 1 */ let y : &{L1} u32 = &{L1} x;
/* 2 */ x += 1;
/* 3 */ print(y);
\end{lstlisting}
\item Liveness is not relevant here, only flow of values. Only the liveness of variables is important.
\item Now, a loan $L$ is live if some live variable has $L$ in its type.
\item So, in our example: Line 2 modifies \inline{x}, violating the loan \inline{(x, shared)}. \inline{y} is live at this point and contains the loan \inline{(x, shared)}. Therefore, we have an error.
\end{enumerate}
\item Liveness is not relevant here, only flow of values. Only the liveness of variables is important.
\item Now, a loan $L$ is live if some live variable has $L$ in its type.
\item So, in our example: Line 2 modifies \inline{x}, violating the loan \inline{(x, shared)}. \inline{y} is live at this point and contains the loan \inline{(x, shared)}. Therefore, we have an error.
\end{enumerate}
\item Why does it matter?
\begin{lstlisting}[language=Rust]
\begin{lstlisting}[language=Rust]
fn get_or_insert(
map: &mut HashMap<u32, String>
) -> &String {
......@@ -559,17 +610,17 @@ Cyclone is a C dialect with region based memory. It uses concrete memory regions
}
}
\end{lstlisting}
Here (with classical NLL), \inline{v} lives longer than the function, meaning that the loan is live for the whole function call. In particular, it is live when we try to insert, which is a violation (since we need a unique reference here)
With Polonius, \inline{v} is not live in the \inline{None} branch and therefore no violation happens because there are no live variables with that loan in their type.
Here (with classical NLL), \inline{v} lives longer than the function, meaning that the loan is live for the whole function call. In particular, it is live when we try to insert, which is a violation (since we need an exclusive reference here).
With Polonius, \inline{v} is not live in the \inline{None} branch and therefore no violation happens because there are no live variables with that loan in their type.
\item Polonius can also help with self-referential structs:
\begin{lstlisting}[language=Rust]
\begin{lstlisting}[language=Rust]
struct Message {
buffer : Vec<String>,
slice: &'buffer [u8], // borrow from field `buffer`
}
\end{lstlisting}
When creating a \inline{Message}, Polonius can check that the origin of \inline{'buffer} is within the struct.
When creating a \inline{Message}, Polonius can check that the origin of \inline{'buffer} is within the struct.
\end{enumerate}
\section{Contracts and Refinement Types}
......