Finish 1.1.6 Polonius

Commit 9e8daf01 authored 2 years ago by Jasper Clemens Gräflich
--- a/preamble.tex
+++ b/preamble.tex
@@ -82,7 +82,7 @@ morekeywords=[2]{bool,i8,i16,i32,i64,i128,isize,u8,u16,u32,u64,u128,usize,f32,f6
 }
 \lstset{style=color}

-% We use `~' as delimiter for the listing because baces don’t
+% We use `~' as delimiter for the listing because braces don’t
 % behave well if there are also braces in the source code. I hope
 % that `~' is sufficiently rare in Rust to never encounter it.
 \newcommand{\inline}[1]{{{\lstinline[language=Rust]~#1~}}}

--- a/thesis/1-introduction.tex
+++ b/thesis/1-introduction.tex
@@ -507,7 +507,7 @@ Different approaches to borrow checking only differ in determining when $L$ is l

 \subsubsection{Classic \ac{NLL}}

-Under \ac{NLL} the liveness of a loan is derived from the lifetime of a reference. As discussed in \autoref{subsec:non-lexical-lifetimes}, a reference is live in a node of the \ac{CFG} if it may be used later. The corresponding loan is then simply live exactly when the reference is live. \todo{lifetime subtyping and inferred lifetimes/constraints, “outlive” relationship, \inline{'a : 'b}} Crucially, if a function returns a reference, it is live for the whole body of the function.
+Under \ac{NLL} the liveness of a loan is derived from the lifetime of a reference. As discussed in \autoref{subsec:non-lexical-lifetimes}, a reference is live in a node of the \ac{CFG} if it may be used later. This means that we walk forward along the \ac{CFG} to determine the liveness of the reference, and the corresponding loan is live exactly when the reference is. Crucially, if a function returns a reference, it is live for the whole body of the function. \todo{lifetime subtyping and inferred lifetimes/constraints, “outlive” relationship, \inline{'a : 'b}}

 \begin{lstlisting}[language=Rust, caption={\ac{NLL} rejects a correct program}, label={lst:nll-reject-correct}]
 fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
@@ -519,20 +519,20 @@ fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
 }
 \end{lstlisting}

-\autoref{lst:nll-reject-correct} shows an example. In the \inline{Some} branch, \inline{x} is returned from the function, which in turn depends on the reference produced by \inline{fst}. Because \inline{x} is returned from the function, it needs to be live at least until the end of the body of \inline{first_or_insert}. But since \inline{x} is derived from \inline{fst}, that reference must outlive \inline{x}, hence being live for the whole function body as well. In the \inline{None} branch, an exclusive reference to \inline{v} is created for the call to \inline{push}. This produces an error because that node lies on a path between the creation of \inline{fst} and the return point.
+\autoref{lst:nll-reject-correct} shows an example. In the \inline{Some} branch, \inline{x} is returned from the function, which in turn depends on the reference produced by \inline{fst}. Because \inline{x} is returned from the function, it needs to be live at least until the end of the body of \inline{first_or_insert}. But since \inline{x} is derived from \inline{fst}, that reference must outlive \inline{x} and is therefore live for the whole function body as well. In the \inline{None} branch, an exclusive reference to \inline{v} is created for the call to \inline{push}. This produces an error because that node lies on a path between the creation of \inline{fst} and the return point.\todo{Do the listing again with lifetime annotations?}

-This should not happen. We can see that \inline{fst} is not actually used when we go through the \inline{None} arm because a different reference is returned in that case. However, \ac{NLL} can’t accomodate this situation because \inline{fst} \emph{may} be used later, see \autoref{fig:cfg-nll-reject-correct}.
+This should not happen. We can see that \inline{fst} is not actually used when we go through the \inline{None} arm because a different reference is returned in that case. However, \ac{NLL} can’t accommodate this situation because \inline{fst} \emph{may} be used later, see \autoref{fig:cfg-nll-reject-correct}.

 \begin{figure}[!ht]
    \centering
    \begin{tikzpicture}
        \begin{scope}[every node/.style={draw, rectangle}]
-            \node[ultra thick] (Z) at (0,7.5) {\inline{let fst = v.first();}};
-            \node[ultra thick] (A) at (0,5) {\inline{match fst}};
-            \node[ultra thick] (B) at (-1,2.75) {\inline{x}};
-            \node[ultra thick, draw=red] (C) at (1,3.5) {\inline{v.push(1);}};
-            \node[ultra thick] (D) at (1,2) {\inline{&v[0]}};
-            \node[ultra thick] (E) at (0,0.5) {\inline{return}};
+            \node[ultra thick] (Z) at (0,4) {\inline{let fst = v.first();}};
+            \node[ultra thick] (A) at (0,3) {\inline{match fst}};
+            \node[ultra thick] (B) at (-1,1.5) {\inline{x}};
+            \node[ultra thick, draw=red] (C) at (1,2) {\inline{v.push(1);}};
+            \node[ultra thick] (D) at (1,1) {\inline{&v[0]}};
+            \node[ultra thick] (E) at (0,0) {\inline{return}};
        \end{scope}
        \path [->, ultra thick] (Z) edge node[left] {} (A);
        \path [->, ultra thick] (A) edge node[left] {\inline{Some}} (B);
@@ -547,53 +547,41 @@ This should not happen. We can see that \inline{fst} is not actually used when w

 \subsubsection{Polonius}

-With lifetimes, we looked \emph{forwards} from the borrow expression to see how long a reference (and therefore the loan) are live. Polonius goes \emph{backwards} from a point of use to see if there is still a live reference. Polonius doesn’t use lifetimes but \emph{regions} to determine the liveness of a loan. A region\footnote{Polonius calls regions \emph{origins}, which is a more telling name, but regions is the more standard term.} is a set of loans.
+With lifetimes, we looked \emph{forwards} from the borrow expression to see how long a reference (and therefore the loan) is live. Polonius goes \emph{backwards} from a point of use to see if there is still a live reference. Polonius doesn’t use lifetimes but \emph{regions} to determine the liveness of a loan.

-When creating a reference, it gets an associated \emph{region} (\emph{origin} in Polonius), which is part of its type. (What is a region?) A loan is live if some live variable has it in its type.
+A \emph{region}\footnote{Polonius calls regions \emph{origins}, which is a more telling name, but regions is the more standard term.} is a set of loans. Each reference has a region consisting of all loans it may depend on. A fresh reference created from an owned value has a region consisting of just one loan, but a reference returned by a function, for example, could depend on several inputs and therefore have a region of several loans. Note that in this step we don’t care about how long a reference is valid; we don’t go forward in the \ac{CFG}. Instead, we only consider previous nodes to determine regions.

-\todo[inline]{Write up nicely the stuff below}
+Now, a loan $L$ is live at some node $N$ if there is some variable which is live at $N$ and contains $L$ in its region. This difference means that different paths through the \ac{CFG} are independent from each other, because a node in one path can’t see a node in the other one by walking back the \ac{CFG}.
+
+Let’s look at the example from \autoref{lst:nll-reject-correct} with all regions made explicit. \lstinline~x'{L1, L2}~ denotes that expression \inline{x} has a region consisting of the two loans \inline{L1} and \inline{L2}.
+
+\begin{lstlisting}[language=Rust, caption={Example with region annotations}, label={lst:regions}]
+fn first_or_insert(v'{L0}: &mut Vec<i32>)
+    -> &'{L0, L1, L3} i32
+{
+    let fst'{L0, L1} = v.first()'{L0, L1};
+    match fst {
+        Some(x'{L0, L1}) => x,
+        None => {v.push(1)'{L0, L2}; &v[0]'{L0, L3}},
+    }
+}
+\end{lstlisting}
+
+In \autoref{lst:regions} we can see that there are four relevant loans: \inline{L0} is the loan of the reference we got passed in. All references depend on it. \inline{first} creates a reference with loan \inline{L1} that is returned in the \inline{Some} branch. \inline{push} implicitly reborrows \inline{v} with loan \inline{L2} to push a value onto \inline{*v}. The final reference, with loan \inline{L3}, is created by the index operation and it may also be returned. Therefore, the return value has a region \lstinline~'{L0, L1, L3}~, because those three loans are what it may depend on.
+
+Under \ac{NLL}, the \inline{push} was not possible because \inline{x} being live and depending on \inline{fst} meant that \inline{fst} was live. With Polonius, we must check if there is any live variable whose region has a nonempty intersection with \lstinline~'{L0, L2}~. \inline{fst} and \inline{x} are not live, so they don’t pose a problem, even if the regions overlap. \inline{v} is live and there is a region overlap, but since the compiler inserts a reborrow, it is not a problem. There could still be an error if the borrow stack were invalidated at a later point, but since Polonius is only looking backwards, this is not something we have to consider here. There are no more live variables, so the node passes the borrow check.
+
+\subsubsection{Self-referential Structs}
+
+Sometimes we want to have structures that contain references to parts of themselves.\footnote{A real-world example of this are futures which must store the point of execution they are in. Currently they are a special case in the compiler and can’t be expressed in user-code.} Consider the struct in \autoref{lst:self-referential}, in which we store some data and additionally provide a view into a part of the data. The \inline{window} field contains a reference, but it is impossible to assign a lifetime to it. One could, however, give it a region. The code shows one proposed syntax for that.
+
+\begin{lstlisting}[language=Rust, caption={Self-referential struct}, label={lst:self-referential}]
+struct View<T> {
+    data: Vec<T>,
+    window: &'data [T]
+}
+\end{lstlisting}

-\begin{enumerate}
-    \item Borrow checking in Polonius:
-          \begin{enumerate}
-              \item Instead of tracking where a reference $R$ might be used, we track where it comes from, its \emph{origin}. The origin of $R$ is a set of loans $R$ might have come from (so we go backwards).
-              \item $'1$ has the origin $\{L1\}$, it is a singleton since it is a freshly created reference. $'0$ also has origin $\{L1\}$, since there is only one assignment to \inline{y}. We have:
-                    \begin{lstlisting}[language=Rust]
-                /* 0 */ let mut x : u32 = 22;
-                /* 1 */ let y : &{L1} u32 = &{L1} x;
-                /* 2 */ x += 1;
-                /* 3 */ print(y);
-                \end{lstlisting}
-              \item Liveness is not relevant here, only flow of values. Only the liveness of variables are important.
-              \item Now, a loan $L$ is live if some live variable has $L$ in its type.
-              \item So, in our example: Line 2 modifies \inline{x}, violating the loan \inline{(x, shared)}. \inline{y} is live at this point and contains the loan \inline{(x, shared)}. Therefore, we have an error.
-          \end{enumerate}
-    \item Why does it matter?
-          \begin{lstlisting}[language=Rust]
-        fn get_or_insert(
-            map: &mut HashMap<u32, String>
-        ) -> &String {
-            match map.get(&22) {
-                Some(v) => v,
-                None => {
-                    map.insert(22, String::from("hi"));
-                    &map[%22]
-                }
-            }
-        }
-        \end{lstlisting}
-          Here (with classical NLL), \inline{v} lives longer than the function, meaning that the loan is live for the whole function call. In particular, it is live when we try to insert, which is a violation (since we need an exclusive reference here)
-          
-          With Polonius, \inline{v} is not live in the \inline{None} branch and therefore no violation happens because there are no live variables with that loan in their type.
-    \item Polonius can also help with self-referential structs:
-          \begin{lstlisting}[language=Rust]
-        struct Message {
-            buffer : Vec<String>,
-            slice: &'buffer [u8], // borrow from field `buffer`
-        }
-        \end{lstlisting}
-          When creating a \inline{Message}, Polonius can check that the origin of \inline{'buffer} is within the struct.
-\end{enumerate}

 \subsubsection{Polonius the Crab}


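Note on the `first_or_insert` listing in the Classic NLL section: the listing body is cut off by the hunk boundary here, but going by the CFG figure it binds `v.first()` to `fst`, returns `x` in the `Some` arm, and does `v.push(1); &v[0]` in the `None` arm. As far as I know, that shape is still rejected by the stable (NLL) borrow checker. A minimal sketch of a common workaround, assuming the intended behaviour is "return the first element, inserting `1` if the vector is empty", avoids holding `fst` across the match so that no shared loan is live at the `push`:

```rust
/// Returns a reference to the first element of `v`.
/// If `v` is empty, `1` is pushed first (assumed behaviour, reconstructed
/// from the CFG figure in the diff, not copied from the thesis listing).
fn first_or_insert(v: &mut Vec<i32>) -> &i32 {
    // Only a short-lived shared borrow is needed for the emptiness check;
    // it ends before the exclusive borrow taken by `push`.
    if v.is_empty() {
        v.push(1);
    }
    // The returned reference is created after the mutation, so the
    // function-long loan no longer overlaps with the `push`.
    &v[0]
}
```

This sidesteps the problem rather than solving it: the borrow checker never has to reason about a reference that is returned on one path only, which is exactly the case the Polonius section is about.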
--- a/thesis/thesis.pdf
+++ b/thesis/thesis.pdf
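Note on the new Polonius text: the liveness rule ("a loan is live at a node if some variable that is live at that node contains it in its region") can be illustrated with a toy model. The sketch below is not how Polonius is implemented; the names `Loan`, `live_loans` and `example_regions` are made up for illustration, and the regions are copied by hand from the annotated listing rather than inferred.

```rust
use std::collections::{BTreeSet, HashMap};

/// A loan is identified by its label from the annotated listing (hypothetical type).
type Loan = &'static str;

/// Computes the loans that are live at a node, given the variables that
/// are live at that node and the region (set of loans) of each variable.
fn live_loans(
    live_vars: &[&str],
    regions: &HashMap<&str, BTreeSet<Loan>>,
) -> BTreeSet<Loan> {
    live_vars
        .iter()
        // Look up the region of every live variable; unknown names are skipped.
        .filter_map(|var| regions.get(*var))
        // Union all regions into one set of live loans.
        .flatten()
        .copied()
        .collect()
}

/// Regions of the variables in the annotated `first_or_insert` listing,
/// entered by hand.
fn example_regions() -> HashMap<&'static str, BTreeSet<Loan>> {
    HashMap::from([
        ("v", BTreeSet::from(["L0"])),
        ("fst", BTreeSet::from(["L0", "L1"])),
        ("x", BTreeSet::from(["L0", "L1"])),
    ])
}
```

Calling `live_loans(&["v"], &example_regions())` models the `push` node in the `None` arm, where only `v` is live: the result contains `L0` but not `L1`, matching the argument that the loan created by `first` does not block the `push`. With `&["x"]` (the `Some` arm) both `L0` and `L1` are live.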