\title{Formal Specification of the Extended UTXO Model (\red{\textsf{DRAFT}})}

\date{12th September 2019}
\documentclass[a4paper]{article}

\usepackage[disable]{todonotes}
+
%\usepackage[disable]{todonotes}
\usepackage{hyperref} % This has to go at the end of the packages.
\newcommand{\validator}{\mi{validator}}
\newcommand{\redeemer}{\mi{redeemer}}

\newcommand{\datascript}{\mi{datascript}}
+
\newcommand{\valdata}{\mi{valdata}}
+
\newcommand{\Data}{\ensuremath{\mathsf{Data}}}
\newcommand{\outputref}{\mi{outputRef}}
\newcommand{\txin}{\mi{in}}
\newcommand{\txout}{\mi{out}}
\newcommand\B{\ensuremath{\mathbb{B}}}
\newcommand\N{\ensuremath{\mathbb{N}}}
\newcommand\Z{\ensuremath{\mathbb{Z}}}
+
\renewcommand\H{\ensuremath{\mathbb{H}}}
+
%% \H is usually the Hungarian double acute accent
\newcommand{\emptymap}{\ensuremath{\{\}}}

\todokwxm{Michael suggests (and I agree) that we should try to make

clear the difference between the ledger layer and the script layer.

It would be good if we could make the EUTXO model independent of the

details of the scripting language, just saying ``We don't care what

the scripting language actually is: as long as you can do

\textsf{A},\textsf{B},\textsf{C}, and \textsf{D}, then we can make

the EUTXO model work'' (where

\textsf{A},\textsf{B},\textsf{C},\textsf{D} are things like encoding

the \textsf{PendingTx} data, performing validation, verifying the

content of the output data scripts, ...). Then we can say separately

``Here's how you can do \textsf{A},\textsf{B},\textsf{C}, and

\textsf{D} in Plutus Core''. This would make things modular, giving

us a clear and welldefined interface between the EUTXO model and

Plutus Core, which would surely be a good thing. However, the business

about validating data scripts \textit{really} messes this up.}
\section{Introduction: The Extended UTXO Model}
multiple transactions, and our aim is to define a transaction model
which enables the implementation of highly expressive contracts.

An important component of our UTXO models are \textit{scripts},
+
An important feature of our UTXO models is \textit{scripts},
programs which run on the blockchain to check the validity of
transactions. In Cardano, scripts will be programs in the Plutus Core
language~\citep{PlutusCorespec}. The Extended UTXO models are largely
\subsection{Structure of the document}
\label{sec:docstructure}

This document proposes two extensions of the basic UTXO model (EUTXO

stands for \textit{Extended UTXO}):
+
The papers~\citep{Zahnentferner18Chimeric} and
+
\citep{Zahnentferner18UTxO} give a formal specification of a basic
+
UTXO model. See Note~\ref{note:basicutxo} for some background on
+
\noindent This document proposes two extensions of the basic UTXO
+
model (EUTXO stands for \textit{Extended UTXO}):
\item \textbf{EUTXO1} (Section~\ref{sec:eutxo1}): this extends the
\textit{custom currencies} and \textit{nonfungible tokens}.

\noindent We give formal specifications of each of these versions,

based on the models appearing in the

papers~\citep{Zahnentferner18Chimeric} and

\citep{Zahnentferner18UTxO}.
book~\citep{Plutusbook}.
This section defines some basic notation. We generally follow the
notation established by \citep{Zahnentferner18UTxO}.
+
\subsection{Basic types}
+
\noindent Throughout the document we assume a number of basic types.
+
These are shown in Figure~\ref{fig:basictypes}.
+
\B&: the type of booleans, $\{\false, \true\}$\\
+
\N&: the type of natural numbers, $\{0, 1, 2, \ldots\}$\\
+
\Z&: the type of integers, $\{,\ldots, 2, 1, 0, 1, 2, \ldots\}$\\
+
\H&: the type of \textit{bytestrings}, $\bigcup_{n=0}^{\infty}\{0,1\}^{8n}$\\
+
$\qty = \N$&: an amount of currency, always nonnegative\\
+
$\qtypm = \Z$&: a possibly negative amount of currency\\
+
$\s{SlotNumber} = \N$&: a slot number\\
+
$\s{Address} = \N$&: the address of an object in the blockchain\\
+
$\s{TxId} = \N$&: the identifier of a previous transaction on the chain\\
+
\label{fig:basictypes}
+
\noindent We regard $\N$ as a subtype of $\Z$ and convert freely between
+
natural numbers and nonnegative integers. A bytestring is a sequence
+
\noindent In practice an \textsf{Address} will be a
+
collisionresistant hash of some object (so the address of an object
+
$X$ is $X^{\#}$ in the notation of Figure~\ref{fig:basicnotation}),
+
and the blockchain will provide an efficient way to retrieve the
+
original object given its hash. Compiled Plutus Core scripts are
+
stored and transmitted in a serialised form, and when we talk about
+
the hash of a Plutus Core script we mean the hash of the serialised
+
% Accessing objects indirectly via addresses is helpful because it can
+
% help to reduce memory and disk usage: for example, there may be
+
% scripts for common validation scenarios used in many transactions, and
+
% it is more efficient to store single copies of such scripts rather
+
% than having hundreds of transactions each with their own copy.
+
\noindent We assume that every transaction in a ledger has a unique identifier
+
of type \s{TxId}: in an implementation this would probably be the
+
hash of the transaction.
+
\subsection{The \Data{} type}
+
We also define a type \Data{} which can be used to pass information
+
into scripts in a typesafe manner: see Figure~\ref{fig:datadefn}. The
+
definition is given here in EBNF form, but can easily be translated to
+
a Haskell type, for instance.
+
 "Constr" \(\N (\List{\Data})\)
+
 "Map" \(\List{\Data\times\Data}\)
+
\caption{The \Data{} type}
+
\noindent Thus values of type \Data{} are nested sums and products built up
+
recursively from the base types of integers and bytestrings. This
+
allows one to encode a large variety of firstorder data structures:
+
for example, we could encode the list [11,22,33] as \texttt{Constr 1
+
(I 11, Constr 1 (I 22, Constr 1 (I 33, Constr 0)))}, using
+
\texttt{Constr 0} to represent the \texttt{Nil} constructor of a list
+
type, and \texttt{Constr 1} to represent the \texttt{Cons}
+
The \texttt{Map} constructor is strictly redundant, but is included for
+
convenience to allow a straightforward encoding of records.
+
We assume that the scripting language has the ability to parse values
+
of type \Data{}, converting them into a suitable internal representation.
+
\subsection{Further notation}
+
Figure~\ref{fig:basicnotation} presents some further notation for use in the
+
\todokwxm{Split this over the page break once things have stabilised.}
\item Types are typeset in $\mathsf{sans~serif}$.
\item The only operation on $\mathsf{Script}$ from the ledger's