Table of Contents
Sorry for dropping off for a bit…
Recently in my desire to understand more of the issues surrounding languages for blockchain I read Dan Robinson’s Understanding Simplicity: Implementing a Smart-Contract Language in 30 Lines of Haskell. Linking from that I read Russell O’Connor’s Simplicity paper. That’s when I had my AHA! moment, why possibly the interest in Coq, and what’s already been done with that. I’ll come back to that in a bit.
I think that I now understand more about what Blockstack might need with respect to programming, what it has, what is being worked on, as well as what other languages for blockchains are out there. So I’d like to review the situation and suggest an approach. Maybe other people are thinking along similar lines and, if so, then I may be just merely making more explicit things that are already implicit. But please correct me where I’m wrong or where there are other thoughts, concerns and comments.
TL;DR: I compare Blockstack’s execution with bcoin’s and argue that Blockstack should define a VM, modeled on Bitcoin Script’s VM. I describe a little of the Simplicity language and argue that program proof and verification while desirable, isn’t a panacea for all of the issues Blockstack may encounter.
A programming language API
Let’s start with the ways we might write a program to perform Blockstack’s’ operations. Often there is a public API which gives the names of some functions or methods that developers would call.
Right now in Blockstack, I don’t see API documentation per se, but looking at the code I think those operations would be the methods in the modules
blockstack_client.operations. For example there is
blockstack_client.operations.revoke.make_transaction(). For each op.
make_transaction() method there is a
txop alias created for it. For example,
blockstack_client.operations.tx_register() is the same thing as
An important thing to note is that each of the
make_transaction() methods calls a corresponding
build() method. For example, there is
The build() method constructs and returns from its parameters a hex string in “wire format”. This hex string represents the operation that is to be performed. The wire format is documented. See https://github.com/blockstack/blockstack-core/blob/master/docs/wire-format.md
I like to draw parallels with how things are done in various places in the bitcoin world, and I’ll follow the general principle of doing things as close as possible to bitcoin, except where there is good reason not to.
So let’s take a look at how bcoin does something similar by looking at bcoin’s Intro to Scripting. Like Blockstack, things are changing in that world, so what I describe is subject to change and not to be construed as a polished solution.
Here is how you might start to create a bcoin script:
const output = new Script();
const input = new Script();
input.setString(0, 'hello world'); // add some metadata
const stack = new Stack();
input.execute(stack); // Put the results of running input into "stack"
OP_ADD you can find as a BitCoin Script opcode.
There is a bit of a mismatch already in that in the Blockstack case
make_transaction is much simpler: you don’t need to call additional
executes or any stacks. In the bcoin()'s
output.pushSym('OP_DROP'), the operation isn’t done immediately, but only in the process of performing
tx_register(), the underlying
register.build()in Blockstack, or`output.pushSym(‘OP DROP’) in bcoin.
There is an additional advantage when the reference implementation is written in a compiled language like C++, Rust, Go, or Java, and a library is created to encapsulate the API: there is interoperability with other languages. So
register.build() could be called from not just Python but from other programming languages. This is typically referred to as writing “Bindings” for a library.
Transactions, Calls vs. VM
Having described an API for programming at its simplest and most primitive form, let us get back to the difference between what bcoin is doing and what Blockstack is currently doing: bcoin, like most bitcoin implementations, has defined a VM. Running instructions in the VM is typically the only way that developers can create “smart contracts.”
As in Blockstack, we don’t need a VM interpreting instructions; the calls themselves could suffice. And the effect of a call can take place immediately, in the case of off-chain execution. If this were implemented in bcoin, the above script could be simpler:
const input = new Script(); // Note: setting input must come before output operations
input.setString(0, 'hello world'); // add some metadata
const output = new Script();
output.OpDrop(input); // Do "drop" right now
It is clear that this is simpler, especially when there are no other higher-level programming facilities available. And note that even when there is an underlying VM, these operation calls could cache the instructions so that the sequence of operations invoked so far could be retrieved in instruction-stream format. (However to do that, we’d need to add back in the “stack” object from before, and add a method to retrieve the instruction cache from the script object.)
This doesn’t happen if we separate the VM instructions from the script that creates and invokes them. Once that is done, we can remove the extraneous verbiage required by the programming language to embed the script. So instead of
we have simply:
Also by defining a VM, there is a clean, traditional separation between frontend programming languages and backend VM implementations. It defines an API that makes it easier for Blockstack developers to provide alternate programming languages or alternate backend interpreters.
Wire Format vs. VM
The upshot of the last section is basically that I believe Blockstack needs to define a VM.
And once this architectural decision is made, the work could be started right away, without completely nailing down the Blockstack-specific VM instructions. Those could be added as experience demands, and the implementation could be adjusted as well.
Initially, I thought that Blockstack’s “wire format” could be slotted into VM instructions. It is close, but not quite there. The fixed string “id” (called “magic”) should be dropped. There are distinct instructions like
NAME_RENEWAL that have the same opcode, “?”. And metadata for the program should be added, like an API version number.
Again following the principle of doing things the Bitcoin Script way unless there is reason not to, I’d suggest starting out with Bitcoin’s VM, extending the opcode size and putting the Blockstack-specific instructions in the space added by increasing the opcode size.
Having done this, there is no reason I can think of why the wire format and the instruction format can’t be virtually the same.
There are a number of Bitcoin Script VM opcodes that are obsolete, and there may be some that aren’t needed. As I mentioned before, there are some valid opcodes that the Ivy compiler will never generate, since there is no need to use them for the kinds of contracts Ivy currently supports.
Similarly, even though Bitcoin’s Script format with the Blockstack additions could be the base VM description, I don’t believe that all of the Bitcoin opcodes need to be implemented. As with Ivy, opcodes could implemented as the need arises.
Again, following the principle of no undue originality, there are already 3 languages on which to base a new one: Bitcoin Script, Ivy, and one I just learned of - Russell O’Connor’s Simplicity. The first two are not mutually exclusive since Ivy compiles down to Bitcoin Script. However Simplicity is a little different.
I think the goal of Simplicity is to have nice theoretical properties. In particular it:
Provide[s] formal semantics that facilitate easy reasoning about programs using existing off-the-shelf proof-assistant software.
Specifically, formal denotational semantics of the language are provided using Coq in Russell’s paper. In order to compute bounds on operational costs, an abstract machine called the “Bit machine” also has to be defined; and that is provided, too. As the name “Bit machine” suggests, the VM works on bits as opposed to the 32-bit integers of Bitcoin Script. Presumably hex strings would work the same way.
In addition to Russell’s implementation of the Bit machine, there is another rudimentary implementation that Dan Robinson wrote. As the title of Dan’s article says, his implementation is 30 lines of Haskell. However the 30 lines don’t fill out what would be needed in practice. For example there isn’t a 32-bit word type defined, let alone an “add” operation for it. Presumably users need to add a library to provide these functions. The most complex example Dan gives is a 4-by-4 bit addition.
Dan changed some of Simplicity’s terminal-unfriendly symbols to make them terminal friendly. There were some other cosmetic changes to make Simplicity more Haskell friendly. Again integers are bit composites. So a number like 32-bit number 3 is (zero) (zero) … (one) (one). where “zero” in the Haskell implementation is Left() and one is Right(). Presumably hex strings would be handled similarly. These may make sense in theory but might be problematic for developers to work with; I don’t know.
That said, if there is a higher level language like Ivy that compiles to Simplicity, then it matters a lot less what the compiled-down language and interpreter are, as long as they are not unduly slow. Russell argues that an interpreter he has written in C is fast.
Program and System Correctness
So let me close with the focus again on formal correctness. In Russell’s paper and possibly other papers as well, I see there is a well established precedent for proving program correctness and giving a formal definition of semantics. Russell uses Coq, which I gather is not uncommon. The references of Russell’s paper also mention the work of Andrew Appel from Princeton on a cryptographic hash correct. I’ve seen similar work in the literature to limit the error bounds of a particular implementation of floating point operations.
This is great so far as it goes. But in Blockstack’s case you’d need to prove that a “namerevoke” instruction revokes a name. I’m not sure how useful Coq would be useful here. Any comments?
Also, as I said previously, there are system properties or the interaction of various components of the system that are outside the range of validating a single program. These are things like fairness and starvation, possibly conservation (or at any rate, no loss) of bitcoins in the system.
The Simplicity language suggests that by using its formal systems, it is easy to reason about computational costs and finite termination. While that may be true, it isn’t strictly necessary. There are simple ways to prove computation costs and finite termination for conventional programs.
But when the individual instructions involve database transactions with possibly interoperable storage layers or interoperable database technologies using, say, the implementation of the name operations, then things get a lot fuzzier when it comes to formal proof and putting strict caps on time bounds. If you can fix the Blockstack name operation to say, an in-memory read from finite storage, then life is good; but is that the case in Blockstack’s implementation? And is that going to be guaranteed?