Early Blockstack VM design notes


#1

@rocky from our Slack channel shared this a few days back. I felt it was worth sharing here :slight_smile:

Writing Reverse Polish scripts sort of sucks, but to start off with it's what Bitcoin script does, so it isn't that intolerable by folks who do Bitcoin scripting.
The Etherium scripting language, Solidity, is much nicer and it would not be too hard to do something like that which compiles to the Bitcoin script VM. But this is separable.
Coq is cool, and seems like a good thing to prove an implementation has desirable characteristics, but that too is separable.
And incremental.
There is no prototyping benefit for proving the implementation correct and it bogs things down especially when there isn't a clear spec for either the VM or the language for that.
I think if that by verifying that the each VM opcode is safe, and that the interpreter loop safe and bounded. Then that shows that any sequence of valid VM opcodes is safe.
We might also have to prove some characteristics for parsing to VM, e.g. that parsing doesn't create an infinite loop. But would be part of a phase only after the VM works (minimally), and after the parser works(minimally) which is even after that. And again this is separable from everything else.
Computing max time and max memory limits fis pretty easy, after each opcode has time amd memory requirements associated with it.
The AST for the program is a tree, and one can compute requirements with a simple traversal of the tree.
This too is separable and can be done later.
Ok so now down to the meat of what needs to be done first.
First step that would be useful is nailing down additional VM instructions. It looks like there is free opcode space in the bitcoin VM spec. Generally I prefer to build on existing work rather than reinventing from scratch, except when there is good reason to redo.
If I have it correct, additional opcodes to Bitcoin script would be:


opcode      inputs                        output stacked
---------------- -----------------------------------------------   ---------------
OP_NAME_REGISTER: <name> <value> <pay_account_id> <zonefile_hash>   ? boolean
OP_NAME_UPDATE:  <name> <zonefile hash>                ? boolean

OP_NAME_TRANSFER: <name> <account_id>                 ? boolean
OP_ACCT_SEND:   <value> <account_id>                 ? boolean
OP_NAME_METADATA: <name> <metadata>                  ?
OP_STORE_DATA:  <data>                        ?
OP_MINE_BLOCK:  <hash> <account_id> <coinbase value> <scratch area> ?
OP_FIELD_ZONEFILE_HASH: <id>                      hash
OP_LOADNAME    <hashed id>                     ??
OP_FIELD_METADATA: <data>                       field metadata
OP_HASH_LSTACK  ?                          ?

For those above that are correct, it would be great to describe the inputs, outputs and what each opcode does with (when possible) links to other blockstack documentation. Something akin to https://en.bitcoin.it/wiki/Script#Crypto

Keep in mind that the behaviour might include an immediate failure with no stack output instead of a boolean unconditionally.

How does one capture time and resources for something like OP_MINE_BLOCK? Is the data in OP_STORE_DATA unlimited? If so one will have an unbounded limit for that. So a rough (with emphasis on "rough") idea of the time and memory resources for these operations would be nice.
For the various hashing code, we'll also have to fill something like that in. Eventually.
Also, it would be good to prioritize opcodes in those that would have the most benefit for a prototype. If there are just one or two opcodes that will show a lot off that'd be great.
I started modifying The Bitcoin IDE to add an opcode. (I added OP_HASH0 which the identity hash and is useful in testing scripts) As an IDE for testing scripts it is fine, but one needs to consult the Bitcoin Script reference implementation as well, since I found the semantics can differ. OP_EQUALVERIFY in the IDE is identical to OP_EQUAL which is incorrect as per the reference implementation since the former should raise an invalid transaction and terminate while the latter doesn't'.
But for the real thing, one wouldn't use this. Personally, initially I'd probably prototype in Python as that has to interface easily in a number of ways with
blockstack which is also in Python. But at some point I'd switch to a compiled language (Rust and Go are the obvious choices), not just for speed and low runtime overhead and to allow execution in limited environments.

#2

I have been reading more about Ethereum, its programming languages and implementation and the problems that Ethereum has in its exploits, in particular via Solidity.

In light of this, what wrote is slightly wrong, or at least comes from being not very well informed, especially of Ethereum and of the problems it has had.

Some of the same things I mentioned have been thought of, and are implemented in Ethereum. That is not surprising in that when when you have a similar or same problem, of course the first ideas are probably going to be similar. For example, I mentioned the use of a VM and a couple of languages which could compile to that VM. The use of the several languages was for expedience: get something out there, even if it isn’t pretty, to get experience. And my guess is that Ethereum went through this process as well; that is why there is LLL, which is mentioned below.

According to A Gentle Introduction to Ethereum in the section that starts “Smart Contract languages”, there are 3 programming languages mentioned: Solidity, Serpent and LLL. Although LLL (which stands for Lisp-like Language) perhaps more low-level than the others (except that Serpent can apparently directly access VM instructions) it doesn’t use Reverse Polish notation as Bitcoin Script does. Bitcoin Script is a bit more low level.

Although I probably could go on a bit more, I won’t to keep this more focused. But there’s just one more important thing in the area of verification that I really should correct in light of what I now understand from say “Scanning Live Ethereum Contracts for Bugs” and the links off of that.

I wrote that if you had a VM that always did the right thing at the instruction level, then you could compose sequences of instructions that would do the right thing as long as it would terminate. Not so.

The King of the Ether and DAO vulnerabilities I think are about money getting lost as a result of contract programming bugs. Specifically a money is sent but for some reason there is an error so it doesn’t make it to the receiver so it no longer belongs either to the sender or receiver.

For these kinds of problems like ensuring conservation of money (as can be tracked inside a single program), one solution would be using a language that incorporates Typestate Anaysis.

Typestate can track ownership, creation and transfer of values, e.g. money.

So, for example, in a typestate system, if a money transfer failed in a send() call as a result of the receiver not receiving, there would be an exception thrown in the send() call, and the state of the variable which has the money would be put in a state where it still owns the money. If on the other hand the money is successfully transfered (no exception), then the variable parameter of the send would be in the state where it no longer has that money.


#3

Previously, lacking better understanding of Blockstack’s specific needs, I had been focusing on programming languages that blockchains use and their implementation, e.g. bitcoin, etherium, possibly symbiont, and moxiebox.

The last day or so, I’ve been delving into Blockstack, specifically the process one uses to define a Blockstack namespace. My aim was to wrap my head around what VM instructions or language would be involved in Blockstack for that. Jude informs:

Each Blockstack node is a replicated state machine, where the state machine encodes transitions as blockchain transactions. The “VM” in Blockstack right now “executes” specially-crafted Bitcoin transactions, which get parsed and validated by code in blockstack.lib.operations and “evaluated” in blockstack.lib.nameset.db. The “evaluation” translates the state-transitions that each transaction encodes into SQL commands. These in turn get executed on an internal database (encoding the VM’s state). The “VM” does not directly load or store data or interact with the network; it simply evaluates a sequence of instructions encoded within Bitcoin transactions.

The “next-gen” VM that will power STACKs blockchain will do likewise, but will be constructed such that transactions can be broadcast and processed in a much faster, lower-latency manner than block mining. The VM model is otherwise the same—it takes a sequence of instructions within STACKs blockchain transactions and evaluates them by updating an internal state database. Each STACKs peer does this, so that when they evaluate the same blockchain, they all produce the same state (which can then be queried by clients).

Blockstack nodes would run the hypothetical STACKs scripting language in order to validate transactions. blockstack-core would continue to provide legacy compatibility for clients who want to register names on the Bitcoin blockchain directly, and to help STACKs clients and would-be validators discover both the current set of validators and the history of validator elections (the latter of which would be used to authenticate previous chain history)

[STACKs is the working name for Blockstack’s blockchain]

I’ll come back to this in a little bit.

Yesterday, without too much trouble I went through the process to register a namespace. As folks no doubt know, the process is:

  1. “pre-order” namespace, incurring a non-refundable fee
  2. “reveal” namespace, incurring a non-refundable fee, after a little while — but not indefinitely
  3. “import” or seed zone (optional)
  4. “ready” the zone after a little while from the “reveal”— but not indefinitely

This already is a mouthful.

Previously, in discussing what a programming language prototype might do, it was suggested to implement a BNS version of cryptokitties. In light of what is in involved in fully registering a namespace, I think we should come back to that after getting some more fundamental operations sorted out. The added details I’m sure will easily follow after fundamentals are more fully fleshed out.

So back to namespace registration. The first step in my mind is to understand what would go into a Blockstack VM interpreter that supports blockstack operations. Well, “operations,” duh! Okay, but which operations?

From the information relayed by Jude, VM operations are already listed in blockstack/lib/operations/__init__.py as keys in the various dictionaries that end _METHODS.

Specifically there is NAMESPACE_PREORDER, NAMESPACE_REVEAL,NAMESPACE_READY, and NAME_IMPORT among others.

There is even a “wire format” defined to describe how to encode an instruction’s opcode and its operand(s). Just drop off the “magic” and “consensus hash” from what is described there, but use the magic once for the entire sequence.

The thing to note about all of the VM instructions listed here is that they access and possibly write to a database. Typically we don’t think of virtual machine instructions issuing database requests, but VM instructions sometimes do computationally intensive things. For example in Object-Oriented languages with VM’s, retrieving or calling an instance method of an object can be pretty involved.

Assessing cost of a database access can be a little tricky, as it might depend not only on the operation, but also both the kind of storage module and database used. I guess that will have to be discussed and negotiated down the line.

So far, I’ve been looking at things from the VM instruction side, not from the standpoint of someone writing something in Blockstack Script language which is magically run. And I’ll argue why I think a VM helps here. Some of this may be a bit basic, but bear with me.

Suppose I offer a DNS registration and management service and I want to register 1,000 BNS zones according a simple pattern. My understanding is, and this seems to be supported by what Jude writes, that I could right now hook into the Python blockstack library (blockstack-core).

Fuzzing some details, I might write Python code like:

    def register_namespace(args):
        info = preorder(args)
        wait_until_preorder_confirmed(info, args)
	register(args)
	wait_until_registered(args)
	# No import/populate
	ready(args)

Until such a language is written, as Jude mentioned, Blockstack developers have to write something like this; but even after scripting language is in place, this is useful in testing and prototyping.

We expect that over time the API may change – the functions and parameters may change. Having a simple one-to-one mapping of the API functions with the VM instructions helps keep things simple in testing and in tracking changes.

Finally, having talked a bit about a VM, let me close by talking about a programming language. The thing that strikes me about Blockstack’s namespace registration is that the first two steps are logically one. Of course, it has to be this way for a two-phase commit. And if the first phase of the two-phase commit fails, we might have separate logic to deal with that (which is not done above).

But a Blockstack user writing code would really rather not have to write what would undoubtedly be boilerplate code which could potentially be erroneous. And this is something a programming language can help with: the programming-language can construct a necessary sequence the right way, while returning detailed status if any one of the stages fails.

Let’s say there is a preorder_confirm()function in Blockstack scripting language that does the two operations. It is possible for a compiler to break this up into two execution sequences and schedule them separately. (I’m not sure if this is what we would want, but it is possible.)

Finally, having the entire logical task collected in one file listed sequentially (but possibly scheduled as several as different streams) is more convenient and easier to work and reason about with than four individual programs or tasks that may potentially be in four files.

Many thanks to Jude for his patience and clarification.


#4

Hi @rocky, thank you for your detailed write-up and programming language proposal!

I think it is worth pointing out that your post describes two computing paradigms:

  • Off-chain programs: This is what you described with your code example. An off-chain program’s job is to generate and broadcast transactions that encode instructions to be executed by Blockstack’s VM. The act of creating and populating a namespace can be achieved programmatically this way, which you pointed out. I use the term “off-chain” here to allude to the fact that the program instructions execute client-side–they are executed at a user’s behest, and the code is private (i.e. hosted only on the client). Off-chain programs generate Blockstack VM instructions and broadcast them to the blockchain. The Blockstack nodes are not aware of off-chain programs; they only see the transactions they generate.

  • On-chain programs: These are also known as “smart contracts”. Blockstack does not have any support for on-chain programs at this time, but the STACKs blockchain will at some point support them. As the name implies, an on-chain program lives within the blockchain. Its code is public and stored in every Blockstack node, and it can be programmatically invoked by a transaction sent from any client. An on-chain program is more like a “stored procedure” within the Blockstack VM—creating an on-chain program is like adding a new operation to blockstack.lib.operations, and running the on-chain program is like invoking one of these operations. Blockstack does not implement a way for clients to do this (i.e. adding new operations requires a hard fork), but the STACKs blockchain will support this feature natively.

Before going on, let me be clear: the following description of STACKs is an early design sketch and does not in any way indicate the final system design.

To give an example of an on-chain program that STACKs might one day support, one idea that’s been rattling around in my head is to give a namespace creator the ability to define a “name acceptance” program that defines which name operations will be allowed for names in the namespace. Before anyone asks, this can be done without a Turing-complete language (something more like Ivy).

Here’s how one might implement Cryptokitties using a hypothetical name-acceptance program. Let’s assume that the namespace .cryptokitties exists, and each name in this namespace represents a cat. Then, we’d want the name-acceptance program to only allow a name to be registered if all of the following are true:

  • The name is not already taken (i.e. no “cat clones” allowed)
  • The name’s NAME_REGISTRATION transaction has an associated value_hash (see the wire format). The value_hash would encode the cat’s genes.
  • No NAME_UPDATE transactions will be accepted (i.e. cats can’t change their genes)
  • The value_hash of a newly-registered name the hash of two previous names’ value_hash (i.e. each cat’s genes is a mixture of two parent cats). Call these the “parent names”
  • The “parent” names have to each have a certain age, and cannot have been used recently.

In STACKs, it should be possible for the .cryptokitties namespace creator to specify these constraints as an on-chain program when they create the namespace. STACKs would enforce these constraints by evaluating this on-chain program on each name operation on a .cryptokitties name. If all constraints are met, then the operation is accepted and processed. If not, then the operation is ignored.

What programming language facilities would an on-chain program need to accomplish something like this? A few come to mind:

  • Read fields for a name from the name database, such as value_hash
  • Read fields from a name’s history in the name database, such as the block height of the last transaction that affected a name.
  • Read fields of the transaction under consideration (i.e. “if transaction.op == NAME_UPDATE, then REJECT”)
  • Perform cryptographic operations on the transaction being considered (i.e. if transaction.value_hash != hash(parent1.value_hash, parent2.value_hash), then REJECT)

While Blockstack’s Python library gives a way to implement off-chain programs (albeit not in a particularly domain-specific way), there are currently no facilities to implement on-chain programs like the above. However, it is something we’d like to add in order to allow developers to extend the Blockstack VM’s consensus rules in application-specific ways.


#5

Ivy Notes

Table of Contents

Here are my thoughts on looking at and trying the Ivy Playground for Bitcoin.

Overall, I like it; and as I said, I think this kind of direction is the direction Blockstack should follow in providing a scripting language.

Below are good things about it as well as other issues or things that might need improving.

Ivy Advantages

There is helpful code structuring and sample contracts

As with Etherium’s Solidity, the top-level construct is a “Contract.” Within an Ivy contract there are clauses. Clauses are not found in Etherium’s Solidity.

Bitcoin Script is used underneath

Since Ivy compiles to Bitcoin Script, it extends the existing understanding and research found in the Bitcoin Script language, and easily too.

For example, you can create a SegWit compatible contract without having to know any details of SegWit other than it’s a good thing you might want to use. And since it is in a sample contract you can use it even if you don’t know that is desirable.

Ivy is not Turing Complete

Another consequence of the point that Ivy compiles to Bitcoin script may be worth mentioning on its own. Ivy, while is high-level, it is not Turing Complete. Etherium’s programming is Turing Complete; it has recursive function calls and loops, which can be a problem.

Ivy is a higher-level programming language so it expresses ideas in more high-level terms than Bitcoin Script

You don’t write in reverse polish notation as you do with Bitcoin Script. So “if” statements, expressions, and (built-in) function calls look as they do in a higher-level language.

As a result, Ivy programs convey more of what is going on, and more clearly to humans, than VM interpreters do.

But in this regard, I came across something amusing for its anomaly.

Usually in higher level languages one line or logical statement of source code will compile to many VM instructions. And that largely is true in Ivy. An exception, though, is the program you are presented with when you first enter the Ivy playground:

contract LockWithPublicKey(publicKey: PublicKey, val: Value) {
  clause spend(sig: Signature) {
    verify checkSig(publicKey, sig)
    unlock val
  }
}

The corresponding Bitcoin Script code for this is the following two Bitcoin Script instructions:

PUSH(publicKey) CHECKSIG

Of course, what is going on is that the extra verbiage comes as a result of showing structure, contract and parameter naming, and giving types to variables and a name to the contract.

Ivy is strongly typed

An additional benefit of the Ivy language and compiler is that type checking is performed on the code. This reduces the number of nonsensical contract programs that could be run.

Ivy is written in nodejs/typescript

Ivy is written in in TypeScript, a variant of JavaScript that adds enhancements to node, specifically type checking on JavaScript programs. The language implementation and the language implemented are both strongly typed.

Going further, the author, Dan Robinson, has a version of the Ivy-to-Bitcoin-Script compiler in PureScript.

Since Ivy compiles down to nodejs, it can take advantage of the interactivity generally found in the JavaScript/node ecosystem. It uses for example bcoin for its bitcoin implementation. It may also be easy to adapt to Blockstack’s existing blockstack.js code.

There is not only a compiler component but an interactive sandboxed environment component. Both are fast.

Ivy works so blindingly fast that when you make a change everything is checked; running a contract is fast too. (This implies that bcoin used underneath is fast.)

This is similar to and perhaps even better than Go’s interactive playground – compilation happens in the background as you are editing.

The same thing with running a contract. Inputs can be generated automatically or entered manually.

The environment allows you to start with any of the example contract programs. This makes getting a contract up and running easy.

However see below on information which describes where more could be done.

Extendable

In contrast to Bitcoin Script assembler implementations, Ivy uses a PEG parser
for nodejs called pegjs.

This not only allows a lot of expressiveness in the language, but also simplifies extending the language. PEG parsers are context-free grammars which are a little more general than LR (which is a superset of LALR(x) parsers like Bison or Yacc) or LL parsers.

Areas in which Ivy might improve

However having written the above, I don’t want to oversell Ivy. It is new, about 4 months old. And its newness still shows a little bit.

At present there is just one person, Dan Robinson, working on on this. I suspect that most of the things below are simply a result of lack of manpower.

Blockstack might be able to offer assistance here.

Educational status

It is explicitly stated that Ivy is not for production use. As stated (in bold) in the README:

Ivy is prototype software and is intended for educational purposes only. Do not attempt to use Ivy to control real Bitcoins.

In fact Ivy currently doesn’t have a way to issue transactions on a real blockchain, even on testnet. As stated in Issue #7.

We do hope to add support for creating testnet transactions, but it’s non-trivial to implement (particularly since, unlike the rest of the playground, which is a static web app, it will require a backend server), so I don’t have a firm deadline on when it will be
available.

There are some further discussion on ways that could be done in the issue.

Additional tools: Test framework? Debugger?

There’s no debugger like there is for Bitcoin Script. Right now, a debugger or tracer is not that necessary.

One the other hand there is an existing stepping tool for Bitcoin Script that is written in JavaScript so that can easily be added to this code.

It would be simple to add a conventional line-number table mapping scheme to map between Bitcoin Script and the Ivy source.

It might also be nice to extend the interactive environment or non-interactive environment to have some sort of automated testing.

One tiny thing that is simple to do and missing in just about every listing I’ve seen of Bitcoin Script is a pretty printer of the bytecode stream to show stack nesting. (Recall, Ivy compiles to Bitcode Script which is shown.)

So for example: 1 + (2 + 3) in Bitcode Script appears as:

1 2 PLUS 3 PLUS

Possibly a little nicer would be an option to show stack nesting as indentation option to display this as:

1               # Stack item pushed after instruction
  2             # ditto
    3           # ditto
      PLUS  # Two stack items popped, one pushed
  PLUS

And compare with (1 + 2) + 3:

1          # Stack item pushed after instruction
  2        # Stack item pushed after instruction
    PLUS   # Two stack items popped, one pushed
  3
  PLUS

But even this may take a little getting used to. Going further, it wouldn’t be hard to create a more conventional statement/expression parse of Bitcoin Script. For example 1 + (2 + 3). As a side project I may do that for Bitcoin Script some day as it is pretty simple to do.

Limited use of Bitcoin Script

Right now only those instructions that can be generated as a result of useful kinds of contracts are emitted from Ivy. So a limitation of Ivy is that you can’t use the full expressiveness that is in Bitcoin Script. And this is deliberate. Since there is no escape to Bitcoin Script, there will probably for always be useful contracts that cannot be expressed in Ivy.

In particular, missing from Ivy are any use of arithmetic or shift operations. See for example: https://github.com/ivy-lang/ivy-bitcoin/issues/5

Documentation is sparse.

There isn’t much that describes the language or rationale behind it. The language is simple enough that not much is probably needed. Some of the discussion raised the github’s issue tracker reports probably should be added to the documentation, though.

Setup from Git

Information on how to setup from Git is sparse too. To use the code from git in a local sandbox from github, I had a small number of problems. For one trivial code change, I have a pull request in (merged turnaround was about 5 days)

To run a local sandbox be aware that you will have to use pegjs to compile the grammar file. Process the source code with Typescript’s tsc command.


#6

Table of Contents

Introduction

Sorry for dropping off for a bit…

Recently in my desire to understand more of the issues surrounding languages for blockchain I read Dan Robinson’s Understanding Simplicity: Implementing a Smart-Contract Language in 30 Lines of Haskell. Linking from that I read Russell O’Connor’s Simplicity paper. That’s when I had my AHA! moment, why possibly the interest in Coq, and what’s already been done with that. I’ll come back to that in a bit.

I think that I now understand more about what Blockstack might need with respect to programming, what it has, what is being worked on, as well as what other languages for blockchains are out there. So I’d like to review the situation and suggest an approach. Maybe other people are thinking along similar lines and, if so, then I may be just merely making more explicit things that are already implicit. But please correct me where I’m wrong or where there are other thoughts, concerns and comments.

TL;DR: I compare Blockstack’s execution with bcoin’s and argue that Blockstack should define a VM, modeled on Bitcoin Script’s VM. I describe a little of the Simplicity language and argue that program proof and verification while desirable, isn’t a panacea for all of the issues Blockstack may encounter.

A programming language API

Let’s start with the ways we might write a program to perform Blockstack’s’ operations. Often there is a public API which gives the names of some functions or methods that developers would call.

Right now in Blockstack, I don’t see API documentation per se, but looking at the code I think those operations would be the methods in the modules blockstack_client.operations. For example there is blockstack_client.operations.register.make_transaction() or
blockstack_client.operations.revoke.make_transaction(). For each op.make_transaction() method there is a txop alias created for it. For example,
blockstack_client.operations.tx_register() is the same thing as blockstack_client.operations.register.make_transaction().

An important thing to note is that each of the make_transaction() methods calls a corresponding build() method. For example, there is blockstack_client.operations.revoke.build()

The build() method constructs and returns from its parameters a hex string in “wire format”. This hex string represents the operation that is to be performed. The wire format is documented. See https://github.com/blockstack/blockstack-core/blob/master/docs/wire-format.md

I like to draw parallels with how things are done in various places in the bitcoin world, and I’ll follow the general principle of doing things as close as possible to bitcoin, except where there is good reason not to.

So let’s take a look at how bcoin does something similar by looking at bcoin’s Intro to Scripting. Like Blockstack, things are changing in that world, so what I describe is subject to change and not to be construed as a polished solution.

Here is how you might start to create a bcoin script:

   const output = new Script();
   output.pushSym('OP_DROP');
   output.pushSym('OP_ADD');
   ...
   output.compile();
   const input = new Script();
   input.setString(0, 'hello world'); // add some metadata
   input.compile();
   const stack = new Stack();
   input.execute(stack);  // Put the results of running input into "stack"
   output.execute(stack);

OP_DROP and OP_ADD you can find as a BitCoin Script opcode.

There is a bit of a mismatch already in that in the Blockstack case make_transaction is much simpler: you don’t need to call additional compile()s, or executes or any stacks. In the bcoin()'s output.pushSym('OP_DROP'), the operation isn’t done immediately, but only in the process of performing output.execute(stack).

But let’s ignore those differences in order get across the major, if simple, point: in either Blockstack or bcoin there will be atomic instructions that the developer would like to perform. And the simplest, most basic way is just to write in the programming language of the core implementation (Python currently for Blockstack or JavaScript for bcoin) and call the function, whether it is tx_register(), the underlying register.build()in Blockstack, or`output.pushSym(‘OP DROP’) in bcoin.

There is an additional advantage when the reference implementation is written in a compiled language like C++, Rust, Go, or Java, and a library is created to encapsulate the API: there is interoperability with other languages. So register.build() could be called from not just Python but from other programming languages. This is typically referred to as writing “Bindings” for a library.

Transactions, Calls vs. VM

Having described an API for programming at its simplest and most primitive form, let us get back to the difference between what bcoin is doing and what Blockstack is currently doing: bcoin, like most bitcoin implementations, has defined a VM. Running instructions in the VM is typically the only way that developers can create “smart contracts.”

As in Blockstack, we don’t need a VM interpreting instructions; the calls themselves could suffice. And the effect of a call can take place immediately, in the case of off-chain execution. If this were implemented in bcoin, the above script could be simpler:

   const input = new Script(); // Note: setting input must come before output operations
   input.setString(0, 'hello world'); // add some metadata
   ...
   const output = new Script();
   output.OpDrop(input);      // Do "drop" right now
   output.OpAdd(input);

It is clear that this is simpler, especially when there are no other higher-level programming facilities available. And note that even when there is an underlying VM, these operation calls could cache the instructions so that the sequence of operations invoked so far could be retrieved in instruction-stream format. (However to do that, we’d need to add back in the “stack” object from before, and add a method to retrieve the instruction cache from the script object.)

But there is a serious conceptual problem with this, namely that the separation between the bitcoin script and the language the script is embedded in (JavaScript) is blurred. For example, const input = new Script(); is purely a JavaScript construct. Since we can intermingle other JavaScript statements that do not invoke bitcoin operations, conditional JavaScript code could determine what instructions/transactions get invoked. That is good for language flexibility, but when that happens, reasoning about the bitcoin portion of program is harder: we basically have to reason about both the bitcoin instructions that get emitted and the JavaScript program that created the program.

This doesn’t happen if we separate the VM instructions from the script that creates and invokes them. Once that is done, we can remove the extraneous verbiage required by the programming language to embed the script. So instead of

output.OpDrop(input);
output.OpAdd(input);

we have simply:

DROP; ADD;

Also by defining a VM, there is a clean, traditional separation between frontend programming languages and backend VM implementations. It defines an API that makes it easier for Blockstack developers to provide alternate programming languages or alternate backend interpreters.

Wire Format vs. VM

The upshot of the last section is basically that I believe Blockstack needs to define a VM.

And once this architectural decision is made, the work could be started right away, without completely nailing down the Blockstack-specific VM instructions. Those could be added as experience demands, and the implementation could be adjusted as well.

Initially, I thought that Blockstack’s “wire format” could be slotted into VM instructions. It is close, but not quite there. The fixed string “id” (called “magic”) should be dropped. There are distinct instructions like NAME_REGISTRATION and NAME_RENEWAL that have the same opcode, “?”. And metadata for the program should be added, like an API version number.

Again following the principle of doing things the Bitcoin Script way unless there is reason not to, I’d suggest starting out with Bitcoin’s VM, extending the opcode size and putting the Blockstack-specific instructions in the space added by increasing the opcode size.

Having done this, there is no reason I can think of why the wire format and the instruction format can’t be virtually the same.

There are a number of Bitcoin Script VM opcodes that are obsolete, and there may be some that aren’t needed. As I mentioned before, there are some valid opcodes that the Ivy compiler will never generate, since there is no need to use them for the kinds of contracts Ivy currently supports.

Similarly, even though Bitcoin’s Script format with the Blockstack additions could be the base VM description, I don’t believe that all of the Bitcoin opcodes need to be implemented. As with Ivy, opcodes could implemented as the need arises.

Higher-level Languages

Again, following the principle of no undue originality, there are already 3 languages on which to base a new one: Bitcoin Script, Ivy, and one I just learned of - Russell O’Connor’s Simplicity. The first two are not mutually exclusive since Ivy compiles down to Bitcoin Script. However Simplicity is a little different.

I think the goal of Simplicity is to have nice theoretical properties. In particular it:

Provide[s] formal semantics that facilitate easy reasoning about programs using existing off-the-shelf proof-assistant software.

Specifically, formal denotational semantics of the language are provided using Coq in Russell’s paper. In order to compute bounds on operational costs, an abstract machine called the “Bit machine” also has to be defined; and that is provided, too. As the name “Bit machine” suggests, the VM works on bits as opposed to the 32-bit integers of Bitcoin Script. Presumably hex strings would work the same way.

In addition to Russell’s implementation of the Bit machine, there is another rudimentary implementation that Dan Robinson wrote. As the title of Dan’s article says, his implementation is 30 lines of Haskell. However the 30 lines don’t fill out what would be needed in practice. For example there isn’t a 32-bit word type defined, let alone an “add” operation for it. Presumably users need to add a library to provide these functions. The most complex example Dan gives is a 4-by-4 bit addition.

Dan changed some of Simplicity’s terminal-unfriendly symbols to make them terminal friendly. There were some other cosmetic changes to make Simplicity more Haskell friendly. Again integers are bit composites. So a number like 32-bit number 3 is (zero) (zero) … (one) (one). where “zero” in the Haskell implementation is Left() and one is Right(). Presumably hex strings would be handled similarly. These may make sense in theory but might be problematic for developers to work with; I don’t know.

That said, if there is a higher level language like Ivy that compiles to Simplicity, then it matters a lot less what the compiled-down language and interpreter are, as long as they are not unduly slow. Russell argues that an interpreter he has written in C is fast.

Program and System Correctness

So let me close with the focus again on formal correctness. In Russell’s paper and possibly other papers as well, I see there is a well established precedent for proving program correctness and giving a formal definition of semantics. Russell uses Coq, which I gather is not uncommon. The references of Russell’s paper also mention the work of Andrew Appel from Princeton on a cryptographic hash correct. I’ve seen similar work in the literature to limit the error bounds of a particular implementation of floating point operations.

This is great so far as it goes. But in Blockstack’s case you’d need to prove that a “namerevoke” instruction revokes a name. I’m not sure how useful Coq would be useful here. Any comments?

Also, as I said previously, there are system properties or the interaction of various components of the system that are outside the range of validating a single program. These are things like fairness and starvation, possibly conservation (or at any rate, no loss) of bitcoins in the system.

The Simplicity language suggests that by using its formal systems, it is easy to reason about computational costs and finite termination. While that may be true, it isn’t strictly necessary. There are simple ways to prove computation costs and finite termination for conventional programs.

But when the individual instructions involve database transactions with possibly interoperable storage layers or interoperable database technologies using, say, the implementation of the name operations, then things get a lot fuzzier when it comes to formal proof and putting strict caps on time bounds. If you can fix the Blockstack name operation to say, an in-memory read from finite storage, then life is good; but is that the case in Blockstack’s implementation? And is that going to be guaranteed?


#7

Hello all,
I know this post is a little old but I recently started checking out this solidity tutorial and thought I’ll give it a whirl.
After looking at this post, I am skeptical as to whether I should be continuing with the course because all the issues you guys are facing.
If possible, could you guys also suggest an alternate contract-oriented programming language.
Also, how often do you guys face these issues?