Arithmetic Opcodes: What Could They Look Like?

As noted in my previous post on Dealing with Amounts in Bitcoin Script, it’s possible to use the current opcodes to deal with satoshi values in script, but it’s horribly ugly and inefficient.

This makes it obvious that we want 64-bit-capable opcodes, and it makes sense to restore disabled opcodes such as OP_AND/OP_OR/OP_XOR/OP_INVERT, as well as OP_LSHIFT, OP_RSHIFT and maybe OP_LEFT, OP_RIGHT and OP_SUBSTR.

Blockstream’s Elements Arithmetic Opcodes

Blockstream’s Elements codebase is ahead here, with a full set of 64-bit opcodes available for Liquid. All these operations push a “success” flag on the stack, designed that you can follow with an OP_VERIFY if you want to assert against overflow. They then provide three conversion routines, to convert to and from CScriptNum, and one for converting from 32-bit/4-byte values.

But frankly, these are ugly. We need more than 31 bits for satoshi amounts, and we may never need more than 64 bits, but the conversions and new opcodes feel messy. Can we do better?

Generic Variable Length Amounts

Script amounts are variable length, which is nice when space matters, but terribly awkward for signed values: using the high bit of the last byte, this being little-endian, requires a 0 padding byte if the high bit would otherwise be set. And using negative numbers is not very natural (pun intended!), since we have OP_SUB already.

What if we simply used unsigned, variable-length little-endian numbers? These are efficient, retain compatibility with current unsigned Script numbers, and yet can be extended to 64 bits or beyond, without worrying about overflow (as much).

I propose OP_ADDV, which simply adds unsigned two little-endian numbers of arbitrary length. 64 bits is a little simpler to implement, but there are a number of useful bit tricks which can be done with wide values and I can’t see a good reason to add a limit we might regret in future.

OP_SUBV would have to be Elements-style: pushing the absolute result on the stack, then a “success” flag if the result isn’t negative (effectively, a positive sign bit).

Limitations on Variable Operations

If OP_CAT is not limited to 520 bytes as per OP_CAT beyond 520 bytes, then we could make OP_ADDV never fail, since the effective total stack size as per that proposal could not be increased by OP_ADDV (it consumes its inputs). But that introduces the problem of DoS, since on my laptop 2 million iterations (an entire block) of a mildly optimized OP_ADDV of 260,000 bytes would take 120 seconds.

Thus, even if we were to allow more then 520 byte stack elements, I would propose either limiting the inputs and outputs to 520 bytes, or simply re-using the dynamic hash budget proposed in that post for arithmetic operations.

Multiply And Divide

Restoring OP_MUL to apply to variable values is harder, since it’s O(N^2) in the size of the operands: it performs multiple shifts and additions in one operation. Yet both this and OP_DIV are useful (consider the case of calculating a 1% fee).

I suggest this is a good case for using an Elements-style success flag, rather than aborting the script. This might look like:

Normalize the inputs to eliminate leading zeros.
If either input exceeds 128 bits, push 0 0.
For OP_DIV, if the divisor is 0, push 0 0.
For OP_MUL, if the output overflows, push 0 0.
Otherwise the (normalized) result and 1.

Both GCC and clang support an extension for 128 bit operations via __uint128_t, so this implementation is fairly trivial (OP_MUL overflow detection is assisted greatly by __builtin_mul_overflow extension in GCC and clang).

Shift Operations, Splitting and Comparison

OP_LSHIFT and OP_RSHIFT would now be restored as simple bit operations (there’s no sign bit), treating their argument as unsigned rather than a signed amount. OP_LSHIFT would fail if the result would be excessive (either a 520 byte limit, or a dynamic hash budget limit). Neither would normalize, leaving zero bytes at the front. This is useful when combined with OP_INVERT to make a mask (0 OP_ADDV can be used to normalize if you want).

OP_LEFT, OP_SUBSTR and OP_RIGHT should probably return empty on out-of-range arguments rather than failing (I’m actually not sure what the original semantics were, but this is generally sensible).

OP_EQUAL just works, but we need at least a new OP_GREATERTHANV, and I suggest the full complement: OP_GREATERTHANOREQUALV, OP_LESSTHANV and OP_LESSTHANOREQUALV.

Use for Merkle Tree Construction.

Regretfully, our comparison opcodes don’t work as OP_LESS which is required for Merkle Tree construction to examine scripts: the SHA256 hashes there need to be compared big-endian, not little endian! So I suggest OP_BYTEREV: this adds some overhead (reverse both merkle hashes to compare them), but is generally a useful opcode to have anyway.

Summary

We can extend current bitcoin script numbers fairly cleanly by having new opcodes which deal with variable-length little-endian unsigned numbers. The limits on most of these can be quite large, but if we want excessive limits (> 520 bytes) we need to have a budget like the checksig budget.

We can deal easily with current CScriptNum numbers, 32-bit numbers used by nLocktime, and 64-bit numbers used by satoshi amounts.

The soft fork would restore the following opcodes:

OP_LSHIFT[^simplified]
OP_RSHIFT[^simplified]
OP_AND
OP_OR
OP_XOR
OP_INVERT
OP_MUL[^extended]
OP_DIV[^extended]
OP_LEFT
OP_RIGHT
OP_SUBSTR

And add the following new ones:

OP_ADDV: add two little-endian unsigned numbers
OP_SUBV: sub two little-endian unsigned numbers, push non-negative flag
OP_GREATERTHANV: compare two little-endian unsigned numbers
OP_GREATERTHANOREQUALV: compare two little-endian unsigned numbers
OP_LESSTHANV: compare two little-endian unsigned numbers
OP_LESSTHANOREQUALV: compare two little-endian unsigned numbers
OP_BYTEREV: reverse bytes in the top stack element

This would make it far easier to deal with numeric fields to bring covenant introspection to its full potential (e.g. OP_TXHASH).

[^simplified] The original versions preserved sign, but we don’t have that. [^extended] These apply to unsigned numbers up to 128 bits, not just signed 31 bit values as the originals did.