Arithmetic Opcodes: What Could They Look Like?
As noted in my previous post on Dealing with Amounts in Bitcoin Script, it’s possible to use the current opcodes to deal with satoshi values in script, but it’s horribly ugly and inefficient.
This makes it obvious that we want 64-bit-capable opcodes, and it makes sense to restore disabled opcodes such as OP_AND
/OP_OR
/OP_XOR
/OP_INVERT
, as well as OP_LSHIFT
, OP_RSHIFT
and maybe OP_LEFT
, OP_RIGHT
and OP_SUBSTR
.
Blockstream’s Elements Arithmetic Opcodes
Blockstream’s Elements codebase is ahead here, with a full set of 64-bit opcodes available for Liquid. All these operations push a “success” flag on the stack, designed that you can follow with an OP_VERIFY
if you want to assert against overflow. They then provide three conversion routines, to convert to and from CScriptNum, and one for converting from 32-bit/4-byte values.
But frankly, these are ugly. We need more than 31 bits for satoshi amounts, and we may never need more than 64 bits, but the conversions and new opcodes feel messy. Can we do better?
Generic Variable Length Amounts
Script amounts are variable length, which is nice when space matters, but terribly awkward for signed values: using the high bit of the last byte, this being little-endian, requires a 0 padding byte if the high bit would otherwise be set. And using negative numbers is not very natural (pun intended!), since we have OP_SUB
already.
What if we simply used unsigned, variable-length little-endian numbers? These are efficient, retain compatibility with current unsigned Script numbers, and yet can be extended to 64 bits or beyond, without worrying about overflow (as much).
I propose OP_ADDV
, which simply adds unsigned two little-endian numbers of arbitrary length. 64 bits is a little simpler to implement, but there are a number of useful bit tricks which can be done with wide values and I can’t see a good reason to add a limit we might regret in future.
OP_SUBV
would have to be Elements-style: pushing the absolute result on the stack, then a “success” flag if the result isn’t negative (effectively, a positive sign bit).
Limitations on Variable Operations
If OP_CAT
is not limited to 520 bytes as per OP_CAT beyond 520 bytes, then we could make OP_ADDV
never fail, since the effective total stack size as per that proposal could not be increased by OP_ADDV
(it consumes its inputs). But that introduces the problem of DoS, since on my laptop 2 million iterations (an entire block) of a mildly optimized OP_ADDV
of 260,000 bytes would take 120 seconds.
Thus, even if we were to allow more then 520 byte stack elements, I would propose either limiting the inputs and outputs to 520 bytes, or simply re-using the dynamic hash budget proposed in that post for arithmetic operations.
Multiply And Divide
Restoring OP_MUL
to apply to variable values is harder, since it’s O(N^2) in the size of the operands: it performs multiple shifts and additions in one operation. Yet both this and OP_DIV
are useful (consider the case of calculating a 1% fee).
I suggest this is a good case for using an Elements-style success flag, rather than aborting the script. This might look like:
- Normalize the inputs to eliminate leading zeros.
- If either input exceeds 128 bits, push
0
0
. - For
OP_DIV
, if the divisor is 0, push0
0
. - For
OP_MUL
, if the output overflows, push0
0
. - Otherwise the (normalized) result and
1
.
Both GCC and clang support an extension for 128 bit operations via __uint128_t
, so this implementation is fairly trivial (OP_MUL
overflow detection is assisted greatly by __builtin_mul_overflow
extension in GCC and clang).
Shift Operations, Splitting and Comparison
OP_LSHIFT
and OP_RSHIFT
would now be restored as simple bit operations (there’s no sign bit), treating their argument as unsigned rather than a signed amount. OP_LSHIFT
would fail if the result would be excessive (either a 520 byte limit, or a dynamic hash budget limit). Neither would normalize, leaving zero bytes at the front. This is useful when combined with OP_INVERT
to make a mask (0
OP_ADDV
can be used to normalize if you want).
OP_LEFT
, OP_SUBSTR
and OP_RIGHT
should probably return empty on out-of-range arguments rather than failing (I’m actually not sure what the original semantics were, but this is generally sensible).
OP_EQUAL
just works, but we need at least a new OP_GREATERTHANV
, and I suggest the full complement: OP_GREATERTHANOREQUALV
, OP_LESSTHANV
and OP_LESSTHANOREQUALV
.
Use for Merkle Tree Construction.
Regretfully, our comparison opcodes don’t work as OP_LESS
which is required for Merkle Tree construction to examine scripts: the SHA256 hashes there need to be compared big-endian, not little endian! So I suggest OP_BYTEREV
: this adds some overhead (reverse both merkle hashes to compare them), but is generally a useful opcode to have anyway.
Summary
We can extend current bitcoin script numbers fairly cleanly by having new opcodes which deal with variable-length little-endian unsigned numbers. The limits on most of these can be quite large, but if we want excessive limits (> 520 bytes) we need to have a budget like the checksig budget.
We can deal easily with current CScriptNum numbers, 32-bit numbers used by nLocktime, and 64-bit numbers used by satoshi amounts.
The soft fork would restore the following opcodes:
- OP_LSHIFT[^simplified]
- OP_RSHIFT[^simplified]
- OP_AND
- OP_OR
- OP_XOR
- OP_INVERT
- OP_MUL[^extended]
- OP_DIV[^extended]
- OP_LEFT
- OP_RIGHT
- OP_SUBSTR
And add the following new ones:
- OP_ADDV: add two little-endian unsigned numbers
- OP_SUBV: sub two little-endian unsigned numbers, push non-negative flag
- OP_GREATERTHANV: compare two little-endian unsigned numbers
- OP_GREATERTHANOREQUALV: compare two little-endian unsigned numbers
- OP_LESSTHANV: compare two little-endian unsigned numbers
- OP_LESSTHANOREQUALV: compare two little-endian unsigned numbers
- OP_BYTEREV: reverse bytes in the top stack element
This would make it far easier to deal with numeric fields to bring covenant introspection to its full potential (e.g. OP_TXHASH
).
[^simplified] The original versions preserved sign, but we don’t have that. [^extended] These apply to unsigned numbers up to 128 bits, not just signed 31 bit values as the originals did.