# OP_CAT beyond 520 bytes

The original `OP_CAT`

proposal limited the result to be 520 bytes, but
we want more for covenents which analyze scripts (especially since
`OP_CAT`

itself makes scripting more interesting).

My post showed that it’s fairly simple to allow larger sizes (instead of a limit of 1000 stack elements each with a maximum of 520 bytes each, we reduce the element limit for each element over 520 bytes such that the total is still capped the same).

## Hashing of Large Stack Objects

But consider hashing operations, such as `OP_SHA256`

. Prior to `OP_CAT`

,
it can only be made to hash 520 bytes (9 hash rounds) with three
opcodes:

```
OP_DUP
OP_SHA256
OP_DROP
```

That’s 3 hash rounds per opcode. With `OP_CAT`

and no 520 byte stack
limit we can make a 260,000 byte stack element, and that’s 4062 hash
rounds, or 1354 per opcode, which is 450x as expensive so we
definitely need to think about it!

A quick benchmark shows OpenSSL’s sha256 on my laptop takes about 115 microseconds to hash 260k: a block full of such instructions would take about 150 seconds to validate!

So we need to have some limit, and there are three obvious ways to do it:

- Give up, and don’t allow
`OP_CAT`

to produce more than 520 bytes. - Use some higher limit for
`OP_CAT`

, but still less than 260,000. - Use a weight budget, like BIP-342 does for checksig.

A quick benchmark on my laptop shows that we can hash about 48k (using the OpenSSL routines) in the time we do a single ECDSA signature verification (and Taproot charges 50 witness weight for each signature routine).

A simple limit would be to say “1 instruction lets you hash about 1k”
and “the tightest loop we can find is three instructions” and so limit
`OP_CAT`

to producing 3k. But that seems a little arbitrary, and still
quite restrictive for future complex scripts.

## My Proposal: A Dynamic Limit for Hashing

A dynamic BIP-342-style approach would be to have a “hashing budget” of some number times the total witness weight. SHA256 uses blocks of 64 bytes, but it is easier to simply count bytes, and we don’t need this level of precision.

I propose we allow a budget of 520 bytes of hashing for each witness byte: this gives us some headroom from the ~1k measurement above, and cannot make any currently-legal script illegal, since the opcode itself would allow the maximal possible stack element.

This budget is easy to calculate: 520 times total witness weight, and
would be consumed by every byte hashed by `OP_RIPEMD160`

, `OP_SHA1`

,
`OP_SHA256`

, `OP_HASH160`

, `OP_HASH256`

. I’ve ignored that some of these
hash twice, since the second hash amounts to a single block.

## Is 520 Bytes of hashing per Witness Weight Too Limiting?

Could this budget ever be limiting to future scripts? Not for the case of “your script must look like {merkle tree of some template}” since the presence of the template itself weighs more than enough to allow the hashing. Similarly for merkle calculations, where the neighbor hashes similarly contribute more than enough for the hash operations.

If you provide the data you’re hashing in your witness, you can’t
reasonably hit the limit. One could imagine a future `OP_TX`

which let
you query some (unhashed) witness script of (another) input, but even
in this case the limit is generous, allowing several kilobytes of hashing.

## What Other Opcodes are Proportional to Element Length?

Hashing is the obvious case, but several other opcodes work on
arbitrary length elements and should be considered. In particular,
`OP_EQUAL`

, `OP_EQUALVERIFY`

, the various DUP opcodes, and `OP_CAT`

itself.

I would really need to hack bitcoind to run exact tests, but modifying my benchmark to do a memcmp of two 260,000 byte blobs takes 3100ns, and allocating and freeing a copy takes 3600.

The worst case seems to be arranging for 173k on the stack then repeatedly doing:

```
OP_DUP
OP_DUP
OP_EQUALVERIFY
```

4MB of this would take about 8.9 seconds to evaluate on my laptop. Mitigating this further would be possible (copy-on-write for stack objects, or adding a budget for linear ops), but 10 seconds is probably not enough to worry about.