Since I was creating large blocks (41662 transactions), I added a little code to time how long they take once received (on my laptop, which is only an i3).
The obvious place to look is CheckBlock: a simple 1MB block takes a consistent 10 milliseconds to validate, and an 8MB block takes 79 to 80 milliseconds, which is nice and linear. (A 17MB block took 171 milliseconds.)
Weirdly, that's not the slow part: promoting the block to the best block (ActivateBestChain) takes 1.9-2.0 seconds for a 1MB block, and 15.3-15.7 seconds for an 8MB block. At least it's scaling linearly, but it's just slow.
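To make the "linear" claim concrete, here is a small sketch that divides each quoted timing by its block size. The dictionary names and the helper `seconds_per_mb` are made up for illustration, and where the text gives a range I have assumed the midpoint.

```python
# Timings quoted above, in seconds, keyed by block size in MB.
# Where the post gives a range, the midpoint is used (an assumption).
check_block = {1: 0.010, 8: 0.0795, 17: 0.171}
activate_best_chain = {1: 1.95, 8: 15.5}


def seconds_per_mb(timings):
    """Return the per-megabyte cost for each measured block size."""
    return {size_mb: secs / size_mb for size_mb, secs in timings.items()}


for name, timings in [("CheckBlock", check_block),
                      ("ActivateBestChain", activate_best_chain)]:
    for size_mb, rate in seconds_per_mb(timings).items():
        print(f"{name}: {size_mb}MB block costs {rate * 1000:.1f} ms per MB")
```

If the scaling is linear, the per-MB figure should stay roughly constant as the block grows: about 10 ms per MB for CheckBlock and a bit under 2 seconds per MB for ActivateBestChain.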
So, 16 Seconds Per 8MB Block?
I did some digging. Just invalidating and revalidating the 8MB block only took 1 second, so something about receiving a fresh block makes it worse. I spent a day or so wrestling with benchmarking...
Indeed, ConnectTip does the actual script evaluation: CheckBlock() only does a cursory examination of each transaction. I'm guessing Bitcoin Core is not smart enough to parallelize a chain of transactions like mine, hence the 2 seconds per MB. On normal transaction patterns even my laptop should be about 4 times faster than that (but I haven't actually tested it yet!).
So, 4 Seconds Per 8MB Block?
But things are going to get better: I hacked in the currently-disabled libsecp256k1, and the time for the 8MB ConnectTip dropped from 18.6 seconds to 6.5 seconds (both measured with optimization disabled for benchmarking).
So, 1.6 Seconds Per 8MB Block?
I re-enabled optimization after my benchmarking, and the result was 4.4 seconds; that's libsecp256k1, and an 8MB block.
Let's Say 1.1 Seconds for an 8MB Block
This is with some assumptions about parallelism, and remember this is on my laptop, which has a fairly low-end CPU. While you may not be able to run a competitive mining operation on a Raspberry Pi, you can pretty much ignore normal verification times in the blocksize debate.
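The headline numbers above are just the measured 8MB times divided by the guessed 4x speedup for normal transaction patterns. A quick sketch of that arithmetic follows; the constant `ASSUMED_SPEEDUP` and the labels are illustrative names, and the 4x factor is the untested estimate from the text, not a measurement.

```python
# Guessed speedup on normal transaction patterns (untested, per the post).
ASSUMED_SPEEDUP = 4.0

# Measured 8MB block times in seconds, as quoted in the post.
measured_8mb = {
    "stock build (rounded up)": 16.0,
    "libsecp256k1, optimization disabled": 6.5,
    "libsecp256k1, optimization enabled": 4.4,
}


def estimated_seconds(measured, speedup=ASSUMED_SPEEDUP):
    """Scale a measured time down by the assumed parallel speedup."""
    return measured / speedup


for label, secs in measured_8mb.items():
    print(f"{label}: {secs} s measured, "
          f"about {estimated_seconds(secs):.1f} s estimated")
```

If the real speedup turns out lower than 4x, the estimates grow in proportion, so treat them as rough figures rather than benchmarks.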
I turned on -debug=bench, which produced impenetrable and seemingly useless results in the log.
So I added a print with a sleep, so I could run perf. Then I disabled optimization, so I'd get understandable backtraces with perf. Then I rebuilt perf, which is part of the kernel source package, because Ubuntu's perf doesn't demangle C++ symbols. (Are we having fun yet?) I even hacked up a small program to help run perf on just that part of bitcoind. Finally, after perf failed me (it doesn't show 100% CPU, no idea why; I'd expect to see main in there somewhere...) I added stderr prints and ran strace on the thing to get timings.