performance: lua-related improvements
This probably needs another review pass. In particular, the percentages I wrote down may be inaccurate: I found out too late that the variance of my measurements was relatively high.
I squashed in a change that became necessary after !855 was merged (and didn't re-measure). Overall, it feels like Lua code still eats a bit too much CPU in the default setup, so I expect to try to improve this later.
Edited by Petr Špaček