lua: stop trying to tweak lua's GC
TL;DR: I believe all the `lua_gc()` calls stemmed from a misunderstanding of the Lua documentation, and the current settings seem potentially dangerous.
First, let me rely on the Lua 5.1 docs, as LuaJIT 2 is documented to have made only minor changes to the GC. http://www.lua.org/manual/5.1/manual.html#lua_gc http://wiki.luajit.org/New-Garbage-Collector#rationale
Commit 5a709411 claims to have increased the GC speed to 400 % of the allocation speed, but `LUA_GCSETSTEPMUL` is the parameter that controls that ratio, and it was lowered to 99 % and later, in 0ee2d1d7, even to 50 %. The documentation explicitly says that setting the value below 100 % may cause problems (the collector may never finish a cycle).
The default values seem perfectly sane to me, and currently I can't see any particular reason to change them: 200 % relative GC speed, and waiting for the allocated size to double before starting another cycle.
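To make the parameters concrete, here is a minimal sketch (not the actual code from those commits) of how such tuning looks through the C API; `L` stands for the resolver's Lua state:

```c
#include <lua.h>

/* Hypothetical illustration of the tuning being discussed, not the exact
 * code from 5a709411/0ee2d1d7.  Lua 5.1 defaults are pause=200 and
 * stepmul=200: start a cycle after memory doubles, and collect at 200 %
 * of the allocation speed. */
static void tune_gc(lua_State *L)
{
	lua_gc(L, LUA_GCSETPAUSE, 200);    /* default: wait for 2x growth */
	lua_gc(L, LUA_GCSETSTEPMUL, 200);  /* default: 200 % relative GC speed */

	/* The change under discussion effectively did something like: */
	lua_gc(L, LUA_GCSETSTEPMUL, 50);   /* <100 % risks never finishing a cycle */
}
```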
I assume the resulting possibility of the GC being too slow is what created the need to explicitly force a non-incremental GC cycle once in a while, but that no longer seems useful, and it is not good for latency.
Merge request reports
Activity
I'm fairly confident about this point, but we might still want to wait for performance measurements before merging.
/cc @vavrusam, just in case.
@jholusa: Could you do some measurements here? We might need a more complicated scenario (e.g. with some policies loaded) to exercise the Lua GC more. @vcunat: Could you rebase on current master and propose a config to test, for a comparison of this branch and `master`?

At least the `malloc_trim(0);` that you removed is somewhat important if you expect it to release memory. Why did you make this change? There are a lot of assumptions here. The main reason why you want to avoid GC runs in callbacks is that it aborts traces, which are already very short. Running the GC periodically (adjusted by memory pressure) decouples it from the input event rate, which is useful just as in game engines, because it lets the caller say when it is safe/best to GC.

@ondrej ok, we will find out with Vlada how to test it.
I'll first explain myself in more detail, @vavrusam.
Lua memory
Doing a non-incremental GC once per 100k requests seemed (i) a bit ad hoc, as the relation to the amount of memory allocated is rather loose (it depends on the plugin setup), and (ii) a potential source of a huge latency spike at that moment. Point (ii) depends on your preference, as a non-incremental GC will likely have better throughput and maybe even better average latency.
If I (heuristically) knew that there's now a bit of idle time when no requests will be processed and the GC is likely to free a nontrivial amount of memory, that would be a great moment to fire a full GC cycle, but once per 100k requests didn't seem to be a very good strategy.
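For context, a hypothetical reconstruction (not the actual removed code) of the pattern being discussed: a full GC cycle once every N requests, plus the `malloc_trim(0)` call mentioned above; the counter, threshold, and whether the two calls sat together are assumptions here:

```c
#include <lua.h>
#include <malloc.h>

/* Hypothetical sketch of the removed pattern: every N finished requests,
 * run a full (non-incremental) GC cycle and return freed heap pages to
 * the kernel. */
#define GC_EVERY_N_REQUESTS 100000

static void maybe_full_gc(lua_State *L)
{
	static unsigned request_count = 0;
	if (++request_count % GC_EVERY_N_REQUESTS == 0) {
		lua_gc(L, LUA_GCCOLLECT, 0);  /* stop-the-world cycle => latency spike */
		malloc_trim(0);               /* release free heap pages to the OS */
	}
}
```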
In any case, I certainly want to benchmark this before switching anything, e.g. to see how big an influence on throughput we can expect.
C memory
C memory is released automatically. From `man malloc_trim`: "This function is automatically called by free(3) in certain circumstances [...]". `man mallopt` provides more information about this, especially around `M_TRIM_THRESHOLD`. The strategy is configurable, and it seems to me that at least on Linux/glibc it's nothing trivial that I could beat in a few workdays. In particular, I believe releasing all pages to the system isn't optimal, because memory usage fluctuates and getting/releasing pages isn't very cheap. IMO it's better to start by tweaking those settings instead of hand-managing it. So far I've seen no indication that the (glibc) default is somehow bad for our use case.
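If the trim behaviour did need adjusting, tweaking the allocator setting would look roughly like this (the 1 MiB value is only an example, not a recommendation):

```c
#include <malloc.h>

int main(void)
{
	/* Example value only: ask glibc to return memory to the kernel once
	 * more than ~1 MiB is free at the top of the heap (the documented
	 * default is 128 KiB). */
	mallopt(M_TRIM_THRESHOLD, 1024 * 1024);
	/* ... run the server ... */
	return 0;
}
```

The same knob can reportedly also be set without recompiling, via the `MALLOC_TRIM_THRESHOLD_` environment variable.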
To do the Lua GC more precisely, we could pause the incremental collector and plan our own "bigger incremental steps" – following the rise of `LUA_GCCOUNT` (the amount of Lua memory) and using `LUA_GCSTEP` to perform steps of whatever size we choose. That seems viable; e.g. every time a task finishes, re-evaluate the GC plan and sweep a suitable chunk of memory.
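A rough sketch of that idea, under the assumption of some hook that runs whenever a task finishes; the function names and the growth threshold are made up for illustration:

```c
#include <lua.h>

/* Hypothetical sketch: stop the automatic incremental collector and drive
 * it ourselves, stepping only once Lua memory has grown noticeably since
 * the last finished cycle.  Numbers are placeholders, not measured values. */
static size_t last_kb = 0;

static void gc_plan_init(lua_State *L)
{
	lua_gc(L, LUA_GCSTOP, 0);            /* we schedule the steps ourselves */
	last_kb = lua_gc(L, LUA_GCCOUNT, 0); /* current Lua memory in KiB */
}

static void gc_plan_step(lua_State *L)  /* call e.g. whenever a task finishes */
{
	size_t now_kb = lua_gc(L, LUA_GCCOUNT, 0);
	if (now_kb > last_kb + 1024) {       /* grew by more than ~1 MiB */
		/* One incremental step; larger data means more work per call. */
		if (lua_gc(L, LUA_GCSTEP, 100))  /* returns 1 when a cycle finished */
			last_kb = lua_gc(L, LUA_GCCOUNT, 0);
	}
}
```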
Sure, the memory management is not perfect, and for example replacing `malloc_trim()` with `M_TRIM_THRESHOLD` would probably be better. I think it should happen in this change, so you can observe before/after. You can observe what happens when you load it with requests and then stop. The default free strategy will almost never lower the RSS, which is good for servers, but not on personal machines or routers, where people still care about the memory footprint. Part of that is just marketing, like the jokes about Chrome consuming terabytes of RAM.

The modules that run often, like `policy`, are now reworked to avoid allocations and the Lua/C interface, so it probably won't matter that much. Last time I was working on it, it didn't create much garbage to begin with, but enough to abort the already short traces. Maybe that has changed now. LJ is good at hoisting GC calls out of loops and such, but with the short callbacks there are none. One idea I had was to amalgamate all the Lua code that would run for each callback and then call the whole Lua block at once, but it would be tricky to maintain module call order with mixed Lua and C modules without some control-flow inversion (C modules called from Lua).

@vavrusam: what kind of traces are you referring to? Those you get from `lua_error` calls?

No, traces of compiled code. LJ is a trace compiler; it falls back to the interpreter whenever it has to do stuff that breaks the rules, like GC, Lua/C calls, branches etc. Check out `jit.p`, this old blog post, how to profile compiled/interpreted code, or Javier's last presentation.

added performance label
changed milestone to %2018 Q3
assigned to @ljezek
@ljezek Please have a look once you have benchmarking infrastructure running. It will be necessary to also monitor memory usage so we can compare behavior before and after some of these changes.
added 3022 commits
- f0ca89ac...9fa9df98 - 3021 commits from branch `master`
- 12e5606c - lua: stop trying to tweak lua's GC