lua: stop trying to tweak lua's GC
TL;DR: I believe all the `lua_gc()` calls stemmed from a misunderstanding of the Lua documentation, and the current settings seem potentially dangerous.
First, let me rely on the Lua 5.1 docs, as LuaJIT 2 is documented to have made only minor changes to the GC. http://www.lua.org/manual/5.1/manual.html#lua_gc http://wiki.luajit.org/New-Garbage-Collector#rationale
Commit 5a709411 claims to have increased the GC speed to 400 % of the allocation speed, but `LUA_GCSETSTEPMUL` is the parameter that controls that ratio, and it was lowered to 99 % and later, in 0ee2d1d7, even to 50 %. The documentation explicitly says that setting the value below 100 % may cause problems (the collector may never finish a cycle).
The default values seem perfectly sane to me, and currently I can't see any particular reason to change them: 200 % relative GC speed, and waiting for the allocated size to double before starting another cycle.
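To make the parameters concrete, here is a minimal sketch (not the actual code from those commits) of how such tuning looks through the C API; `L` stands for the resolver's Lua state:

```c
#include <lua.h>

/* Hypothetical illustration of the tuning being discussed, not the exact
 * code from 5a709411/0ee2d1d7.  Lua 5.1 defaults are pause=200 and
 * stepmul=200: start a cycle after memory doubles, and collect at 200 %
 * of the allocation speed. */
static void tune_gc(lua_State *L)
{
	lua_gc(L, LUA_GCSETPAUSE, 200);    /* default: wait for 2x growth */
	lua_gc(L, LUA_GCSETSTEPMUL, 200);  /* default: 200 % relative GC speed */

	/* The change under discussion effectively did something like: */
	lua_gc(L, LUA_GCSETSTEPMUL, 50);   /* <100 % risks never finishing a cycle */
}
```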
I assume the resulting possibility of the GC being too slow is what created the need to explicitly force a non-incremental GC cycle once in a while, but that no longer seems useful, and it is not good for latency.
Merge request reports
Activity
I'm fairly confident about this point, but we might still want to wait for performance measurements before merging.
/cc @vavrusam, just in case.
@jholusa: Could you do some measurements here? We might need a more complicated scenario (e.g. with some policies loaded) to exercise the Lua GC more. @vcunat: Could you rebase on current master and propose a config to test, for a comparison of this branch and `master`?

At least the `malloc_trim(0);` that you removed is somewhat important if you expect it to release memory. Why did you make this change? There are a lot of assumptions here. The main reason why you want to avoid GC runs in callbacks is that it aborts traces, which are already very short. Running the GC periodically (adjusted by memory pressure) decouples it from the input event rate, which is useful just as in game engines, because it lets the caller say when it is safe/best to GC.

@ondrej ok, we will find out with Vlada how to test it.
I'll first explain myself in more detail, @vavrusam.
Lua memory
Doing a non-incremental GC once per 100k requests seemed (i) a bit ad hoc, as the relation to the amount of memory allocated is rather loose (it depends on the plugin setup), and (ii) a potential source of a huge latency spike at that moment. Point (ii) depends on your preference, as a non-incremental GC will likely have better throughput and maybe even better average latency.
If I (heuristically) knew that there's now a bit of idle time when no requests will be processed and the GC is likely to free a nontrivial amount of memory, that would be a great moment to fire a full GC cycle, but once per 100k requests didn't seem to be a very good strategy.
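For context, a hypothetical reconstruction (not the actual removed code) of the pattern being discussed: a full GC cycle once every N requests, plus the `malloc_trim(0)` call mentioned above; the counter, threshold, and whether the two calls sat together are assumptions here:

```c
#include <lua.h>
#include <malloc.h>

/* Hypothetical sketch of the removed pattern: every N finished requests,
 * run a full (non-incremental) GC cycle and return freed heap pages to
 * the kernel. */
#define GC_EVERY_N_REQUESTS 100000

static void maybe_full_gc(lua_State *L)
{
	static unsigned request_count = 0;
	if (++request_count % GC_EVERY_N_REQUESTS == 0) {
		lua_gc(L, LUA_GCCOLLECT, 0);  /* stop-the-world cycle => latency spike */
		malloc_trim(0);               /* release free heap pages to the OS */
	}
}
```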
In any case, I certainly want to benchmark this before switching anything, e.g. to see how big an influence on throughput we can expect.
C memory
C memory is released automatically. From `man malloc_trim`: "This function is automatically called by free(3) in certain circumstances [...]". `man mallopt` provides more information about this, especially around `M_TRIM_THRESHOLD`. The strategy is configurable, and it seems to me that at least on Linux/glibc it's nothing trivial that I could beat in a few workdays. In particular, I believe releasing all pages to the system isn't optimal, because memory usage fluctuates and getting/releasing pages isn't very cheap. IMO it's better to start by tweaking those settings instead of hand-managing it. So far I've seen no indication that the (glibc) default is somehow bad for our use case.
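If the trim behaviour did need adjusting, tweaking the allocator setting would look roughly like this (the 1 MiB value is only an example, not a recommendation):

```c
#include <malloc.h>

int main(void)
{
	/* Example value only: ask glibc to return memory to the kernel once
	 * more than ~1 MiB is free at the top of the heap (the documented
	 * default is 128 KiB). */
	mallopt(M_TRIM_THRESHOLD, 1024 * 1024);
	/* ... run the server ... */
	return 0;
}
```

The same knob can reportedly also be set without recompiling, via the `MALLOC_TRIM_THRESHOLD_` environment variable.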
To do the Lua GC more precisely, we could pause the incremental collector and plan our own "bigger incremental steps" – following the rise of `LUA_GCCOUNT` (the amount of Lua memory) and using `LUA_GCSTEP` to perform steps of whatever size we choose. That seems viable; e.g. every time a task finishes, re-evaluate the GC plan and sweep a suitable chunk of memory.
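A rough sketch of that idea, under the assumption of some hook that runs whenever a task finishes; the function names and the growth threshold are made up for illustration:

```c
#include <lua.h>

/* Hypothetical sketch: stop the automatic incremental collector and drive
 * it ourselves, stepping only once Lua memory has grown noticeably since
 * the last finished cycle.  Numbers are placeholders, not measured values. */
static size_t last_kb = 0;

static void gc_plan_init(lua_State *L)
{
	lua_gc(L, LUA_GCSTOP, 0);            /* we schedule the steps ourselves */
	last_kb = lua_gc(L, LUA_GCCOUNT, 0); /* current Lua memory in KiB */
}

static void gc_plan_step(lua_State *L)  /* call e.g. whenever a task finishes */
{
	size_t now_kb = lua_gc(L, LUA_GCCOUNT, 0);
	if (now_kb > last_kb + 1024) {       /* grew by more than ~1 MiB */
		/* One incremental step; larger data means more work per call. */
		if (lua_gc(L, LUA_GCSTEP, 100))  /* returns 1 when a cycle finished */
			last_kb = lua_gc(L, LUA_GCCOUNT, 0);
	}
}
```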
Sure, the memory management is not perfect, and for example replacing `malloc_trim()` with `M_TRIM_THRESHOLD` would probably be better. I think it should happen in this change, so you can observe before/after. You can observe what happens when you load it with requests and then stop. The default free strategy will almost never lower the RSS, which is good for servers, but not on personal machines or routers, where people still care about the memory footprint. Part of that is just marketing, like the jokes about Chrome consuming terabytes of RAM.

The modules that run often, like `policy`, are now reworked to avoid allocations and the Lua/C interface, so it probably won't matter that much. Last time I was working on it, it didn't create much garbage to begin with, but enough to abort the already short traces. Maybe that has changed now. LJ is good at hoisting GC calls out of loops and such, but with the short callbacks there are none. One idea I had was to amalgamate all the Lua code that would run for each callback and then call the whole Lua block at once, but it would be tricky to maintain module call order with mixed Lua and C modules without some control-flow inversion (C modules called from Lua).

@vavrusam: what kind of traces are you referring to? Those you get from `lua_error` calls?

No, traces of compiled code. LJ is a trace compiler; it falls back to the interpreter whenever it has to do stuff that breaks the rules, like GC, Lua/C calls, branches etc. Check out `jit.p`, this old blog post, how to profile compiled/interpreted code, or Javier's last presentation.

added performance label
changed milestone to %2018 Q3
assigned to @ljezek
@ljezek Please have a look once you have benchmarking infrastructure running. It will be necessary to also monitor memory usage so we can compare behavior before and after some of these changes.
added 3022 commits
- f0ca89ac...9fa9df98 - 3021 commits from branch `master`
- 12e5606c - lua: stop trying to tweak lua's GC