lua: stop trying to tweak lua's GC

Merged Vladimír Čunát requested to merge lua_gc into master

TL;DR: I believe all lua_gc() calls stemmed from a misunderstanding of the Lua documentation, and the current settings seem potentially dangerous.

First, I rely on the Lua 5.1 docs, as LuaJIT 2 is documented to have made only minor changes to the GC. http://www.lua.org/manual/5.1/manual.html#lua_gc http://wiki.luajit.org/New-Garbage-Collector#rationale

Commit 5a709411 claims to have increased the speed of GC to 400 % of the allocation speed, but LUA_GCSETSTEPMUL is the parameter that controls that, and it was actually lowered to 99 %, and later, in 0ee2d1d7, even to 50 %. The documentation explicitly says that values under 100 % make the collector too slow and can result in it never finishing a cycle.

The default values seem perfectly sane to me, and currently I can't see any particular reason to change them: the step multiplier is 200 % (the collector runs at twice the speed of allocation), and the pause is 200 % (the collector waits for the total memory in use to double before starting another cycle).

I assume the resulting risk of the GC being too slow is what caused the need to explicitly force a non-incremental GC cycle once in a while, but that no longer seems useful and is bad for latency.

Edited by Petr Špaček

Activity

  • I'm fairly confident about this point, but we might still want to wait for performance measurements before merging.

    /cc @vavrusam, just in case.

  • @jholusa: Could you do some measurements here? We might need a more complicated scenario (e.g. with some policies) to exercise the Lua GC more. @vcunat Could you rebase on current master and propose a config for comparing this with master?

  • At least the malloc_trim(0); that you removed is somewhat important if you expect memory to be released. Why did you make this change? There are a lot of assumptions here. The main reason you want to avoid GC runs in callbacks is that they abort traces, which are already very short. Running GC periodically (adjusted by memory pressure) decouples it from the input event rate, which is useful just like in game engines, because it allows the caller to decide when it is safe/best to GC.

  • @ondrej ok, we will find out with Vlada how to test it.

  • I'll first explain myself in more detail. @vavrusam

    Lua memory

    Doing a non-incremental GC once every 100k requests seemed (i) a bit ad hoc, as its relation to the amount of memory allocated is rather loose (it depends on the plugin setup), and (ii) a potential source of a huge latency spike at that moment. Point (ii) depends on your preference, as a non-incremental GC will likely have better throughput and maybe even better average latency.

    If I (heuristically) knew that there's now a bit of time when no requests will be processed and that the GC is likely to free a nontrivial amount of memory, that would be a great moment to fire a full GC cycle, but once every 100k requests didn't seem to be a very good strategy.

    In any case, I certainly want to benchmark this before switching anything, e.g. to see how big an influence on throughput we can expect.

    C memory

    C memory is released automatically. man malloc_trim:

    This function is automatically called by free(3) in certain circumstances [...]

    man mallopt provides more information about this, especially around M_TRIM_THRESHOLD. The strategy is configurable, and it seems to me that, at least on Linux/glibc, it is not so trivial that I could beat it in a few workdays. In particular, I believe releasing all free pages to the system isn't optimal, because memory usage fluctuates and obtaining/releasing pages isn't cheap. IMO it's better to start by tweaking those settings instead of hand-managing memory. So far I've seen no indication that the (glibc) default is bad for our use case.

  • To do the Lua GC more precisely, we could pause the incremental collector and plan our own "bigger incremental steps", by following the rise of LUA_GCCOUNT (the amount of Lua memory) and using LUA_GCSTEP to perform steps of whatever size we choose. That would seem viable, e.g. every time a task finishes, re-evaluate the GC plan and sweep a suitable chunk of memory.

  • I think I'll leave this be for now, at least until after the 1.2.5 release, maybe even until after 1.3.0. It feels lower-priority to me personally.

  • Sure, the memory management is not perfect, and for example replacing malloc_trim() with M_TRIM_THRESHOLD would probably be better. I think it should happen in this change, so you can observe before/after. You can observe what happens when you load it with requests and then stop. The default free strategy will almost never lower the RSS, which is good for servers, but not on personal machines or routers, where people still care about the memory footprint. Part of that is just marketing, like the jokes about Chrome consuming terabytes of RAM.

    The modules that run often, like policy, have now been reworked to avoid allocations and the Lua/C interface, so it probably won't matter that much. Last time I was working on it, it didn't create much garbage to begin with, but enough to abort the already short traces. Maybe that has changed now. LJ is good at hoisting GC calls out of loops and such, but the short callbacks have no loops. One idea I had was to amalgamate all the Lua code that would run for each callback and then call the whole Lua code block at once, but it would be tricky to maintain module call order with mixed Lua and C modules without some control flow inversion (C modules called from Lua).

  • @vavrusam: what kind of traces are you referring to? Those you get from lua_error calls?

  • No, traces of compiled code. LJ is a trace compiler. It falls back to interpreter whenever it has to do stuff that breaks the rules, like GC, Lua/C, branches etc. Check out jit.p or this old blog post or how to profile compiled/interpreted code or Javier's last presentation.

  • Let's try this in practice. We can give it a try when we have a big respdiff/drool setup.

  • Petr Špaček changed milestone to %2018 Q3

  • Vladimír Čunát marked as a Work In Progress

  • Vladimír Čunát changed the description

  • Petr Špaček removed milestone

  • assigned to @ljezek

  • @ljezek Please have a look once you have benchmarking infrastructure running. It will be necessary to also monitor memory usage so we can compare behavior before and after some of these changes.

  • Tomas Krizek added 3022 commits

  • Removing the lua_gc calls doesn't seem to have any effect on performance:

    [Figure: response rate (response_rate.svg)]
