I tried this by downloading a 700 MB file on a MOX with 512 MB RAM. The php processes are not the problem: they consume a sane number of megabytes of memory, and php is also limited by the php.ini configuration.
It looks like lighttpd tries to "cache" or pull the file in memory before sending it to http client.
Also note uploading the same file was no problem for the whole system and works like a charm.
This has been here since the beginning. I think this bug cannot block another fixup release, which we need to release ASAP. We will need to prioritize %Turris OS 5.1.2, mostly for Sentinel fixes in the firewall. So this is going to be in one of the upcoming fixups of Turris OS 5.1.
I believe that the bug you are seeing here was fixed in lighttpd 1.4.56. The bug was visible with FastCGI backends, large files, and (the default) server.stream-response-body = 0
I just tested HBL (future %Turris OS 5.2.0) and I still got issues with RAM while downloading large files from Next Cloud
I need to focus on the issue a bit more since I didn't recognize the process consuming the memory. There is some higher memory usage by mysqld and php, which is needed by NextCloud, but these did not grow much during the file transfer...
With lighttpd 1.4.59 released as part of Turris OS 5.2.0, this issue ("lighttpd exhaust RAM while downloading big file") should be resolved.
I need to focus on the issue a bit more since I didn't recognize the process consuming the memory.
@vmyslivec could you post some of those details here? Is lighttpd memory growing and triggering OOM killer? That should not happen. If the issue is elsewhere, maybe a new issue with Nextcloud should be open to better describe where to look and how to reproduce. Thanks.
The problem is still present: downloading a big file via NextCloud exhausts RAM completely and leads the OOM killer to kill a mysqld (MariaDB) process.
Current versions are:
Turris OS 5.2.2
nextcloud 19.0.3-3
lighttpd 1.4.59-1
php 7.2.34-3
mariadb-server 10.4.18-1
The difference between current and original state is that I don't see which process consumes the vast amount of memory now.
mysqld consumes a lot of memory, but constantly the same amount even via (obviously)
php processes consume a little more memory during a file download, but not significantly more than in the "idle" state
the lighttpd process consumes CPU time (obviously) but no extra memory during the file transfer
It's hard to debug because during the memory exhaustion everything is stuck and the MOX does not respond. The htop command did not help much with locating the issue, and it displays inconsistent data.
We can probably create a new issue with and close this one as it seems lighttpd is not causing the issue anymore.
I also created #775 (closed) as I think the mysqld process is not tuned well with respect to memory consumption (especially on a 512MB MOX), but it is probably not the root cause of this issue.
PS: Again, uploading a file still works like a charm with no significant or unexpected load on the device.
OK, I found the root cause now! I realized that we use /tmp in RAM which is 222 MB in size in my case and the downloaded file size is 222 MB maximum.
I can confirm that during a file download lighttpd creates temporary files in the /tmp directory called lighttpd-upload-<random-letters>. Every file has a size of 1 MB, and lighttpd creates up to circa 200 files that consume (together with other temp files) 100 % of the /tmp file system and thus about half of the total RAM of my 512MB MOX. (That's also why htop didn't reveal the cause of the RAM consumption.)
When the /tmp is exhausted, a browser offers the file to download but its size is only circa 220 MB (i.e. not the whole file!). Also, oom-killer comes to the scene about this time.
After several seconds, these temporary files disappear and the router comes back to normal. However, mysqld is killed and restarted, and the downloaded file is not complete.
@gstrauss can we do something about that? Is it possible to configure lighttpd to recycle temporary files to not exhaust /tmp while downloading a file? Thanks for the reply
MOX would have an additional ~160 MB memory free if reForis and supporting services were not so bloated and always running on the system: #705 (closed) "lighttpd: reduce memory usage of foris and reforis" (160 MB is almost 1/3 of the entire 512 MB Mox memory!!!)
lighttpd supports server.max-request-size to limit the request size. (probably not applicable to this issue)
lighttpd supports streaming the request or response, and I mentioned this above 7 months ago: (#665 (comment 185700))
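The snippet from the referenced comment is not preserved in this copy of the thread. As a minimal sketch, assuming it was the pair of streaming directives discussed throughout this issue, the settings would look roughly like this:

```
# Assumed reconstruction, not the original snippet from the referenced comment.
# Values (per lighttpd documentation):
#   0 = buffer the entire body before passing it on (default)
#   1 = stream, buffering to temp files
#   2 = stream with minimal buffering
server.stream-request-body  = 2
server.stream-response-body = 2
```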
Those settings will reduce the amount of intermediate buffering performed by lighttpd. However, they also disable the default behavior of lighttpd, e.g. offloading the response as quickly as possible from the backend. With the above settings, the backend is now busy sending the response for (almost) as long as the client takes to download. This feature is more important when the backend is a heavy scripting language, such as PHP (nextcloud), running via CGI, where every request is an independent PHP process.
lighttpd supports X-Sendfile response header from backends, e.g. so that a PHP backend can tell lighttpd to directly read a file from the filesystem for the response, rather than PHP reading the file and copying to lighttpd, having lighttpd store the response in temp files, only to then send it to the client. It looks like NextCloud developers were unable to implement this cleanly in their system: https://github.com/nextcloud/server/issues/13082
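For illustration only, a sketch of what the lighttpd side of X-Sendfile could look like for a FastCGI PHP backend. The socket path and the data directory are hypothetical, and this has no effect unless the backend actually emits the X-Sendfile header, which NextCloud does not appear to do according to the linked issue:

```
# Hypothetical backend definition; socket and docroot paths are assumptions.
fastcgi.server = ( ".php" =>
    (( "socket"             => "/var/run/php-fpm.sock",
       # let the backend hand file delivery over to lighttpd
       "x-sendfile"         => "enable",
       # only files below these directories may be sent this way
       "x-sendfile-docroot" => ( "/nextcloud/data" )
    ))
)
```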
Alternatively -- and possibly a better solution when discussing a system that has a secondary storage device (e.g. the nextcloud target volume) -- is to configure lighttpd server.upload-dirs to use a tmp/ directory on the persistent storage hosting nextcloud, e.g. /nextcloud/tmp
If the nextcloud feature is enabled in Mox, this could be set in e.g. /etc/lighttpd/conf.d/nextcloud.conf
server.upload-dirs := ( "/nextcloud/tmp" ) # or whatever appropriate path
lighttpd server.upload-dirs is a global setting in lighttpd and defaults to /var/tmp, since a sufficiently large storage location is intended. OpenWRT and Turris OS lighttpd.conf set upload-dirs = ( "/tmp" ) in /etc/lighttpd/lighttpd.conf. On Mox, /tmp is a limited in-memory filesystem.
server.upload-dirs supports multiple directories for tmp file creation (same directories for both upload and download) and if the first dir fills up (e.g. /dev/shm), then lighttpd will begin to use the next in the list (e.g. /var/tmp). This can work well on a system with an appropriately sized in-memory /dev/shm (default 1/2 memory) and a disk-backed /var/tmp.
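A minimal sketch of the fallback list described above, using the two example directories from that description. Whether these paths exist and are suitably sized on a given device is an assumption:

```
# Tried in order: lighttpd moves on to the next directory when one fills up.
# := overrides the value already set in /etc/lighttpd/lighttpd.conf
server.upload-dirs := ( "/dev/shm", "/var/tmp" )
```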
For small-memory systems which support large uploads to persistent storage, it is often a better idea to set server.upload-dirs to a location on persistent storage, and to not use in-memory filesystems.
server.upload-temp-file-size controls the size of each tmp file, default 1 MB. The idea is that as soon as the tmp file is consumed (upstream or downstream), it can be removed to free up space, rather than always taking up the entire size of the request body or response body.
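As a sketch, the directive can be set explicitly. The value below simply restates the 1 MB default mentioned above, assuming the option is expressed in bytes:

```
# Assumption: value is in bytes; 1048576 = 1 MB, the default described above
server.upload-temp-file-size = 1048576
```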
Aside: lighttpd mod_webdav is not as featureful as NextCloud, but is much faster since lighttpd mod_webdav uploads files directly to the persistent storage location (instead of to temporary directories), and atomically renames the uploaded file into place.
Thanks for the comprehensive analysis and description @gstrauss. From my point of view, server.upload-dirs and server.stream-response-body are two possible solutions.
To take advantage of setting a different temporary upload dir, we must make sure it is on external storage (as we need to avoid excessive writes to the internal storage). This is something the storage plugin can take care of IMO.
Streaming the response/request is something that could work in the case the device lacks external storage.
To sum it up, we need to edit/update lighttpd configuration based on the state of the device/configuration. This should be handled by managing conf.d/ configuration snippets in certain TOS packages. What do you think @kkoci?
@kkoci please enable "Notifications" for me on this issue. For some reason, I do not have permission to do so myself, which is why I did not see @vmyslivec's update 4 weeks ago, but I did get an email today when he referenced me @gstrauss
You are listed as a participant. I do not have the rights to configure your account (and I am not sure anyone actually has). The notification switch on the right panel has to be enabled to get mail. I suspect that you have it disabled and are possibly unable to enable it? I can't change that. That might be a GitLab bug or something. I can report it to our admins, but please check first what the state of that button is.
I noticed someone complains about the lack of notification from GitLab issues as well. I will try to figure it out and discuss it with GitLab administrators.
I suspect that you have it disabled and possibly unable to enable it? I can't change that. That might be Gitlab bug or something. I can report it to our admins but please check first what is the state of that button.
Yes, "Notifications" is disabled and the control is grayed-out. I do not have the ability to enable it for this issue. Also, since you did not mention me @gstrauss in your response, I did not get any notification of your response. Until this is sorted, please mention me @gstrauss in your posts if you would like me to see your post in a timely fashion. Thank you.
From my point of view, server.upload-dirs and server.stream-response-body are two possible solutions.
To sum it up, we need to edit/update lighttpd configuration based on the state of the device/configuration. This should be handled by managing conf.d/ configuration snippets in certain TOS packages. What do you think @kkoci?
Please keep in mind that server.upload-dirs is a global setting. If external storage is available, it is desirable to use (globally) on the server instead of using internal storage (with more limited write cycles).
server.stream-response-body and server.stream-request-body can be configured with any lighttpd.conf condition, e.g. for any URL.
server.upload-dirs and server.stream-response-body and server.stream-request-body can be used together. They are not mutually exclusive.
FYI: I wrote some documentation on the lighttpd wiki which explains how to use server.upload-dirs, server.stream-response-body, and server.stream-request-body:
lighttpd resource tuning
@kkoci: In the interest of "a working solution now is better than a perfect solution in another year":
server.stream-response-body = 2 can be applied immediately. With server.stream-response-body = 2, everything should work and lighttpd will not fill up /tmp, which happens without server.stream-response-body = 2.
It would be better if applied only to requests to NextCloud, e.g.
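The original example is not preserved in this copy of the thread. A minimal sketch, assuming NextCloud is served under the hypothetical URL prefix /nextcloud/ (adjust the pattern to the actual deployment):

```
# Hypothetical URL prefix; only responses for matching requests are streamed
$HTTP["url"] =~ "^/nextcloud/" {
    server.stream-response-body = 2
}
```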
lighttpd resource tuning describes the behavior of server.stream-response-body = 2 in more detail.
A longer term solution would be a lighttpd include file if NextCloud is configured,
e.g. /etc/lighttpd/conf.d/nextcloud.conf
server.upload-dirs := ( "/nextcloud/tmp" ) # or whatever appropriate path to large persistent storage
or a similar configuration created and included by lighttpd if Turris OS configures a large external storage device and the user enables this external device by specifying a /tmp directory on the device with 1777 permissions. Creating and enabling such an include file for lighttpd.conf should be a secondary effect of some system-wide storage management solution so that it is clear that the external device is being designated for temp file use.
I can do both with ease as we provide that configuration file as part of our distribution.
Do you think that it is safe to enable server.stream-request-body and server.stream-response-body server wide to prevent issues with memory with any setup? I read through the document you linked and it seems to me that the default on devices with low RAM (by today's standards) should be 2. I also do not see a reason why it should not be set to 2 on low traffic sites (I can see issues if the site has high traffic). My understanding is that it can be selectively set to 0 for applications that we know can't trigger OOM this way.
Edit: Just to explain my thinking. The issue seems to be generic for any deployment of an "upload/download" capable web application on Turris. The solutions seem to be exclusive. It makes no sense to set upload-dirs when we are streaming data instead. Thus it seems to me that using the stream solution is easier and more generic. I am just not sure why you are suggesting upload-dirs as the solution over streaming.
Do you think that it is safe to enable server.stream-request-body and server.stream-response-body server wide to prevent issues with memory with any setup?
Yes, it should be safe to do so.
Offloading requests and responses from backends will be reduced, which appears to be ok for the Turris environment. Also, mod_deflate will not operate on streaming responses, which is also likely acceptable for the Turris environment.
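A sketch of the server-wide variant with a selective opt-out, as discussed in the question above. The /small-app/ prefix is a hypothetical placeholder for an application known not to produce large bodies:

```
# Server-wide: stream request and response bodies with minimal buffering
server.stream-request-body  = 2
server.stream-response-body = 2

# Hypothetical exception: restore the default full buffering (and backend
# offloading) for an application that cannot produce large bodies
$HTTP["url"] =~ "^/small-app/" {
    server.stream-request-body  = 0
    server.stream-response-body = 0
}
```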
Below, I'll try to answer in more detail your question about why streaming is not enabled by default in lighttpd
I am just not sure why you are suggesting upload-dirs as the solution over streaming.
I tried to describe in lighttpd resource tuning that there are tradeoffs between streaming and not streaming.
Disabling streaming (the default) allows lighttpd to offload requests and responses from backends, which is especially useful for low resource systems. lighttpd easily runs on routers with 64 MB of memory or even less. On memory-constrained systems, it is often desirable that CGI programs run for as short a time as possible. Too many CGI programs running in parallel might overload a small system. Too many CGI programs running in parallel might be avoided by not starting the CGI program until the entire request body has been received, and by reading the response body as quickly as possible from the CGI program, allowing the CGI program to finish and exit more quickly.
Independently, if a system supports large file uploads and downloads, that might suggest the presence of a large disk of persistent storage. If a large disk is present, then server.upload-dirs on the large disk should be considered, rather than using a very small in-memory filesystem for tempfiles. (The very small part is emphasized.)
I believe that server.upload-dirs on persistent storage is the better long-term solution for upload and download of large files.
Independently, enabling streaming for backends is recommended for large requests and responses for which offloading from the backend is a lower priority, or if full offloading might cause resource issues for the machine on which lighttpd is running.
Just for fun: computing resources have grown exponentially over the past few decades. Remember when 64 MB of RAM was a huge amount of memory? Another scenario (unlikely to affect users of Turris OS): for dumb clients, HTTP/1.1 streaming responses will send Transfer-Encoding: chunked. Without streaming responses, lighttpd is able to send Content-Length, even if the backend sent Transfer-Encoding: chunked to lighttpd. (Transfer-Encoding: chunked and Content-Length are both part of the HTTP/1.1 specification, but some dumb clients historically expected only Content-Length.)
BTW, if short-term changes are made to lighttpd config, I would suggest making those changes in turris-root.conf, and not in the main lighttpd.conf since (eventually?) someone might review the patches I proposed in #474 (closed) "lighttpd: Use upstream version instead of ours"
We try to use files in /etc/lighttpd/conf.d for every update because they are not marked as configuration files, while the top-level lighttpd.conf is. Being marked as a configuration file means that changes do not propagate automatically if the user has modified the file.
lighttpd supports X-Sendfile response header from backends, e.g. so that a PHP backend can tell lighttpd to directly read a file from the filesystem for the response, rather than PHP reading the file and copying to lighttpd, having lighttpd store the response in temp files, only to then send it to the client. It looks like NextCloud developers were unable to implement this cleanly in their system: https://github.com/nextcloud/server/issues/13082
Back in 2019, one of the NextCloud developers had posted
It still this sounds like a nice feature, but the requests for this are quite low.
It might be nice if someone from the Turris team would like to post on behalf of the Turris organization. Having the Turris organization add a "this is useful to users running NextCloud on home routers" may help to get more support for adding the feature to NextCloud.
It might be nice if someone from the Turris team would like to post on behalf of the Turris organization. Having the Turris organization add a "this is useful to users running NextCloud on home routers" may help to get more support for adding the feature to NextCloud.
The RAM exhaustion should be resolved with foris-controller-storage-plugin new release (!816 (merged)). This makes it resolved for me.
What remains here is the request from @gstrauss to post in support on our behalf. Honestly, I do not know the topic well enough for my comment there to be beneficial. I think that @mhrusecky should do it, considering his history with the Nextcloud project and community. Thus I am keeping this open and assigning it to @mhrusecky for that.
@kkoci: In an effort to reduce noise on the issue board, please go ahead and close this. I do not see a reason to keep this issue open for a request to comment on an external github issue.