Simplify use of Linux capabilities?

Hi,

I've reviewed Knot DNS's use of Linux capabilities, and I found it somewhat confusing. I'll make a suggestion to simplify this code in a follow-up comment.

In the configuration reference, the only mention of capabilities is in the description of the server: user config variable:

A system user with an optional system group (user:group) under which the server is run after starting and binding to interfaces. Linux capabilities are employed if supported.

At first glance, this implies that capabilities are only used if server: user is set. In fact, capabilities are used for much more than changing user/group, and knotd makes calls to change capabilities regardless of whether server: user is set in the config.

In the README, the libcap-ng library is mentioned, but only as an optional dependency. Capabilities are only used if this dependency is present at build time; otherwise, Knot behaves like a traditional daemon that drops privileges by changing uid/gid. When libcap-ng support is compiled in, knotd behaves as follows:

First, knotd needs to be started as root with traditional privileges, or as a non-root user that has had its capabilities elevated (e.g. with systemd's User=, Group=, CapabilityBoundingSet= and AmbientCapabilities= options).
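To tell these two startup modes apart on a running system, a sketch like the following could be used. It is an illustration under stated assumptions, not Knot code: detect_start_mode() and read_cap_eff() are hypothetical names, and the sketch relies on the CapEff: line of /proc/self/status, where the kernel reports the effective capability set as a hex bitmask.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not part of Knot): read the effective capability set
 * from the "CapEff:\t<hex>" line of /proc/self/status. Returns 0 on success. */
static int read_cap_eff(uint64_t *out)
{
	FILE *f = fopen("/proc/self/status", "r");
	if (f == NULL) {
		return -1;
	}
	char line[256];
	int ret = -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		if (strncmp(line, "CapEff:", 7) == 0) {
			/* strtoull() skips the tab that follows the colon. */
			*out = strtoull(line + 7, NULL, 16);
			ret = 0;
			break;
		}
	}
	fclose(f);
	return ret;
}

enum start_mode {
	START_ROOT,              /* euid 0: traditional root privileges */
	START_NONROOT_WITH_CAPS, /* e.g. systemd User= plus AmbientCapabilities= */
	START_UNPRIVILEGED,      /* non-root and no effective capabilities */
	START_UNKNOWN,           /* /proc/self/status could not be read */
};

/* Hypothetical classification of how the daemon was launched. */
enum start_mode detect_start_mode(void)
{
	if (geteuid() == 0) {
		return START_ROOT;
	}
	uint64_t eff;
	if (read_cap_eff(&eff) != 0) {
		return START_UNKNOWN;
	}
	return eff != 0 ? START_NONROOT_WITH_CAPS : START_UNPRIVILEGED;
}
```

Under systemd with User=knot and AmbientCapabilities=CAP_NET_BIND_SERVICE, for example, the second case is what such a check would report, since ambient capabilities land in the effective set at exec time.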
When knotd is started, the main thread calls setup_capabilities() early in daemon startup, before logging is set up, the configuration is loaded, sockets are created, etc. This function has a comment saying that it should "Drop all capabilities", but in fact it doesn't. Instead, if the process has the CAP_SETPCAP capability, it drops most of its capabilities but retains the following:

CAP_SETPCAP
CAP_DAC_OVERRIDE
CAP_CHOWN
CAP_NET_BIND_SERVICE
CAP_SETUID
CAP_SETGID
CAP_SYS_NICE
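For reference when inspecting a running process, these seven capabilities correspond to one bitmask in the format the kernel uses for the CapPrm: and CapEff: lines of /proc/<pid>/status (bit n set means capability number n is present). The helper below, a hypothetical name that does not exist in Knot, computes that mask from the constants in <linux/capability.h>:

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/capability.h>

/* The capabilities that setup_capabilities() keeps, per the list above. */
static const unsigned retained_caps[] = {
	CAP_SETPCAP, CAP_DAC_OVERRIDE, CAP_CHOWN, CAP_NET_BIND_SERVICE,
	CAP_SETUID, CAP_SETGID, CAP_SYS_NICE,
};

/* Hypothetical helper: build the /proc-style bitmask for the list above. */
uint64_t retained_caps_mask(void)
{
	uint64_t mask = 0;
	for (size_t i = 0; i < sizeof(retained_caps) / sizeof(retained_caps[0]); i++) {
		mask |= UINT64_C(1) << retained_caps[i];
	}
	return mask;
}
```

The result is 0x8005c3. Comparing it with the CapEff: line of a running knotd shows whether those bits survived the rest of startup, and capsh --decode=8005c3 (from libcap) should translate the mask back into the seven names.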

This is still a lot of privilege that traditionally only the root user would possess. So my assumption is that this code was written with knotd being started as the unconstrained root user in mind (so knotd gives up a bunch of capabilities it doesn't need), rather than as a non-root user that has been given elevated capabilities. A sysadmin or packager setting up a constrained environment for knotd would realize that a non-root process with these capabilities is quite privileged and wonder what is going on.

Once knotd has given up some of its capabilities but retained those listed above, it proceeds with the rest of the daemon startup. In particular, sockets are bound, uid/gid privileges are dropped, and threads are started.

The uid/gid privilege dropping in proc_update_privileges() appears to be performed only if server: user is set in the config to a user/group different from the one that starts the knotd process. (That's another thing that makes me think this code assumes a root → non-root transition; I can't think of a daemon that calls setuid/setgid as a non-root user holding the CAP_SETUID and CAP_SETGID capabilities.)
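For comparison, the conventional root → non-root drop that this code seems to assume looks roughly like the sketch below. drop_privileges_if_needed() is a hypothetical name, not Knot's function; it mirrors the described behaviour of skipping the switch when the configured user and group already match the current ones, and it follows the usual order of supplementary groups, then group IDs, then user IDs, because CAP_SETGID is gone once the user IDs become non-root.

```c
#define _GNU_SOURCE
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: switch to uid/gid only if they differ from the
 * current effective IDs. Returns 0 on success, -1 on failure (errno set). */
int drop_privileges_if_needed(uid_t uid, gid_t gid)
{
	if (uid == geteuid() && gid == getegid()) {
		return 0; /* nothing to do; capabilities are left untouched */
	}
	/* Replace supplementary groups while CAP_SETGID is still held. */
	if (setgroups(1, &gid) != 0) {
		return -1;
	}
	/* Set real, effective and saved group IDs before touching the UIDs. */
	if (setresgid(gid, gid, gid) != 0) {
		return -1;
	}
	/* Setting all three UIDs non-zero clears the capability sets. */
	if (setresuid(uid, uid, uid) != 0) {
		return -1;
	}
	return 0;
}
```

Setting the saved IDs as well (rather than only the effective ones) matters: it is what prevents the process from switching back to root later.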

Assuming that knotd changed uid/gid to a non-root user, it should now have lost its remaining capabilities automatically. See "Effect of user ID changes on capabilities" in the capabilities(7) manpage:

If one or more of the real, effective or saved set user IDs was previously 0, and as a result of the UID changes all of these IDs have a nonzero value, then all capabilities are cleared from the permitted and effective capability sets.

If the effective user ID is changed from 0 to nonzero, then all capabilities are cleared from the effective set.
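These two rules can be written down as a small pure function, which makes it easy to check which startup scenarios leave capabilities behind. This is a simplified model, not kernel code: struct cred_model and apply_uid_change() are hypothetical names, only the two quoted rules are covered, and it assumes neither SECBIT_KEEP_CAPS (PR_SET_KEEPCAPS) nor SECBIT_NO_SETUID_FIXUP is set, since either of those changes the outcome.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical model of a thread's user IDs and capability sets. */
struct cred_model {
	uid_t ruid, euid, suid;
	uint64_t permitted, effective;
};

/* Applies a change of (real, effective, saved) user IDs and adjusts the
 * capability sets per the two capabilities(7) rules quoted above.
 * Assumes SECBIT_KEEP_CAPS and SECBIT_NO_SETUID_FIXUP are both clear. */
struct cred_model apply_uid_change(struct cred_model c, uid_t ruid, uid_t euid, uid_t suid)
{
	bool had_root = c.ruid == 0 || c.euid == 0 || c.suid == 0;
	bool all_nonzero = ruid != 0 && euid != 0 && suid != 0;

	if (had_root && all_nonzero) {
		/* Rule 1: permitted and effective sets are cleared. */
		c.permitted = 0;
		c.effective = 0;
	} else if (c.euid == 0 && euid != 0) {
		/* Rule 2: only the effective set is cleared. (When rule 1
		 * fires, the effective set is already empty.) */
		c.effective = 0;
	}
	c.ruid = ruid;
	c.euid = euid;
	c.suid = suid;
	return c;
}
```

Feeding it a root process holding the seven retained capabilities and switching to UID 101 everywhere yields empty sets; a process that was never root (say UID 101 with an ambient CAP_NET_BIND_SERVICE) keeps what it was given, which is the only situation in which the later capability checks could still find anything.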

Here are the confusing parts, though.

In the thread entry point function thread_ep(), which is called when a new thread is created, the following code is executed:

	/* Drop capabilities except FS access. */
#ifdef HAVE_CAP_NG_H
	if (capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) {
		capng_type_t tp = CAPNG_EFFECTIVE|CAPNG_PERMITTED;
		capng_clear(CAPNG_SELECT_BOTH);
		capng_update(CAPNG_ADD, tp, CAP_DAC_OVERRIDE);
		capng_apply(CAPNG_SELECT_BOTH);
	}
#endif /* HAVE_CAP_NG_H */

This seems weird, because we should have lost the CAP_SETPCAP capability if we transitioned from root to a non-root user. That is, I think this code only runs if knotd was started as a non-root user with at least the CAP_SETPCAP privilege. But that seems unlikely (see parenthetical remarks above). It's also not clear to me why the non-main threads started by knotd need to retain the CAP_DAC_OVERRIDE capability, which is traditionally a root privilege. (I would think that knotd should set up any permissions it needs to operate correctly before dropping privileges.)
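Setting up permissions before dropping privileges could, for a directory that worker threads must write to, look like the sketch below: create it and hand it to the target user while the process still holds CAP_CHOWN, so no thread needs CAP_DAC_OVERRIDE afterwards. prepare_runtime_dir() is a hypothetical name, and which paths knotd actually writes to is not examined here.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: ensure `path` exists as a directory owned by
 * uid:gid with the given mode. Intended to run before the uid/gid switch.
 * Returns 0 on success, -1 on failure (errno set). */
int prepare_runtime_dir(const char *path, uid_t uid, gid_t gid, mode_t mode)
{
	if (mkdir(path, mode) != 0 && errno != EEXIST) {
		return -1;
	}
	/* Changing ownership to another user requires CAP_CHOWN. */
	if (chown(path, uid, gid) != 0) {
		return -1;
	}
	/* mkdir() applies the umask, so set the final mode explicitly. */
	return chmod(path, mode);
}
```

Done once per directory before the user switch, this leaves the unprivileged threads with ordinary owner permissions on everything they need.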

In the function udp_master(), which is called for UDP worker threads, the following code is executed:

	/* Drop all capabilities on all workers. */
#ifdef HAVE_CAP_NG_H
	if (capng_have_capability(CAPNG_EFFECTIVE, CAP_SETPCAP)) {
		capng_clear(CAPNG_SELECT_BOTH);
		capng_apply(CAPNG_SELECT_BOTH);
	}
#endif /* HAVE_CAP_NG_H */

According to the example given on https://people.redhat.com/sgrubb/libcap-ng/, this drops all capabilities, but it only runs if we have the CAP_SETPCAP capability, which we would already have lost (along with all other capabilities) if knotd switched from root to non-root. So this code seems redundant; at best, it only executes if knotd was started as a non-root user with the CAP_SETPCAP privilege.
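Both guards (here and in thread_ep()) hinge on a single bit: CAP_SETPCAP is capability number 8, so the branch is taken only when bit 8 (0x100) is set in the effective set. The hypothetical helper below tests a bit in a CapEff-style hex mask, as printed in /proc/<pid>/status, to show which startup scenarios would enter the branch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <linux/capability.h>

/* Hypothetical helper: true if capability number `cap` is set in a
 * CapEff-style hex mask such as "00000000008005c3". */
bool mask_has_cap(const char *hex, unsigned cap)
{
	uint64_t mask = strtoull(hex, NULL, 16);
	return cap < 64 && ((mask >> cap) & 1);
}
```

With the seven retained capabilities (mask 0x8005c3) the branch would run; after a root to non-root switch (mask 0) or for a non-root start with only an ambient CAP_NET_BIND_SERVICE (mask 0x400) it would not.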

The function tcp_master() doesn't have any calls to libcap-ng, but it does have this suggestive, unused #include:

#ifdef HAVE_CAP_NG_H
#include <cap-ng.h>
#endif /* HAVE_CAP_NG_H */

It's not clear to me why setup_capabilities() wants to retain the CAP_SYS_NICE capability. This capability allows changing the niceness, scheduling policy, CPU affinity, etc. of arbitrary processes. I see some calls to pthread_setaffinity_np() and a commented-out call to pthread_attr_setschedpolicy() in the code base, but it seems surprising that CAP_SYS_NICE would actually be required.

pthread_setaffinity_np() is implemented with sched_setaffinity(2), and according to its manpage CAP_SYS_NICE is only required if the caller and the target thread are running as different users:

The caller needs an effective user ID equal to the real user ID or effective user ID of the thread identified by pid, or it must possess the CAP_SYS_NICE capability in the user namespace of the thread pid.
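This is easy to check empirically. The sketch below (pin_self_to_current_cpu() is a hypothetical name) pins the calling thread to the CPU it is currently running on with pthread_setaffinity_np(). Because the caller and the target are the same thread, the user ID condition quoted above is satisfied, so it is expected to succeed for an ordinary user without any capabilities.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to the CPU it is running
 * on right now. Returns 0 on success or an errno value on failure.
 * Targeting our own thread satisfies the sched_setaffinity(2) UID check,
 * so CAP_SYS_NICE should not be needed. */
int pin_self_to_current_cpu(void)
{
	int cpu = sched_getcpu();
	if (cpu < 0) {
		return errno;
	}
	cpu_set_t set;
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Running this as a normal, unprivileged user is a quick way to confirm that thread pinning within one's own process does not depend on CAP_SYS_NICE.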
