“Files” data provider to serve contents of /etc/passwd and /etc/group

Related ticket(s):

which includes the following sub-tasks:

Problem statement

SSSD does not behave well with nscd, so we recommend that nscd be disabled. However, this comes at a price: every nameservice lookup hits the disk for /etc/passwd and friends every time. SSSD should be able to read and monitor these files and serve them from its cache, allowing sss to sort before files in /etc/nsswitch.conf.

In addition, SSSD provides some useful interfaces, such as the D-Bus interface, which only work for users and groups SSSD knows about.

Use cases

Use Case: Default Configuration

SSSD (and its useful APIs) should always be available. This means that SSSD must ship with a default configuration that works (and requires no manual configuration or joining a domain). This default configuration should provide a fast in-memory cache for all user and group information that SSSD can support, including those traditionally stored in /etc/passwd and friends.

Use Case: Programmatically managing POSIX attributes of a user or a group

Currently, the only ways to manage users and groups are to spawn shadow-utils binaries such as useradd or to use libuser. SSSD already has a D-Bus API used to provide custom attributes of domain users. This interface should be extended to provide ‘writable’ methods to manage users and groups from files. This is tracked by ticket #3242.

Use Case: Manage extended attributes of users and groups

For some applications (such as desktop environments), additional attributes (such as keyboard layout) should be stored along with the user. Since the passwd file has only a fixed number of fields, it might make sense to allow additional attributes to be stored in the SSSD database and retrieved with sssd’s D-Bus interface. Again, this is tracked by ticket #3242.

Overview of the solution

SSSD should ship a files provider as part of its required minimal package. Absent any user modifications, SSSD should be configured to start at boot and use this provider to serve local identity information.

This provider may or may not be optional. For example, we might decide that it always exists as the first domain in the list, even if not explicitly specified. Alternatively, distributions that wish to always include the files provider will be able (starting with SSSD 1.14 and its config merging feature) to drop a definition of the files provider into /etc/sssd/conf.d. For this to work, we would have to deprecate the domains line and instead load all [domain/XXXX] sections from all available sources, unless the domains line is specified for backwards compatibility.
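The domain selection rule described above can be sketched as follows. This is a minimal illustration, not SSSD's actual config loader: the function name active_domains is hypothetical, and Python's configparser stands in for SSSD's own config merging.

```python
import configparser


def active_domains(snippets):
    """Return the domain names to load, given config texts in merge order.

    Hypothetical helper: an explicit 'domains' line in [sssd] wins (backwards
    compatibility); otherwise every [domain/NAME] section is loaded.
    """
    cfg = configparser.ConfigParser(interpolation=None)
    for text in snippets:
        # Later snippets are merged on top of earlier ones.
        cfg.read_string(text)

    explicit = cfg.get("sssd", "domains", fallback="").strip()
    if explicit:
        return [name.strip() for name in explicit.split(",") if name.strip()]

    prefix = "domain/"
    return [s[len(prefix):] for s in cfg.sections() if s.startswith(prefix)]


main_conf = "[sssd]\nservices = nss\n\n[domain/example.com]\nid_provider = ldap\n"
drop_in = "[domain/files]\nid_provider = files\n"
legacy_conf = "[sssd]\ndomains = example.com\n"

print(active_domains([main_conf, drop_in]))
print(active_domains([main_conf, drop_in, legacy_conf]))
```

With no domains line, both the main file's domain and the drop-in files domain are picked up; once a domains line is present, only the domains it names are loaded.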

Implementation details

Upon SSSD startup, the files provider will always run a complete enumeration pass on /etc/passwd, /etc/group and other files as appropriate. The provider will then configure an appropriate set of file monitors (using inotify) and will re-run the enumeration if any of those files are modified or replaced. The implementation of enumeration would use the nss_files module interface - we would dlopen the module and dlsym the appropriate functions such as _nss_files_getpwent_r.

The fast memory cache must also be flushed any time the enumeration is run, to ensure that stale data is cleaned up. We should also consider turning off the fast memory cache while we are performing the update.
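To make the enumeration pass concrete, here is a minimal sketch of the result it has to produce: every entry of a passwd-format file, keyed by name. The design above goes through the nss_files module via dlopen; this sketch swaps that for direct parsing of the text purely to stay self-contained. PasswdEntry and enumerate_passwd are hypothetical names.

```python
from typing import Dict, NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def enumerate_passwd(text: str) -> Dict[str, PasswdEntry]:
    """Parse passwd-format text into a name-indexed dictionary."""
    entries: Dict[str, PasswdEntry] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed lines
        name, _password, uid, gid, gecos, home, shell = fields
        if not (uid.isdigit() and gid.isdigit()):
            continue  # skip lines with non-numeric IDs
        # Keep the first entry for a name, as a top-to-bottom scan would.
        entries.setdefault(
            name, PasswdEntry(name, int(uid), int(gid), gecos, home, shell)
        )
    return entries


sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "# a comment line\n"
    "localuser:x:1000:1000:Local User:/home/localuser:/bin/sh\n"
    "broken:line\n"
)
cache = enumerate_passwd(sample)
print(sorted(cache))
```

A real provider would re-run such a pass whenever the file monitor reports a change and would store the results in the SSSD cache rather than in a dictionary.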

In addition, the nscd cache (if applicable) should also be flushed during an update. The updates to the files should be sufficiently rare so the performance impact would be negligible.

The files provider in its first incarnation is expected to be a read-only tool, making no direct modifications to local passwords. In future enhancements, the Infopipe may grow the capability to serve the AccountsService API and make changes.

When a change in the files is detected, we should also flush the negative cache - either only the changed entries or the whole cache. This would prevent scenarios like:

getent passwd foo # see that there is no user foo
useradd foo       # OK, let's add it then
getent passwd foo # still no user returned until the negative cache expires

from confusing admins.
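The scenario can be reproduced with a toy cache. This is a sketch under simplifying assumptions: LookupCache and on_files_changed are hypothetical names, the backend is a plain dictionary, and the whole negative cache is flushed rather than only the changed entries.

```python
class LookupCache:
    """Toy positive/negative lookup cache in front of a backend callable."""

    def __init__(self, backend):
        self._backend = backend  # callable: name -> entry or None
        self._positive = {}
        self._negative = set()

    def lookup(self, name):
        if name in self._positive:
            return self._positive[name]
        if name in self._negative:
            return None  # answered from the negative cache
        entry = self._backend(name)
        if entry is None:
            self._negative.add(name)
        else:
            self._positive[name] = entry
        return entry

    def on_files_changed(self):
        """Drop everything cached; called when the files change."""
        self._positive.clear()
        self._negative.clear()


users = {}
lookups = LookupCache(users.get)

first = lookups.lookup("foo")          # no such user; cached negatively
users["foo"] = "foo:x:1000:1000::/home/foo:/bin/sh"   # "useradd foo"
stale = lookups.lookup("foo")          # still None without a flush
lookups.on_files_changed()             # change detected, caches flushed
fresh = lookups.lookup("foo")          # now the new user is visible
print(first, stale, fresh)
```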

Configuration changes

We may need the ability to choose non-default locations for files. This can be a hidden (undocumented) option in the first version and if there is a need to actually configure a non-default location, we can later expose these configuration options.

We may also need a configurable number of seconds between detecting a change and running the enumeration. This could be implemented by waiting a short time (2-3 seconds perhaps?) after detecting the change before running the enumeration, to avoid excessive enumerations and repeated invalidation of the fast cache during subsequent shadow-utils invocations.
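One possible way to implement such a delay is a trailing-edge debounce: each change event restarts a quiet period, and the enumeration runs only once that period passes without further events. The sketch below is an assumption about how this might look, not the chosen design. Debouncer is a hypothetical name, and timestamps are passed in explicitly so the logic can be followed without real timers.

```python
class Debouncer:
    """Coalesce bursts of change events into a single trigger."""

    def __init__(self, delay):
        self.delay = delay      # quiet period in seconds
        self._deadline = None   # time at which the pending trigger fires

    def notify(self, now):
        """Record a change event and restart the quiet period."""
        self._deadline = now + self.delay

    def poll(self, now):
        """Return True exactly once when the quiet period has elapsed."""
        if self._deadline is not None and now >= self._deadline:
            self._deadline = None
            return True
        return False


debounce = Debouncer(delay=2.0)
# Three file changes in quick succession, e.g. several useradd calls.
for timestamp in (0.0, 0.5, 1.0):
    debounce.notify(timestamp)

too_early = debounce.poll(2.5)   # only 1.5 s after the last change
fires = debounce.poll(3.0)       # 2.0 s of quiet has passed
again = debounce.poll(4.0)       # nothing pending any more
print(too_early, fires, again)
```

The three changes result in a single enumeration instead of three.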

Performance impact

For measuring performance impact, we have developed a simple project called nssbench, which measures the time spent in NSS with systemtap. For each case, results are included for a single lookup, which simulates the simplest case of an application that is spawned and exits, and for a case where an application performs several lookups and is able to benefit from the memory cache, which is opened once per application. For single lookups, we ran the tests 10 times and averaged the results. Below are test results from different scenarios:

  1. Base-line: Looking up a local user directly from nss_files

    • Single lookup

      nss operation getpwnam(jhrozek) took 226 us
      _nss_files_getpwnam cnt:1 avg:30 min:30 max:30 sum:30 us
      _nss_sss_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      
    • 100 lookups

      nss operation getpwnam(jhrozek) took 2717 us
      _nss_files_getpwnam cnt:100 avg:21 min:14 max:524 sum:2159 us
      _nss_sss_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      
  2. Failover from sss to files when SSSD is not running - this is the ‘worst’ case where sss is enabled in nsswitch.conf but the daemon is not running at all, so the system falls back from sss to files for user lookups.

    • Single lookup

      nss operation getpwnam(jhrozek) took 549 us
      _nss_files_getpwnam cnt:1 avg:32 min:32 max:32 sum:32 us
      _nss_sss_getpwnam cnt:1 avg:72 min:72 max:72 sum:72 us
      
    • 100 lookups

      nss operation getpwnam(jhrozek) took 6078 us
      _nss_files_getpwnam cnt:100 avg:19 min:16 max:42 sum:1907 us
      _nss_sss_getpwnam cnt:100 avg:22 min:19 max:74 sum:2248 us
      
  3. Round-trip between SSSD daemon’s populated cache and OS when the memory cache is not used or not populated

    • Single lookup

      nss operation getpwnam(jhrozek) took 755 us
      _nss_files_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      _nss_sss_getpwnam cnt:1 avg:384 min:384 max:384 sum:384 us
      
    • 100 lookups

      nss operation getpwnam(jhrozek) took 97831 us
      _nss_files_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      _nss_sss_getpwnam cnt:100 avg:968 min:115 max:22153 sum:96812 us
      
  4. Performance benefit from using the memory cache

    • Single lookup

      nss operation getpwnam(jhrozek) took 373 us
      _nss_files_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      _nss_sss_getpwnam cnt:1 avg:37 min:37 max:37 sum:37 us
      
    • 100 lookups

      nss operation getpwnam(jhrozek) took 1355 us
      _nss_files_getpwnam cnt:0 avg:0 min:0 max:0 sum:0 us
      _nss_sss_getpwnam cnt:100 avg:4 min:3 max:42 sum:408 us
      

The testing shows a substantial benefit from the SSSD memory cache for applications that perform several lookups. The first lookup, which opens the memory cache file, takes about as much time as a lookup against files. However, subsequent lookups are roughly five times faster in the 100-lookup runs (4 us versus 21 us on average).

For setups that do not run SSSD by default, there is a performance hit caused by the failover from sss to files. During testing, the failover took roughly 300 us, of which about 70 us was spent in the sss module and about 200 us seems to be the failover in libc itself.
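The comparisons can be recomputed directly from the figures listed in the results. The snippet below only does arithmetic on numbers copied from the four scenarios; the variable names are illustrative.

```python
# Figures copied from the 100-lookup runs above (microseconds).
files_100 = {"module_sum": 2159, "count": 100}     # scenario 1, nss_files
daemon_100 = {"module_sum": 96812, "count": 100}   # scenario 3, no memory cache
memcache_100 = {"module_sum": 408, "count": 100}   # scenario 4, memory cache

# Totals of the single-lookup runs (microseconds).
baseline_single_total = 226   # scenario 1
failover_single_total = 549   # scenario 2


def per_lookup(run):
    """Average time spent in the NSS module per lookup."""
    return run["module_sum"] / run["count"]


speedup_vs_files = per_lookup(files_100) / per_lookup(memcache_100)
speedup_vs_daemon = per_lookup(daemon_100) / per_lookup(memcache_100)
failover_overhead = failover_single_total - baseline_single_total

print(f"memory cache vs nss_files: {speedup_vs_files:.1f}x")
print(f"memory cache vs daemon round-trip: {speedup_vs_daemon:.0f}x")
print(f"failover overhead for a single lookup: {failover_overhead} us")
```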

Compatibility issues

Unless the ordering is specified, the files provider should be loaded first.

Other distributions should be involved as well; in particular, we should work with Ubuntu.

abrt and coredumpd must be run with SSS_LOOPS=no in order to avoid looping when analyzing a crash. We need to test this by reverting the order of modules, attaching a debugger and crashing SSSD on purpose.

Packaging issues

We need to add conflicts between glibc and an sssd version that doesn’t provide the files provider.

How To Test

When properly configured, SSSD should be able to serve local users and groups. Testing this could be as simple as:

getent -s sss passwd localuser

Of course, testing on the distribution level could be more involved. For the first phase of just adding the files provider, nothing should break and the only thing the user should notice is improved performance. Corner cases such as running sssd_nss under gdb or generating a core file in a setup where sss is listed first in nsswitch.conf must be tested as well.

How To Debug

A simple way of checking whether some issue is caused by this new setup is to revert the order of NSS modules back to read "files sss".
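When debugging, it can help to confirm which order is actually in effect. The sketch below parses nsswitch.conf-style text and reports whether sss is listed before files for a given database. The helper names database_order and sss_before_files are hypothetical, and the parsing is deliberately simplified.

```python
import re


def database_order(nsswitch_text, database="passwd"):
    """Return the list of NSS services configured for a database."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line.startswith(database + ":"):
            continue
        services = line.split(":", 1)[1]
        # Remove action items such as [NOTFOUND=return].
        services = re.sub(r"\[[^\]]*\]", " ", services)
        return services.split()
    return []


def sss_before_files(order):
    """True if both services are present and sss comes first."""
    return (
        "sss" in order
        and "files" in order
        and order.index("sss") < order.index("files")
    )


sample = (
    "# example nsswitch.conf fragment\n"
    "passwd: sss files systemd\n"
    "group:  files [SUCCESS=return] sss\n"
)
passwd_order = database_order(sample, "passwd")
group_order = database_order(sample, "group")
print(passwd_order, sss_before_files(passwd_order))
print(group_order, sss_before_files(group_order))
```

To check a live system, the contents of /etc/nsswitch.conf can be read and passed in as the text.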

Authors