FastNSSCache

Problem Statement

Currently, SSSD performs user and group lookups over a connection to a named socket: each client has to talk to the sssd_nss daemon for every lookup it performs.

Although this behaviour works well for normal machines, it has scalability limits on busy machines with many processes that need to query the user/group database frequently.

There are 2 factors that cause scalability issues:

  1. context switches
  2. the responder process is single-threaded (although asynchronous), so the amount of processing it can do is limited by the speed of one CPU

Each request suffers from at least 2 context switches (and a few copies of the data in memory): the client must write() the request, wait until it is processed by sssd_nss, and read() the reply back. Because sssd_nss may be busy answering many requests, a queue may build up and replies may be delayed.
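
As a rough sketch of the cost involved, the round trip each lookup pays today looks conceptually like this (the request/reply framing is invented for the example; the real sss_nss wire protocol differs):

    /* Illustrative only: shows the write()/read() round trip every lookup
     * pays today. The request/reply framing is invented for this sketch;
     * the real sssd_nss wire protocol is different. */
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    static int lookup_via_socket(const char *name, char *reply, size_t len)
    {
        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) return -1;

        strncpy(addr.sun_path, "/var/lib/sss/pipes/nss",
                sizeof(addr.sun_path) - 1);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            close(fd);
            return -1;
        }

        /* context switch #1: hand the request to sssd_nss */
        if (write(fd, name, strlen(name) + 1) < 0) {
            close(fd);
            return -1;
        }

        /* context switch #2: block until sssd_nss answers */
        ssize_t got = read(fd, reply, len);

        close(fd);
        return got < 0 ? -1 : 0;
    }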

Allowing the clients to directly access SSSD caches is not possible for various reasons including:

  • sssd uses LDB as its caching backend, and LDB depends on byte range locks. Giving a client read access to the cache would allow denial of service; for example, a client could lock a record and never unlock it.
  • sssd stores data that not all clients are allowed to access (password hashes, for example), and partitioning access to this data within the LDB cache is not feasible.

A method to avoid context switches and the sssd_nss bottleneck without compromising the security of the system is therefore desirable.

Overview of FastNSSCache solution

The FastNSSCache feature addresses both issues.

This is done by creating a specialized cache that has a few properties:

  • Contains only public data (the same data available in the public passwd or group files)
  • Read only for clients
  • Does not use locking and yet prevents access to inconsistent data
  • Has a fixed size and uses a FIFO policy (for now) to decide which entries to purge (see the layout sketch after this list)
  • Falls back to the named socket if an entry is not found in the fast cache
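
As a rough illustration of what such a fixed-size, FIFO-recycled layout could look like, the sketch below uses invented names and sizes; the real on-disk format is defined by the implementation.

    /* Illustrative layout sketch; field names and sizes are invented
     * and do not match the real on-disk format. */
    #include <stdint.h>

    #define FC_HT_SIZE   4096               /* hash table slots */
    #define FC_DATA_SIZE (8 * 1024 * 1024)  /* fixed payload area */

    struct fc_header {
        uint32_t version;    /* reject files we do not understand */
        uint32_t seed;       /* hash seed, randomized per file */
        uint32_t ht_size;    /* number of hash table slots */
        uint32_t data_size;  /* size of the record area */
        uint32_t next_slot;  /* FIFO cursor: next record to recycle */
    };

    struct fc_file {
        struct fc_header hdr;
        uint32_t hash_table[FC_HT_SIZE]; /* offsets into data[], 0 = empty */
        char     data[FC_DATA_SIZE];     /* packed records, recycled FIFO */
    };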

Implementation details

The cache files are opened by the client at the first query and mmap()ed into the process memory; all accesses to data are therefore direct memory accesses and do not suffer from any context switch. Lookups also proceed in parallel within each process, with the synchronization needed to allow updates performed using memory barriers.
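
A common lock-free way to achieve this is a sequence-counter ("seqlock") scheme: the writer bumps a counter before and after each update, so readers retry whenever they observe an odd or changed counter. The sketch below shows the general pattern with invented field and function names; it is not the exact record format used by the implementation.

    /* Sequence-counter ("seqlock") read pattern, sketched with invented
     * names. An odd 'seq' means "update in progress"; a changed 'seq'
     * means the record was rewritten while we were copying it. */
    #include <stdint.h>
    #include <string.h>
    #include <stdbool.h>

    struct fc_rec {
        volatile uint32_t seq;   /* even = stable, odd = writer mid-update */
        uint32_t len;            /* payload length */
        char data[256];          /* payload */
    };

    static bool fc_read_rec(const struct fc_rec *rec, char *out, size_t outlen)
    {
        for (;;) {
            uint32_t before = rec->seq;
            if (before & 1)
                continue;                /* a writer is updating, spin */
            __sync_synchronize();        /* barrier: load seq before data */

            size_t n = rec->len;
            if (n > sizeof(rec->data)) n = sizeof(rec->data);
            if (n > outlen) n = outlen;
            memcpy(out, rec->data, n);

            __sync_synchronize();        /* barrier: load data before re-check */
            if (rec->seq == before)
                return true;             /* no writer raced us, copy is consistent */
        }
    }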

Cache files can only be used for direct lookups (by name or by UID/GID); enumerations are _never_ handled via fast cache lookups by design and always fall back to socket communication.
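
In client terms the rule could be expressed as in the sketch below; the helper functions are hypothetical, not the real sss_client API.

    /* Hypothetical client-side dispatch: only point lookups may hit the
     * fast cache; enumeration always goes through the named socket. */
    #include <stddef.h>
    #include <pwd.h>

    /* hypothetical helpers, not real sss_client API */
    int fc_getpwnam(const char *name, struct passwd *res, char *buf, size_t len);
    int nss_socket_getpwnam(const char *name, struct passwd *res, char *buf, size_t len);
    int nss_socket_getpwent(struct passwd *res, char *buf, size_t len);

    int client_getpwnam(const char *name, struct passwd *res, char *buf, size_t len)
    {
        /* try the mmap'd cache first; on a miss, fall back to the socket */
        if (fc_getpwnam(name, res, buf, len) == 0)
            return 0;
        return nss_socket_getpwnam(name, res, buf, len);
    }

    int client_getpwent(struct passwd *res, char *buf, size_t len)
    {
        /* enumeration never consults the fast cache, by design */
        return nss_socket_getpwent(res, buf, len);
    }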

The “maps” currently available are the _passwd_ and _group_ maps; each map has an associated file in the /var/lib/sss/mc directory, which is accessible read-only by clients.
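
Mapping a map file on first use could look roughly like the following sketch (error handling trimmed; only the path comes from the text above).

    /* Open and map the passwd map read-only on first use; a missing
     * file simply means "no fast cache, use the socket". Sketch only. */
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static const void *fc_map_passwd(size_t *size_out)
    {
        struct stat st;
        int fd = open("/var/lib/sss/mc/passwd", O_RDONLY | O_CLOEXEC);
        if (fd < 0)
            return NULL;                 /* no fast cache: use the socket */

        if (fstat(fd, &st) < 0) {
            close(fd);
            return NULL;
        }

        /* PROT_READ only: clients can never modify the shared cache */
        const void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);                       /* the mapping survives the close */

        if (map == MAP_FAILED)
            return NULL;
        *size_out = st.st_size;
        return map;
    }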

Configuration

At the moment we plan to provide three parameters per map to control the caches (an illustrative sssd.conf snippet follows the list):

  • Per map enablement parameter that allows activating/deactivating maps individually.
  • Per map cache size to fine tune the cache sizes in case space is at a premium or the dataset does not fit the default cache.
  • Expiration time for entries.
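
For illustration only, such knobs might surface in sssd.conf roughly as follows; the option names here are invented, as the design does not fix them:

    [nss]
    # Option names below are illustrative; the design text does not fix them.
    fast_cache_passwd = true        # per-map enablement (passwd map)
    fast_cache_passwd_size = 8      # per-map cache size, in MiB
    fast_cache_group = true         # per-map enablement (group map)
    fast_cache_group_size = 4
    fast_cache_timeout = 300        # entry expiration time, in seconds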

Cache entries warrant a shorter expiration time than the current LDB caches because access to these entries is undetectable by sssd_nss, which therefore cannot judge how heavily an entry is used or whether a midway refresh is needed. By shortening the lifetime of FastNSSCache entries we incur the penalty of using the pipe from time to time, but in turn we allow sssd_nss to decide whether the entry needs to be refreshed.

Future Improvements

  • Better garbage collection on the server side; at the moment a FIFO policy is used.
  • Handle caching of other nsswitch.conf database plugins in order to avoid slow access to the files db.