Improve SSSD Performance with a timestamp cache
Related ticket(s):
Problem statement
At the moment SSSD doesn’t perform well in large environments. Most of the reported use-cases revolved around logins of users who are members of large groups or of a large number of groups. Another reported use-case was the time it takes to resolve a large group.
While workarounds are available for some of the issues (such as using ignore_group_members for resolution of large groups), our goal is to be able to perform well without these workarounds.
Use cases
- A user who is a member of a large number of AD groups logs in to a GNU/Linux server that is a member of the AD domain.
- A user who is a member of a large number of AD or IPA groups logs in to a GNU/Linux server that is a member of an IPA domain with a trust relationship to an AD domain.
- An administrator of a GNU/Linux server runs “ls -l” in a directory where files are owned by a large group. An example would be a group called “students” in a university setup.
Overview of the solution
During performance analysis with systemtap, we found that the biggest delay happens when SSSD writes an entry to the cache, especially for large group entries. This is also confirmed by empirical evidence from our users: most deployments were OK with SSSD performance once the cache was moved to tmpfs or once the ignore_group_members option was enabled.
We can’t skip cache writes completely, even if no attributes changed, because we also store the expiration timestamps in the cache. Moreover, even if only a single attribute (like the timestamp) changes, ldb needs to unpack the whole entry, change the record, pack it back and then write the whole blob.
To mitigate the costly cache writes, we should avoid writing the whole cache entry on every cache update and only write the entry if something actually changed.
To achieve this, we will split the monolithic ldb file representing the sysdb cache into two ldb files. One would contain the entry itself and would be fully synchronous. The other (new) one would contain only the timestamps and would be opened with the LDB_FLG_NOSYNC flag to avoid synchronous cache writes.
This would have two advantages:
- If we detect that the entry hasn’t changed on the LDAP server at all, we could avoid writing into the main ldb cache which would still be costly. We would use the value of the modifyTimestamp attribute of the LDAP entry to see if the entry had changed or not.
- The writes to the new async ldb cache would be much faster, because the entry is smaller and because the writes wouldn’t call fsync() due to using the async flag, but rather rely on the underlying filesystem to sync the data to the disk.
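The difference between the two write paths can be illustrated with a minimal Python sketch. It does not use ldb at all: plain files stand in for the two caches, the file names are hypothetical, and the sync switch only approximates what opening an ldb file with or without LDB_FLG_NOSYNC would do.

```python
import os
import tempfile


def append_record(path, payload, sync):
    """Append payload to path; call fsync() only when sync is True."""
    with open(path, "ab") as f:
        f.write(payload)
        f.flush()                 # hand the bytes over to the kernel
        if sync:
            os.fsync(f.fileno())  # block until the data reaches stable storage


# Hypothetical file names, chosen for this illustration only.
workdir = tempfile.mkdtemp()
main_cache = os.path.join(workdir, "main_cache.bin")
ts_cache = os.path.join(workdir, "timestamps.bin")

# A large group entry goes to the synchronous store, and only when it changed.
append_record(main_cache, b"cn=students;" + b"member=uid;" * 5000, sync=True)

# A refresh that found no change touches only the small asynchronous store.
ts_record = b"cn=students;dataExpireTimestamp=1700000000\n"
append_record(ts_cache, ts_record, sync=False)
```

In this model the unchanged-entry path writes a few dozen bytes and never waits for the disk, which is the effect the split is meant to achieve.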
On SSSD shutdown, we would write a canary to both the timestamp cache and the main sysdb cache, denoting graceful shutdown. On SSSD startup, if the canary wasn’t found or if the values differ, we would just ditch the timestamp cache, which would result in a refresh and a full write of each entry on its next lookup.
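A minimal Python sketch of the canary check follows, with dictionaries standing in for the two ldb files. The function names, the "canary" key and the use of a random token are assumptions made for illustration. The sketch also assumes that the canary is consumed at startup so that a later crash leaves no canary behind; the text above does not specify this.

```python
import uuid


def shutdown(main_cache, ts_cache):
    """Graceful shutdown: stamp both caches with the same fresh canary."""
    canary = uuid.uuid4().hex
    main_cache["canary"] = canary
    ts_cache["canary"] = canary


def startup(main_cache, ts_cache):
    """Return True if the timestamp cache was kept, False if it was ditched."""
    # Assumption: the canary is removed on startup, so an unclean exit
    # is detected by its absence on the next start.
    main_canary = main_cache.pop("canary", None)
    ts_canary = ts_cache.pop("canary", None)
    if main_canary is None or main_canary != ts_canary:
        ts_cache.clear()  # entries get refreshed and rewritten on next lookup
        return False
    return True


main = {"cn=students": {"member": ["alice", "bob"]}}
ts = {"cn=students": {"dataExpireTimestamp": 1700000000}}

shutdown(main, ts)
kept_after_clean_stop = startup(main, ts)   # canaries match
kept_after_crash = startup(main, ts)        # no shutdown() ran in between
```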
The basic idea is to use a combination of the operational modifyTimestamp attribute and checking the entry itself to see if the entry changed at all and if not, avoid writing to the cache. Checking the value of modifyTimestamp would be enough for group entries, which should be the first iteration of this work. For checking if other entries (mostly users) have changed, we need to compare the value of the attributes in the cache with what we are about to store in the cache.
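The attribute comparison could look roughly like the following Python sketch. The helper name attrs_differ is hypothetical, entries are modeled as dictionaries mapping an attribute name to a list of values, and the sketch assumes that value order is insignificant and that the timestamp attributes are excluded from the comparison.

```python
# Assumption: these attributes live in the timestamp cache and are
# therefore left out of the comparison.
TIMESTAMP_ATTRS = frozenset({"lastUpdate", "dataExpireTimestamp"})


def attrs_differ(cached, incoming, ignore=TIMESTAMP_ATTRS):
    """Return True if any non-ignored attribute has different values.

    Both arguments map an attribute name to a list of string values.
    Value order is treated as insignificant in this sketch.
    """
    names = (set(cached) | set(incoming)) - ignore
    return any(
        sorted(cached.get(name, [])) != sorted(incoming.get(name, []))
        for name in names
    )


cached_user = {"name": ["alice"], "memberOf": ["a", "b"], "lastUpdate": ["1"]}
same_user = {"name": ["alice"], "memberOf": ["b", "a"], "lastUpdate": ["2"]}
changed_user = {"name": ["alice"], "memberOf": ["a", "b", "c"]}

unchanged = attrs_differ(cached_user, same_user)
changed = attrs_differ(cached_user, changed_user)
```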
Therefore, these enhancements are proposed for the 1.14 versions, sorted by importance as observed with systemtap testing:
- only write the cache entry if the modifyTimestamp of the original entry had changed. If it hasn’t changed, only the timestamps would be written to the timestamp cache
- if the modifyTimestamp had changed, compare the attributes of the cache entry with the attributes we are about to write. If there are no differences, only write to the timestamp cache
- refactor the nested group processing to make sure expensive lookups (such as lookups of all members of the group, there can potentially be thousands of these) are only performed once and intermediate results are stored in-memory.
- attempt to shortcut parsing the attributes of the entry returned from LDAP sooner. The idea behind this is that if the modifyTimestamp did not change, we can reuse the entry we already cached.
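The nested group item in the list above amounts to memoizing intermediate results. A minimal Python sketch under stated assumptions: fetch() is a hypothetical stand-in for the expensive lookup, the directory data is invented, and group nesting is assumed to contain no loops.

```python
def resolve_members(group, fetch, memo):
    """Return all users in `group`, calling fetch() at most once per group.

    fetch(name) is a hypothetical expensive lookup returning a pair
    (direct user names, nested group names). memo keeps intermediate
    results in memory. The sketch assumes nesting without loops.
    """
    if group not in memo:
        users, children = fetch(group)
        members = set(users)
        for child in children:
            members |= resolve_members(child, fetch, memo)
        memo[group] = members
    return memo[group]


# Invented directory data: "staff" is nested in two parent groups.
DIRECTORY = {
    "all": ([], ["teachers", "admins"]),
    "teachers": (["tina"], ["staff"]),
    "admins": (["adam"], ["staff"]),
    "staff": (["sam"], []),
}
calls = []


def fetch(name):
    calls.append(name)  # record every expensive lookup
    return DIRECTORY[name]


members = resolve_members("all", fetch, {})
```

Although "staff" is reachable through two parents, the recorded calls show that it is looked up only once.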
Minor enhancements in later versions might include:
- using syncrepl in the server mode for HBAC rules and external groups in refreshAndPersistMode. This would provide a performance benefit for legacy clients that rely on the server’s HBAC rules for access control.
- using syncrepl in the server mode for external groups in refreshAndPersistMode. This would mainly simplify the external groups handling, rather than improve performance.
- A lot of time is spent looking up attributes in the sysdb_attrs array. This is something we might want to optimize after we’re done with the cache writes.
- We might even consider offering syncrepl in refreshOnly mode as a client-side option for enumeration. However, this would have to be opt-in, because every refresh causes the server to walk the changelog since the last refresh operation. Enabling this option on all clients would severely degrade server performance.
Implementation details
The sysdb_ctx already contains a handle to the main sysdb cache. We would add another ldb file that only contains the timestamps and the DN of an entry. This ldb file would be opened in the nosync mode. Attributes used for lookups, like dataExpireTimestamp, must be indexed in this database as well.
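To show why the lookup attribute needs an index, here is a small in-memory Python model of the timestamp cache. The class and method names are hypothetical, a sorted list stands in for the ldb index, and treating an entry as expired when dataExpireTimestamp is less than or equal to the current time is an assumption of the sketch.

```python
import bisect


class TimestampCache:
    """In-memory model: DN -> timestamps, plus an index on the expiry time."""

    def __init__(self):
        self.records = {}       # DN -> {"lastUpdate", "dataExpireTimestamp"}
        self.expire_index = []  # sorted list of (dataExpireTimestamp, DN)

    def store(self, dn, last_update, expires):
        old = self.records.get(dn)
        if old is not None:
            # drop the stale index entry before inserting the new one
            self.expire_index.remove((old["dataExpireTimestamp"], dn))
        self.records[dn] = {"lastUpdate": last_update,
                            "dataExpireTimestamp": expires}
        bisect.insort(self.expire_index, (expires, dn))

    def expired(self, now):
        """Return DNs with dataExpireTimestamp <= now without a full scan."""
        # (now + 1,) sorts after every (t, dn) with t <= now and before
        # every (t, dn) with t > now, for integer timestamps.
        pos = bisect.bisect_left(self.expire_index, (now + 1,))
        return [dn for _, dn in self.expire_index[:pos]]


cache = TimestampCache()
cache.store("cn=a", last_update=100, expires=200)
cache.store("cn=b", last_update=100, expires=300)
cache.store("cn=a", last_update=250, expires=400)  # refresh moves the expiry
```

Without the index, finding expired entries would mean inspecting every record, which is what the indexing requirement is meant to avoid.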
When storing a user or a group to sysdb using functions like sysdb_store_user, we first check the difference between modifyTimestamp attributes. If there are no differences, only the timestamp attributes, such as lastUpdate or dataExpireTimestamp, would be updated in the timestamp cache. We need to do this check in the lower-level sysdb calls to make sure this enhancement also works for users and groups retrieved through the extop plugin.
If the value of modifyTimestamp differs, we proceed to checking the diff between values in the cache and the values read from LDAP.
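The store path described above can be sketched as a toy Python model. This is not the sysdb API: the Sysdb class, its store() method and the full_writes counter are hypothetical, and dictionaries stand in for the two ldb files. The sketch also assumes that an entry without a modifyTimestamp always falls through to the attribute diff.

```python
def _without_mtime(attrs):
    """Copy of attrs with the modifyTimestamp attribute left out."""
    return {k: v for k, v in attrs.items() if k != "modifyTimestamp"}


class Sysdb:
    """Toy model of the two caches; dicts stand in for the ldb files."""

    def __init__(self):
        self.main = {}        # DN -> attributes (expensive, synchronous)
        self.timestamps = {}  # DN -> timestamps (cheap, asynchronous)
        self.full_writes = 0  # number of writes to the main cache

    def store(self, dn, attrs, now, timeout):
        """Store an entry; return True if the main cache was written."""
        cached = self.main.get(dn)
        mtime = attrs.get("modifyTimestamp")
        if cached is None:
            changed = True
        elif mtime is not None and mtime == cached.get("modifyTimestamp"):
            changed = False  # step 1: modifyTimestamp is identical
        else:
            # step 2: modifyTimestamp differs, so diff the attributes
            changed = _without_mtime(cached) != _without_mtime(attrs)
        if changed:
            self.main[dn] = dict(attrs)
            self.full_writes += 1
        # the timestamp cache is updated on every store
        self.timestamps[dn] = {"lastUpdate": now,
                               "dataExpireTimestamp": now + timeout}
        return changed


db = Sysdb()
group = {"modifyTimestamp": "20160101000000Z", "member": ["alice", "bob"]}
first = db.store("cn=students", group, now=1000, timeout=300)
second = db.store("cn=students", dict(group), now=1100, timeout=300)
```

In this model the second call only refreshes the timestamps. When modifyTimestamp changes but the attributes do not, the model leaves the old modifyTimestamp in the main cache; how the real implementation handles that case is not specified here.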
Details about the attribute parsing shortcut will be added to this design page later.
Configuration changes
Currently no configuration changes are expected. We might add some if we decide to implement on-demand syncrepl.
How To Test
If the entries on the server did not change (except timestamps), then actions like user and group lookups and logins should be considerably faster.
The SSSD should also correctly detect when the entries in fact did change on the server. In this case, a full cache write will be performed.
Authors
- Jakub Hrozek <jhrozek@redhat.com> with the kind help of
- Sumit Bose <sbose@redhat.com>
- Ludwig Krispenz <lkrispen@redhat.com>
- Simo Sorce <simo@redhat.com>