Using the Global Catalog to speed up lookups by ID
Problem statement
When SSSD is connected to a forest with multiple domains, each lookup that is not qualified with a domain name iterates over all the domains. Moreover, some lookups, such as lookups by ID, cannot be qualified using the NSS interface at all.
This means that SSSD may issue up to N LDAP searches for N domains. If the object SSSD is searching for exists in the LDAP database of one of the domains, the performance impact can be mitigated with the already existing option cache_first, which, even for non-qualified searches, first checks whether the requested object exists in the local database and, if it does, searches only the corresponding domain.
But this option doesn’t solve the problem of looking up objects, especially numerical IDs, that do not exist in the remote database at all. A search for such a non-existent object always traverses all the domains, every time the negative cache entry from a previous request expires.
In environments that use the Global Catalog, this issue can be mitigated by locating the object’s domain in the Global Catalog, provided that the search key is present in the Global Catalog in the first place.
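To make the cost concrete, here is a toy model, not SSSD code, of an unqualified by-ID lookup that walks the domains in order. The Domain class and lookup_by_id() are hypothetical names; each counted search stands for one LDAP search against one domain, and the cache_first branch follows the behaviour described above.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical stand-in for one domain of the forest."""
    name: str
    ids: set                                    # IDs that exist on this domain's DC
    cached: set = field(default_factory=set)    # IDs already in the local cache


def lookup_by_id(domains, uid, cache_first=False):
    """Return (owning domain name or None, number of LDAP searches issued)."""
    searches = 0
    if cache_first:
        # cache_first: if some domain has the ID cached, search only that domain
        for dom in domains:
            if uid in dom.cached:
                searches += 1
                return (dom.name if uid in dom.ids else None), searches
    for dom in domains:
        searches += 1                           # one LDAP search per domain
        if uid in dom.ids:
            dom.cached.add(uid)                 # remember where the ID lives
            return dom.name, searches
    return None, searches
```

With three domains, a cached ID costs one search once cache_first is on, but an ID that exists nowhere still costs three searches on every request that the negative cache does not answer.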
Use cases
Currently, the primary use case is SSSD joined to an AD forest consisting of multiple domains and configured with id_provider=ad, because only the AD provider supports Global Catalog lookups. There are some plans to implement a Global Catalog for other providers, e.g. FreeIPA, but so far no implementation exists.
At the same time, only environments that use POSIX UID and GID attributes set by the administrator will benefit from this enhancement. If the client instead maps the IDs algorithmically from the SIDs, the AD provider is already able to short-cut a by-ID request: it computes the SID from the requested ID and realizes that the domain SID does not come from the current domain.
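The short-cut for algorithmically mapped IDs can be illustrated with a simplified model of slice-based ID mapping. The constants loosely mirror the ldap_idmap_range_min and ldap_idmap_range_size options, but their values and the SLICES table are hypothetical, and real SSSD ID mapping has more details (for example secondary slices), so this is only a sketch of the idea: an ID whose slice belongs to no known domain can be rejected without any LDAP search.

```python
# Hypothetical range parameters, in the spirit of ldap_idmap_range_min/_size.
RANGE_MIN = 200000
RANGE_SIZE = 200000

# Hypothetical slice assignments: slice index -> domain SID.
SLICES = {
    0: "S-1-5-21-1111111111-2222222222-3333333333",
    7: "S-1-5-21-4444444444-5555555555-6666666666",
}


def id_to_sid(posix_id):
    """Map a POSIX ID back to a SID, or return None without searching LDAP."""
    if posix_id < RANGE_MIN:
        return None                       # below the mapped range
    slice_idx, rid = divmod(posix_id - RANGE_MIN, RANGE_SIZE)
    domain_sid = SLICES.get(slice_idx)
    if domain_sid is None:
        return None                       # slice owned by no known domain
    return f"{domain_sid}-{rid}"          # domain SID + RID = object SID
```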
The current state of Global Catalog support in SSSD
The Global Catalog is an LDAP database that contains a subset of the attributes of objects from all the domains in the forest. Which attributes are replicated to the Global Catalog is defined by the Partial Attribute Set. The replicated attributes can be listed with an LDAP search rooted at the cn=schema,cn=configuration subtree that checks for the presence of isMemberOfPartialAttributeSet=TRUE, for example:
ldapsearch -Y GSSAPI \
-H ldap://dc.win.trust.test:389 \
-b cn=schema,cn=configuration,dc=win,dc=trust,dc=test \
'(&(objectClass=attributeSchema)(isMemberOfPartialAttributeSet=TRUE))' lDAPDisplayName
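If the output of such a search is to be checked by a script, a minimal sketch could collect the lDAPDisplayName values from the LDIF that ldapsearch prints and report which POSIX attributes are missing from the Partial Attribute Set. The helper names are hypothetical, and the parser assumes plain (not base64-encoded) lDAPDisplayName values, which holds for ASCII attribute names.

```python
def replicated_attributes(ldif_text):
    """Collect lDAPDisplayName values from ldapsearch LDIF output."""
    names = set()
    for line in ldif_text.splitlines():
        key, sep, value = line.partition(":")
        # Only "lDAPDisplayName: <name>" lines matter; dn and other lines are skipped
        if sep and key.strip() == "lDAPDisplayName":
            names.add(value.strip())
    return names


def missing_posix_attributes(ldif_text, wanted=("uidNumber", "gidNumber")):
    """Return the wanted attributes that are not in the Partial Attribute Set."""
    present = replicated_attributes(ldif_text)
    return [attr for attr in wanted if attr not in present]
```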
It is important to note that POSIX attributes such as uidNumber or gidNumber are neither part of the default Active Directory schema nor replicated to the Global Catalog by default.
To learn how to extend the schema so that the POSIX attributes can be set at all, follow the “Install Identity Management for UNIX Components” article on the Microsoft TechNet site. How to extend the Partial Attribute Set is described, for example, in the “AD DS: Global Catalogs and the Partial Attribute Set” TechNet blog post.
The purpose of using the Global Catalog in SSSD is two-fold:
- to avoid having to connect to the LDAP server of a DC from every domain in the forest
- to look up the cross-domain members of Universal Groups, which are only present in the Global Catalog
Because not all the attributes required by SSSD are guaranteed to be replicated to the Global Catalog (especially the uidNumber and gidNumber attributes), SSSD runs a search that checks for the presence of any objects with either uidNumber or gidNumber during the very first request for a numerical ID. If no objects with either attribute are present, Global Catalog support is disabled except for looking up Universal Group members.
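The one-time POSIX probe can be sketched as a small memoizing wrapper. The filter below is an illustrative presence filter, not necessarily the exact one SSSD sends, and search_gc is a hypothetical callable standing in for a Global Catalog search with an empty base and a size limit of one.

```python
# Illustrative filter: any object carrying uidNumber or gidNumber.
POSIX_PROBE_FILTER = "(|(uidNumber=*)(gidNumber=*))"


class GcPosixProbe:
    """Hypothetical helper that remembers whether the GC holds POSIX IDs."""

    def __init__(self, search_gc):
        self._search_gc = search_gc
        self._has_posix = None            # unknown until the first by-ID request

    def gc_usable_for_ids(self):
        if self._has_posix is None:
            # Probe only once; later requests reuse the cached answer.
            entries = self._search_gc(base="", filt=POSIX_PROBE_FILTER, sizelimit=1)
            self._has_posix = len(entries) > 0
        return self._has_posix
```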
However, at the moment, SSSD either uses the whole entry it finds in the Global Catalog or does not use the Global Catalog at all. This puts some responsibility on the administrator: the object in the Global Catalog must contain all the required attributes, or the administrator might need to disable Global Catalog support manually in the configuration file. In the future (see e.g. ticket 3538, “RFE: Use the global catalog only to look up the entry DN”), we would like to change the logic so that the Global Catalog is used only to look up the entry DN, while the entry attributes are then read from the LDAP directory of the object’s domain. That enhancement is out of scope for this design page.
Overview of the solution
A new Data Provider method, getAccountDomain(), whose purpose is to locate the domain an object resides in, will be added. At the moment, only the AD provider will implement this handler.
The responder’s cache_req module will call this handler before iterating over domains. For all domains except the one returned from the handler, the cache_req module will add a negative cache entry for the requested object. The subsequent loop over the domains then simply skips the domains where the entry was not found and looks the entry up only in the domain that the getAccountDomain() method returned.
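A minimal model of this flow, assuming the negative cache is a plain set of (domain, ID) pairs: the locator answer becomes negative cache entries, and the unchanged per-domain loop skips them. The function and parameter names are hypothetical stand-ins, not the cache_req API.

```python
def lookup_with_locator(domains, uid, locate_domain, search_domain, negcache):
    """Return (domain or None, list of domains actually searched)."""
    found_in = locate_domain(uid)                 # getAccountDomain() stand-in
    if found_in is not None:
        for dom in domains:
            if dom != found_in:
                negcache.add((dom, uid))          # "the ID is not here"
    searched = []
    for dom in domains:                           # the original per-domain loop
        if (dom, uid) in negcache:
            continue                              # skipped thanks to the locator
        searched.append(dom)
        if search_domain(dom, uid):
            return dom, searched
    return None, searched
```

If the locator gives no answer (None here), no entries are added and the loop degrades to the sequential search.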
Implementation details
There are two parts to the implementation: the responder side, which mostly touches the cache_req code, and the provider side. The responder side also requires adding some API to the negative cache module.
Responder changes - cache_req and negative cache
On the responder side, the ability to locate the domain of a requested object will be provided by new cache_req plugin methods. Not all plugins will be augmented with the methods that call the domain locator; at least in the first iteration, only the plugins that search objects by ID will use the new Data Provider API.
When looking up an entry, the cache_req request must first decide whether it is worth calling the domain locator request at all. The locator request should only be called when there are multiple domains to search and the request is not already qualified with a domain name. Similarly, the domain locator should not be called if the request is only evaluating the cached data (bypass_dp=True, which is typically set during the first pass when the cache_first option is enabled). Of course, the locator would also only be called for plugins that implement the associated methods.
When all of the above checks call for the locator (e.g. when searching for a user by UID while multiple domains are defined), the responder should still look into the cache before actually calling the locator DP method. This additional step ensures that looking up an ID from the first defined domain in a setup with many domains does not needlessly hit the Global Catalog while the entry is still cached in sysdb.
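The pre-checks from the last two paragraphs can be condensed into one predicate. This is a sketch with hypothetical field names, not the real cache_req structures; is_cached stands in for the sysdb lookup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """Hypothetical subset of a cache_req request."""
    domain_name: Optional[str]    # set when the request is qualified
    bypass_dp: bool               # True during a cache-only pass (cache_first)
    plugin_has_locator: bool      # the plugin implements the locator methods


def should_call_locator(req, domains, is_cached):
    """is_cached(domain) -> True if sysdb already holds the entry there."""
    if len(domains) < 2 or req.domain_name is not None:
        return False              # nothing to choose between
    if req.bypass_dp or not req.plugin_has_locator:
        return False
    # A cached entry means the Global Catalog round-trip would be wasted.
    return not any(is_cached(dom) for dom in domains)
```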
Finally, the responder would call the getAccountDomain Data Provider method. An error returned from the DP method is in no way fatal; instead, the cache_req code resumes the original code path, where all domains are searched sequentially. One new error code must be added to signify that the back end as a whole does not support locating an ID’s domain. When the cache_req code receives this error code, it never calls the domain locator again for this domain.
When the getAccountDomain method returns success, the string it returns contains the domain where the ID was found. Only one domain can be returned; conflicting values in the ID space are detected on the provider side and handled by returning an error, which makes the responder fall back to the sequential lookups.
The returned domain name will be used to set a negative cache entry for the looked-up object in all domains except the one that was returned. It is important to mark only the (sub)domains that belong to the same “main” domain with these negative cache entries, especially because the cache_req code internally iterates over a flattened domain list in order to support custom domain lookup priorities, so that list also contains domains the locator knows nothing about. After this is done, the cache_req code loops back into its original logic, but the negative cache entries ensure that domains that do not contain this ID are skipped.
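The "same main domain only" rule can be sketched over a flattened list. The Dom class and its parent field are a hypothetical simplification of SSSD's domain list; the point is that a domain from another back end, such as the files domain, never receives a negative cache entry from the AD locator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Dom:
    name: str
    parent: Optional[str] = None         # None for a main (top-level) domain

    @property
    def main(self):
        return self.parent or self.name


def negcache_siblings(flat_domains, located, uid, negcache):
    """Mark every domain of the located domain's main domain, except itself."""
    by_name = {d.name: d for d in flat_domains}
    main = by_name[located].main
    for dom in flat_domains:
        if dom.main == main and dom.name != located:
            negcache.add((dom.name, uid))
```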
Because the loop over domains is resumed only after the locator has been called, there needs to be a way to avoid calling the locator too often. To this end, a new negative cache container would be added. In this container, the responder stores the values of the objects being looked up, which tells the cache_req code either that the locator must be called again or that calling the locator can be skipped this time and the per-domain, per-ID negative cache entries can be reused during the loop over domains.
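A sketch of such a container, assuming it behaves like the other negative cache entries, with a timeout per stored value. The class name and the 15-second default are hypothetical; the injectable clock only makes the expiry testable.

```python
import time


class LocatorNegcache:
    """Remembers that the locator already ran for a (main domain, ID) pair."""

    def __init__(self, timeout=15, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._entries = {}               # (main_domain, uid) -> expiry time

    def locator_needed(self, main_domain, uid):
        expiry = self._entries.get((main_domain, uid))
        # No entry, or an expired one: the locator has to be called again.
        return expiry is None or expiry <= self._clock()

    def remember(self, main_domain, uid):
        self._entries[(main_domain, uid)] = self._clock() + self._timeout
```

While locator_needed() is False, the per-domain, per-ID negative cache entries set after the previous locator call are simply reused.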
Provider changes - the getAccountDomain implementation
All providers except id_provider=ad will set a dummy getAccountDomain handler which always returns ERR_GET_ACCT_DOM_NOT_SUPPORTED. Therefore, for all domains except the ones with the AD provider, the getAccountDomain method will only be called once and then disabled.
The AD provider implementation of the getAccountDomain method will search the Global Catalog with an empty search base, thus searching across all the domains in the forest. Two details of this search are important:
- In order for this lookup to be useful even for non-existent IDs, the Global Catalog search must be “authoritative”. In other words, not finding the entry in the Global Catalog must be treated as if the entry doesn’t exist.
- Because the POSIX IDs are not replicated to the Global Catalog by default, the getAccountDomain request must check whether any POSIX IDs are replicated to the Global Catalog at all.
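On the provider side, the Global Catalog answer has to be reduced to exactly one domain name. The sketch below shows one way to do it: an illustrative filter for a by-UID search (run with the empty base described above), a naive conversion of the entry DN to a domain name, an authoritative "not found", and an error on conflicting IDs. All names are hypothetical and the filter is not necessarily the one SSSD uses.

```python
def gc_filter_by_uid(uid):
    # int() keeps arbitrary text out of the LDAP filter
    return f"(&(objectClass=user)(uidNumber={int(uid)}))"


def dn_to_domain(dn):
    """'CN=x,OU=y,DC=child,DC=win,DC=test' -> 'child.win.test'"""
    labels = []
    for rdn in dn.split(","):            # naive split; only DC components matter
        key, sep, value = rdn.partition("=")
        if sep and key.strip().lower() == "dc":
            labels.append(value.strip())
    return ".".join(labels).lower()


class ConflictingIdError(Exception):
    """The same ID was found in more than one domain."""


def account_domain(result_dns):
    """None means the ID is not in the Global Catalog, i.e. it does not exist."""
    domains = {dn_to_domain(dn) for dn in result_dns}
    if len(domains) > 1:
        raise ConflictingIdError(sorted(domains))   # responder falls back
    return domains.pop() if domains else None
```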
Configuration changes
None. However, it should be noted that disabling Global Catalog support as a whole in SSSD would also disable getAccountDomain, in the sense that the method would always return ERR_GET_ACCT_DOM_NOT_SUPPORTED, which would in turn instruct the responder never to call the getAccountDomain request again. Therefore, disabling the Global Catalog can be used to disable this new functionality.
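In current SSSD versions, Global Catalog support of an AD domain is controlled by the ad_enable_gc option described in sssd-ad(5), which defaults to True. The sketch below reads an sssd.conf-style text with configparser and reports, per domain, whether the locator would stay active under the rule above; the helper and the sample configuration are hypothetical.

```python
import configparser

SAMPLE_CONF = """
[sssd]
domains = win.trust.test

[domain/win.trust.test]
id_provider = ad
ad_enable_gc = False
"""


def locator_enabled(conf_text):
    """Map each domain name to True if the GC-based locator would be used."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    result = {}
    for section in parser.sections():
        if not section.startswith("domain/"):
            continue
        opts = parser[section]
        is_ad = opts.get("id_provider") == "ad"      # only the AD provider locates
        gc_on = opts.getboolean("ad_enable_gc", fallback=True)
        result[section[len("domain/"):]] = is_ad and gc_on
    return result
```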
How To Test
To test the functionality itself, an AD forest with multiple domains should be used. Please make sure the POSIX attributes are present and replicated to the Global Catalog. Requesting a POSIX ID from a domain other than the joined one should first consult the Global Catalog and then search only the individual domain where the ID was located.
It is important to test that there are no regressions in setups that either do not use POSIX IDs at all or do not replicate the POSIX IDs to the Global Catalog. In these setups, as well as in configurations that use a different ID provider, the cache_req code must attempt to call the locator only once.
Similarly, there must be no regressions in setups that use multiple domains (and remember that since Fedora 26, all SSSD installations automatically enable the files provider).
Authors
- Jakub Hrozek <jhrozek@redhat.com>