Multiple server addresses or names in kdcinfo files
Problem statement
When a user authenticates using Kerberos, the KDCs that will actually be
used are discovered by libkrb5 with the help of DNS SRV records,
configured explicitly in /etc/krb5.conf, or provided by a special locator
plugin.
Because the administrator expects that the servers they defined in
sssd.conf would be used both for authentication through SSSD and by
applications that use libkrb5, such as the Kerberos command line tools
like kinit, SSSD provides a locator plugin for libkrb5 that allows SSSD
to inform libkrb5 about the servers SSSD has configured.
However, SSSD, at least in the typical use case, only writes the information about the single server it connects to and changes the address only when the daemon reconnects to a different server. This creates a problem when the server whose address is written in the kdcinfo file is unreachable, but no action that would provoke a fail over in SSSD (such as a user login over PAM) is executed. In that case, the kdcinfo file contains stale entries. Because the kdcinfo files are authoritative from libkrb5's point of view, libkrb5 cannot reach any KDC from that domain if the information present there is not useful.
To improve the situation, this design page proposes adding a new SSSD option that, if set, would enable SSSD to write additional host names into the kdcinfo files. The plugin could then iterate over these items, which in turn would give libkrb5 a form of fail over for entries configured in sssd.conf or autodiscovered by SSSD.
Use cases
A typical sequence that triggers this problem is this:
- log in with a PAM service to a machine. This causes a KDC address to be written to the kdcinfo file
- disable the KDC server, e.g. by enabling a restrictive firewall rule
- call kinit on the client where the kdcinfo file was written
Overview of the solution
The Kerberos locator plugin reads the address(es) from per-realm text files
written by SSSD, located in the /var/lib/sss/pubconf directory. At the
moment, the plugin can already read multiple entries, but currently only
numerical addresses are supported.
On a high level, implementing this RFE requires several changes:
- change the Kerberos locator plugin so that it can also consume host names in addition to numerical addresses. These host names would be resolved in the plugin itself and passed to libkrb5 with the help of a callback function libkrb5 provides to the plugin
- add a new SSSD option that would limit the number of entries that SSSD writes to the kdcinfo files. This is needed to avoid timeouts in case the network is truly unreachable. The default value of the option could perhaps differ between master and sssd-1-16: master could default to writing multiple entries, while sssd-1-16 would default the option to 0 in order not to change the behaviour of a stable branch.
- extend the online callback which the SSSD fail over component uses to write the current server to the kdcinfo files to also write additional server host names in addition to the current server address
- to enable writing multiple server addresses, the request to resolve a server for a service should be extended to resolve host names up to the specified limit
When it comes to resolving the servers, there are several scenarios to consider:
- The servers can be enumerated using an option. This includes krb5_server/krb5_backup_server for the krb5 provider and ipa_server/ipa_backup_server and ad_server/ad_backup_server for the IPA and AD providers.
- The servers can be completely autodiscovered. Typically this is done by either omitting the *_server options completely or using the _srv_ identifier. As long as the list is omitted or the _srv_ record is the first one in the list, any fail over service resolution would trigger the DNS SRV lookups and resolve the whole list. It is useful to note that the _srv_ identifier is not permitted in the backup server list explicitly, but the AD provider does resolve a SRV query into the backup server list. This happens when an AD site is used: the servers from the AD site are added as ‘primary’ and the global servers form the ‘backup’ list.
- A mix of the above. The most complex case from the point of view of this RFE is a list that starts with a host name, but includes the _srv_ identifier later on, e.g. krb5_server = kdc.example.com, _srv_. In this case, calling the fail over resolution would currently only resolve the host name of kdc.example.com, but not the SRV query, so unless the fail over code is extended, the host names originating from the SRV query would not be known after the service resolution finishes.
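The three scenarios can be told apart by looking at where the _srv_ identifier appears in the configured list. The following is a minimal Python sketch of that distinction only; SSSD itself is written in C, and the function name classify_server_list and the labels it returns are hypothetical names chosen for illustration.

```python
SRV_IDENTIFIER = "_srv_"


def classify_server_list(value):
    """Classify a *_server option value into one of the three scenarios.

    `value` is the raw option string, or None when the option is omitted.
    The returned labels are illustrative, not SSSD terminology.
    """
    entries = [e.strip() for e in (value or "").split(",") if e.strip()]
    if not entries or entries[0] == SRV_IDENTIFIER:
        # Omitted list or _srv_ first: any fail over resolution triggers
        # the DNS SRV lookup and the whole list becomes known.
        return "srv_first"
    if SRV_IDENTIFIER in entries:
        # Host name first, _srv_ later: the SRV query is not resolved by
        # the first fail over resolution, so its hosts stay unknown.
        return "srv_later"
    return "enumerated"
```

Only the "srv_later" case needs the fail over code to be extended, because that is the only case where resolving the first server leaves SRV host names unresolved.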
Implementation details
The interface the locator plugin uses to communicate with libkrb5 is a
callback function provided by the caller (libkrb5); SSSD is supposed
to pass a struct sockaddr to the caller. The Kerberos locator plugin
is already capable of iterating over multiple addresses, but currently
only numerical addresses are supported: the plugin converts the string
representation of the address into a struct sockaddr by calling
getaddrinfo(3) with the AI_NUMERICHOST flag. We should extend the locator
plugin code so that, for entries that do not represent an address, it calls
getaddrinfo without that flag to resolve the host name and passes the
resulting address on. This can be a first self-contained step in the
implementation.
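The proposed two-step lookup can be sketched as follows. This is a Python model of the idea, not the plugin code, which is C and hands a struct sockaddr to the libkrb5 callback. The function name resolve_kdcinfo_entry, the injectable resolver parameter and the default port 88 are assumptions made for this sketch.

```python
import socket


def resolve_kdcinfo_entry(entry, port=88, resolver=socket.getaddrinfo):
    """Return a list of (family, sockaddr) tuples for one kdcinfo entry.

    Numerical addresses are parsed without touching DNS; anything else is
    treated as a host name and resolved, which may block.
    """
    try:
        # AI_NUMERICHOST makes getaddrinfo fail instead of doing a lookup
        # when `entry` is not a literal IPv4/IPv6 address.
        infos = resolver(entry, port, type=socket.SOCK_STREAM,
                         flags=socket.AI_NUMERICHOST)
    except socket.gaierror:
        # Proposed extension: fall back to a real name lookup.
        infos = resolver(entry, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _, _, _, sockaddr in infos]
```

Trying the numerical form first keeps the existing behaviour for address entries unchanged and confines the potentially blocking lookup to entries that are host names.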
The kdcinfo files are written (using write_krb5info_file) either
during an online callback or in a special case for IPA trust clients. The
special case is already doing something similar to what this page
is about by looking into a subsection representing a trusted domain
(e.g. [domain/ipa.test/win.trust.test]) and resolving all the servers
in that list either by name or based on a site selection. However, this
is done during the subdomain provider operation, not during a resolver
callback, and all the addresses configured in the sssd.conf file are
always resolved and written to the kdcinfo file.
The write_krb5info_file function receives a linked list of struct fo_server
structures, each of which contains the address, if already resolved, or at
least a host name in the struct server_common member structure. Since the
callback should already be synchronous and not do much work on its own, it
would be best if the callback was already invoked with the data provided.
There are two kinds of servers in the fail over module: primary and backup. The backup servers are supposed to be used only temporarily, and SSSD periodically tries to connect to one of the primary servers. However, from the fail over code point of view, even adding a “backup” server still means the server is added to the same linked list, just with a flag denoting that the server is not primary. Therefore, iterating over a single list would iterate over both the primary and backup servers.
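The single-list layout can be modelled in a few lines. The Python sketch below is a simplified stand-in for the C structures; the class Server and the helpers add_server and iter_servers are hypothetical names and do not mirror the real struct fo_server fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Simplified stand-in for struct fo_server."""
    name: str
    primary: bool = True
    next: Optional["Server"] = None


def add_server(head, name, primary=True):
    """Append a server to the list and return the (possibly new) head."""
    node = Server(name, primary)
    if head is None:
        return node
    tail = head
    while tail.next is not None:
        tail = tail.next
    tail.next = node
    return head


def iter_servers(head):
    """Walk the single list; primary and backup servers come out together."""
    while head is not None:
        yield head
        head = head.next
```

Code that wants only primary or only backup servers has to filter on the flag while walking the list, since there is no separate backup list to iterate.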
Before changing the online callbacks, it would be useful to implement and
read the krb5_kdcinfo_lookahead option so that there is already an
upper limit when the callbacks write the extra host names.
The next step of implementation could be extending the online
callbacks that call the write_krb5info_file function. There are
several of them: ad_resolve_callback, ipa_resolve_callback
and krb5_resolve_callback. The callbacks receive the current
struct fo_server instance. The callbacks would then keep iterating
over the linked list until either the list is exhausted or as many as
krb5_kdcinfo_lookahead items are processed. The host name from the
struct server_common structure would be read using fo_get_server_name
and written to the array passed to write_krb5info_file.
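The iteration described above can be sketched as follows. This Python model replaces the linked list with an ordered list of tuples and assumes that the lookahead value counts the extra entries written after the current server, which is consistent with 0 meaning that only the current server is written. The function name collect_kdcinfo_entries is hypothetical.

```python
def collect_kdcinfo_entries(servers, current, lookahead):
    """Return the entries to write, starting at index `current`.

    `servers` is an ordered list of (host_name, resolved_address_or_None)
    tuples standing in for the fo_server linked list.
    """
    name, address = servers[current]
    # The current server is written by address when one is known.
    entries = [address or name]
    # Extra servers after the current one are written by host name,
    # stopping when the list is exhausted or the limit is reached.
    for name, _ in servers[current + 1:current + 1 + lookahead]:
        entries.append(name)
    return entries
```

For example, with four configured servers, the first one resolved and a lookahead of 2, the result is the address of the first server followed by the host names of the second and third.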
One question to consider is whether to use the fo_server instances before
the current one, i.e. those that SSSD tried before and couldn’t connect to.
I think it would make sense to add them to the end of the list, at least
for the primary servers not from a SRV query, because SSSD never reconnects
to a server earlier in the list as long as a later server works. The SRV queries
are different in this respect in the sense that they time out and force
SSSD to resolve the whole list once a server is requested again (typically
either during authentication or once the LDAP connection expires).
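If previously tried servers are appended, the resulting order could look like the Python sketch below. It only covers the case of primary servers that do not come from a SRV query, uses a plain list of host names instead of the linked list, and again assumes that the lookahead value counts entries beyond the current server. The function name order_with_previous is hypothetical.

```python
def order_with_previous(servers, current, lookahead):
    """Return the current server, then later servers, then earlier ones."""
    later = servers[current + 1:]
    earlier = servers[:current]  # already tried, SSSD could not connect
    # Servers not yet tried come first; failed ones are a last resort.
    return [servers[current]] + (later + earlier)[:lookahead]
```

Placing the earlier servers last means libkrb5 only falls back to them after every server SSSD has not yet ruled out.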
Finally, the case where the fail over code needs to do additional lookups
in order to resolve at least the number of host names requested by the
krb5_kdcinfo_lookahead option should be addressed. The caller that
initializes the fail over service (maybe with be_fo_add_service) should
provide a hint with the value of the lookahead option. Then, if a request for
server resolution is triggered, the fail over code would resolve a server
and afterwards check whether enough fo_server entries have a valid host name
in the struct server_common structure. If not, the request would
check if any of the fo_server structures represents a SRV query and
try to resolve the query to receive more host names.
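The check-then-expand logic can be sketched as below. This is a synchronous Python model, whereas the real fail over code is asynchronous C. The function name host_names_for_lookahead and the srv_lookup callable, which stands in for the DNS SRV query, are hypothetical, and the lookahead value is assumed to count entries beyond the current server.

```python
SRV_IDENTIFIER = "_srv_"


def host_names_for_lookahead(entries, lookahead, srv_lookup):
    """Return up to lookahead + 1 host names, resolving _srv_ if needed.

    `entries` is the configured server list; `srv_lookup` is a callable
    returning the host names from a DNS SRV query.
    """
    wanted = lookahead + 1  # assumption: current server plus the extras
    names = [e for e in entries if e != SRV_IDENTIFIER]
    if len(names) >= wanted or SRV_IDENTIFIER not in entries:
        # Enough host names are already known, or nothing more can be found.
        return names[:wanted]
    names = []
    for entry in entries:
        # Replace each _srv_ placeholder with the hosts the query returns.
        found = srv_lookup() if entry == SRV_IDENTIFIER else [entry]
        names.extend(n for n in found if n not in names)
    return names[:wanted]
```

The SRV query is only issued when the explicitly configured host names do not satisfy the lookahead, so a setup with enough enumerated servers pays no extra DNS cost.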
Configuration changes
A new configuration option called krb5_kdcinfo_lookahead would be added.
This option would default to a sensible non-zero value in the master
branch, perhaps 3, so that attempting to resolve the extra host names does
not cause the libkrb5 operation to time out. If the patches are backported
to any stable branch, the option must default to 0 (disabled).
In the first iteration, we might want to just read a single number, but
in the future, the option should be extended to accept two numbers in the
total:backup notation. This would mean: write up to total servers,
but include up to backup servers from the backup list. This would be
useful in case none of the servers from the primary list are reachable,
because e.g. they all come from the same AD site, but servers outside the
site are reachable. This extension would only make sense if SSSD does not
resolve the host names on its own, which might be another future extension.
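Parsing both forms of the value could look like the Python sketch below. The source does not say what the backup limit should be when only a single number is given; the sketch assumes it then equals total, so that backup servers are limited only by the overall count. The function name parse_kdcinfo_lookahead is hypothetical.

```python
def parse_kdcinfo_lookahead(value):
    """Parse 'total' or 'total:backup' into a (total, backup) tuple."""
    total_text, sep, backup_text = value.strip().partition(":")
    total = int(total_text)
    # Assumption: a single number puts no separate cap on backup servers.
    backup = int(backup_text) if sep else total
    if total < 0 or backup < 0:
        raise ValueError("lookahead values must not be negative")
    # The backup share can never exceed the total number of entries.
    return total, min(backup, total)
```

A value such as 3:1 would then mean up to three entries in total, of which at most one may come from the backup list.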
It might be a good idea to add a note to the sssd-ad and sssd-ipa
man pages or even the shared fail over man page include file with a pointer
to how the kdcinfo files work so that the information is easy to discover
for administrators.
How To Test
- Plugin test
  - With any of the tests below, or even after writing the host names to the kdcinfo files directly, make sure the first entry in the list is unreachable. Then call e.g. kinit and check that the operation succeeds.
- Backwards compatibility test
  - Set the krb5_kdcinfo_lookahead option to 0. Define multiple servers and perform Kerberos authentication. Make sure that only the current server is written to the kdcinfo files.
- Write a list of servers
  - Set the krb5_kdcinfo_lookahead option to a positive value. Make sure that the first entry in the kdcinfo files is an address and the other entries are host names from the configuration. This test case should be extended to make sure that only as many entries as the value of the option are written, or, if there are fewer entries in the config file, that all are written.
- Fail over test
  - Similar to the above, except make sure the first entry in the list cannot be contacted. Then, SSSD should resolve the next entry to the address and, if applicable, write the rest of the list.
- Backup server test
  - At the minimum, we should make sure that servers from the backup list are written to the kdcinfo files. If the option implements the split total:backup value, then those should be tested as well.
- (Optional) writing a previously tried, not working server
  - If it is agreed during design review that servers that do not work are also to be written to the kdcinfo files (see the discussion of previously tried servers in the implementation details), then a test case should make sure those are written to the end of the list.
- SRV resolution test
  - Leave the server list option (e.g. krb5_server) empty. Make sure a DNS SRV query for the configured realm returns valid servers and they are written to the kdcinfo files.
- Combined SRV and server list
  - Set the krb5_server option to hostname, _srv_. Set the krb5_kdcinfo_lookahead option to a value greater than 1. Make sure that the host names from the DNS SRV query are also present in the kdcinfo files.
- IPA client test
  - The test cases above should be repeated for an IPA client as well in case the IPA online callbacks are modified.
- AD site test
  - Add an AD client to a site or set the site in the config file. Make sure that the servers from the site are written first, followed by the global servers up to the krb5_kdcinfo_lookahead value.
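The expectation from the "Write a list of servers" test can be expressed as a small check on the entries read from a kdcinfo file. The Python sketch below assumes that the file has already been split into one entry per line and that entries carry no port suffix; both are assumptions about the file layout. The names is_address and kdcinfo_entries_ok are hypothetical.

```python
import ipaddress


def is_address(entry):
    """Return True if `entry` is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(entry)
        return True
    except ValueError:
        return False


def kdcinfo_entries_ok(entries, max_entries):
    """Check the first entry is an address and the rest are host names."""
    if not entries or len(entries) > max_entries:
        return False
    # First entry: address of the current server; the rest: host names.
    return is_address(entries[0]) and not any(
        is_address(e) for e in entries[1:])
```

The limit is passed as max_entries rather than derived from the option, so the check does not depend on whether the lookahead value includes the current server.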
How To Debug
Any new code must be decorated with DEBUG messages. To debug the locator
plugin changes, using KRB5_TRACE or even calling strace might be
useful.
Future development
First, it might be useful to extend the resolver or fail over code to resolve
the names on its own to save some potentially blocking calls in the plugin.
There is already an example of resolv_hostport_list_send that can perhaps
be reused.
Additionally, we have been planning for some time to include connectivity
checks with cLDAP ping or just plain connect() to make sure that servers that
cannot be contacted at all are not tried. This is of course outside of the
scope of this work, but should be kept in mind so as not to implement
something incompatible.
Authors
- Sumit Bose <sbose@redhat.com>
- Tomas Halman <thalman@redhat.com>
- Jakub Hrozek <jhrozek@redhat.com>