Thoughts about maintainability of bind-dyndb-ldap

Background

  • BIND is a non-trivial code base:
    • BIND has a custom abstraction of threads plus custom implementations of memory management, locking, ..., and almost everything else.
    • Parts of BIND are event-driven and run in parallel.
    • Yet, some internal parts of BIND are not thread safe!
      • Some functions must only be called under special conditions, but these conditions are often not documented.
      • Sometimes a violated invariant does not cause a crash but silent state corruption.
  • BIND is not structured as a fully pluggable server:
    • The ‘simple’ plugin interfaces (DLZ, SDB) in BIND are very limited.
    • DLZ and SDB do not support everything we need for the features implemented by bind-dyndb-ldap (forward zones, DNSSEC in-line signing, PTR synchronization).
    • A lot of logic in BIND is hidden in functions unavailable to plugins.
  • Our DynDB interface was accepted upstream and is part of the BIND 9.11 release, so we don’t need to patch BIND anymore.
    • The DynDB API is very simple: BIND just does dlopen(driver) and then calls the init() and destroy() functions from the driver. That is it.
    • Problem: This does not significantly improve the situation because of the BIND private functions mentioned above.

Long-term problems

Code and effort duplication

Because a lot of logic in BIND is not exposed to plugins (or not split into functions at all!), bind-dyndb-ldap has to re-implement a lot of BIND logic itself. Of course, this means that we introduce a lot of new bugs.

For example zone configuration logic in BIND consists of (roughly):

All this logic has to be re-implemented in bind-dyndb-ldap:

  • 1100 lines in ldap_helper.c reimplement zone configuration logic:
    • Only a minimal set of configuration options for zones is supported.
    • Some options cannot be implemented at all because BIND lacks the necessary interfaces.
    • There is no guarantee that our implementation of an option has the same behavior as the implementation in BIND.

We can quantify this problem by sorting all source files into several categories. Rough statistics for C files in the bind-dyndb-ldap source tree:

Plumbing

  • Code duplicating BIND functionality and gluing bind-dyndb-ldap to the rest of BIND:
    File                    Functionality               Line count
    acl.c                   ACL parser                  634
    empty_zones.c           Empty zone handling         466
    fwd_register.c          Forward zone handling       143
    ldap_driver.c           BIND driver plumbing        1129
    ldap_helper.c (part 2)  Zone configuration          1100
    lock.c                  Utility: locking            57
    log.c                   Utility: log                24
    rbt_helper.c            Utility: red-black tree     179
    settings.c              Utility: configuration      661
    zone_register.c         Master zone handling        626
    zone.c                  Master zone handling        148
  • Utility without significant value:
    File                    Functionality               Line count
    fs.c                    File system utilities       116
    krb5_helper.c           Kerberos support            191
    semaphore.c             Utility: semaphore          130
    str.c                   Utility: strings            312
    zone_manager.c          Driver instance management  200

Functionality

  • Integration with LDAP, which consequently provides multi-master functionality:
    File                    Functionality               Line count
    ldap_convert.c          LDAP data format utility    459
    ldap_entry.c            LDAP entry parser           594
    ldap_helper.c (part 1)  LDAP plumbing               3571
    metadb.c                LDAP metadata handling      407
    mldap.c                 LDAP metadata handling      539
    syncrepl.c              LDAP: SyncRepl              537
  • PTR record synchronization functionality:
    File                    Functionality               Line count
    syncptr.c               PTR record synchronization  530

Duplicated code and glue account for 6116 / 12753 lines = 48 % of the code base. Functionality is implemented by 6637 / 12753 lines = 52 % of the code base.
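
The split can be recomputed from the per-file counts in the tables above. The following is only a sketch in Python: the dictionary and function names are made up for illustration, and it assumes that the "utility without significant value" files are counted together with the glue code, which is how the plumbing section groups them.

```python
# Per-file line counts, copied from the tables above.
plumbing = {
    "acl.c": 634, "empty_zones.c": 466, "fwd_register.c": 143,
    "ldap_driver.c": 1129, "ldap_helper.c (part 2)": 1100, "lock.c": 57,
    "log.c": 24, "rbt_helper.c": 179, "settings.c": 661,
    "zone_register.c": 626, "zone.c": 148,
}
utility = {
    "fs.c": 116, "krb5_helper.c": 191, "semaphore.c": 130,
    "str.c": 312, "zone_manager.c": 200,
}
functionality = {
    "ldap_convert.c": 459, "ldap_entry.c": 594,
    "ldap_helper.c (part 1)": 3571, "metadb.c": 407, "mldap.c": 539,
    "syncrepl.c": 537, "syncptr.c": 530,
}


def percent(part, whole):
    """Share of `part` in `whole`, rounded to a whole percent."""
    return round(100 * part / whole)


# Assumption: glue = duplicated BIND logic + low-value utility code.
glue_lines = sum(plumbing.values()) + sum(utility.values())
feature_lines = sum(functionality.values())
total_lines = glue_lines + feature_lines

print(f"glue: {glue_lines} lines = {percent(glue_lines, total_lines)} %")
print(f"functionality: {feature_lines} lines = {percent(feature_lines, total_lines)} %")
```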

It has to be taken into account that the code which integrates with BIND (the glue code) is often harder to write and debug than the feature code.

An alternative metric to the number of code lines is the number of tickets.

  • At the time of this writing there are 147 tickets (not counting tickets closed as duplicate/invalid/worksforme).
  • 126 / 147 tickets = 88 % were identified as work which had to be done because of re-implementing BIND’s functionality or fixing the re-implementation. These tickets are marked with the keyword nihsyndrome.
  • 19 / 147 tickets = 13 % concerned fixing crashes caused by the glue code.

Bypassing standard interfaces

  • Interfaces used by bind-dyndb-ldap were previously used only by BIND’s core.
  • BIND has a lot of checks and auto-tuning mechanisms which are triggered only when using the proper configuration paths (configuration files, rndc tool).
    • Interfaces exposed to plugins are very low-level, so we have to re-implement all the checks (or crash).
    • It is very hard to get it right.

Example: forward zones

Configuration in BIND - /etc/named.conf:

zone "fwd.example." IN {
        type forward;
        forward only;
        forwarders { 192.0.2.1; };
};

Implementation of this trivial configuration in bind-dyndb-ldap (roughly) consists of:

File                    Functionality               Line count
empty_zones.c           Empty zone handling         466
fwd_register.c          Forward zone handling       143
ldap_helper.c           Forward zone parser         300

The bind-dyndb-ldap implementation consists of ~ 900 lines of code = 7 % of the code base

  • excluding all utility code
  • our implementation is not feature complete, as many options from BIND are not supported at all or are handled only partially
  • crashes caused by improper glue implementation: #145, #148, #142, #110
    • 21 % of all crashes
  • other bugs or (then) missing features: #160, #146, #106, #105, #98, #97, #96, #49, #45
    • 9 % of all tickets

Example: static master DNS zone

Configuration in BIND - /etc/named.conf:

zone "static.example." IN {
        type master;
        file "static.example.db";
        allow-update { none; };
};

Implementation of basic functionality in bind-dyndb-ldap (roughly) consists of:

File                    Functionality               Line count
ldap_driver.c           Gluing the database to BIND 1100
ldap_helper.c           Zone management and parser  852
zone.c                  Support for zone loading    50
zone_register.c         Master zone handling        626

The bind-dyndb-ldap implementation consists of ~ 1640 lines of code = 13 % of the code base

  • excluding all utility code
  • does not support any ACL configuration, dynamic updates, DNSSEC, ...
  • our implementation is not feature complete even when considering only static zones: few options from BIND are supported
  • crashes caused by improper glue implementation: #156, #138, #132, #109, #107
    • 26 % of all crashes
  • other bugs or (then) missing features: #157, #135, #134, #122, #119, #117, #95, #92, #77, #70, #69, #64, #63, #59, #47, #35, #28, #14, #6, #2
    • 17 % of all tickets

Example: dynamic master DNS zone

Configuration in BIND - /etc/named.conf:

zone "dynamic.example." IN {
        type master;
        file "dynamic.example.db";
        update-policy { grant fred.example.net name example.net A; };
};

Implementation of dynamic update functionality in bind-dyndb-ldap (roughly) consists of:

File                    Functionality               Line count
acl.c                   ACL parser                  634
fs.c                    Journal cleaning            116
ldap_helper.c           Zone update hooks           500
zone.c                  Zone journal                100

The bind-dyndb-ldap implementation consists of ~ 1350 lines of code = 11 % of the code base

  • excluding all utility code
  • does not support DNSSEC, ...
  • our implementation is far from feature complete: few options from BIND are supported
  • crashes caused by incorrect glue implementation: #111, #108, #101, #93, #89, #18
    • 32 % of all crashes
  • other bugs or (then) missing features: #158, #152, #144, #116, #79, #50, #46, #12, #10, #5, #1
    • 12 % of all tickets
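
The per-example percentages can be reproduced from the ticket lists in the three examples. The sketch below is in Python with made-up names. It relies on one inference that the text does not state explicitly: the "of all tickets" figures only come out right if the crash tickets are counted together with the other tickets of the same example.

```python
TOTAL_TICKETS = 147  # all tickets, as stated in the ticket statistics
CRASH_TICKETS = 19   # tickets about crashes caused by the glue code

# Ticket numbers copied from the three examples.
examples = {
    "forward zones": {
        "crashes": [145, 148, 142, 110],
        "other": [160, 146, 106, 105, 98, 97, 96, 49, 45],
    },
    "static master zone": {
        "crashes": [156, 138, 132, 109, 107],
        "other": [157, 135, 134, 122, 119, 117, 95, 92, 77, 70,
                  69, 64, 63, 59, 47, 35, 28, 14, 6, 2],
    },
    "dynamic master zone": {
        "crashes": [111, 108, 101, 93, 89, 18],
        "other": [158, 152, 144, 116, 79, 50, 46, 12, 10, 5, 1],
    },
}


def shares(crashes, other):
    """Return (% of all crashes, % of all tickets) for one example."""
    crash_pct = round(100 * len(crashes) / CRASH_TICKETS)
    # Assumption: the ticket share includes the crash tickets as well.
    ticket_pct = round(100 * (len(crashes) + len(other)) / TOTAL_TICKETS)
    return crash_pct, ticket_pct


results = {name: shares(t["crashes"], t["other"]) for name, t in examples.items()}
for name, (crash_pct, ticket_pct) in results.items():
    print(f"{name}: {crash_pct} % of all crashes, {ticket_pct} % of all tickets")
```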

BIND state entanglement

(this is related to bypassing standard interfaces mentioned above)

  • BIND has a lot of internal state
    • Locking inside BIND is complicated and not fully documented.
    • The internal state of BIND is not fully controlled by bind-dyndb-ldap.
  • bind-dyndb-ldap’s use of low-level functions requires knowledge about the internal state of BIND
    • consequence: It is extremely hard to unit test bind-dyndb-ldap!
      • data point: After three days I was still not able to get the data structures initialized properly, so I gave up.
  • bind-dyndb-ldap often has to keep additional information which cannot be represented in BIND structures (e.g. the UUID of an LDAP entry)
    • consequence: separate state in bind-dyndb-ldap needs to be updated in lock-step with the state of BIND
      • consequence: it is easy to get it wrong and corrupt the state of BIND, or the plugin, or both
  • Over time, it seems that we are building a shadow BIND which gets more and more complex.
    • Adding new features without breaking old ones is increasingly hard.
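
The lock-step problem can be illustrated with a toy model. This is a Python sketch under heavy assumptions: the class, its fields and the sample data are all hypothetical and do not correspond to real BIND or bind-dyndb-ldap structures. It only shows how two stores that must always change together drift apart as soon as one code path updates only one of them.

```python
import threading


class ShadowedZoneStore:
    """Toy model of plugin-side state kept next to server-side state."""

    def __init__(self):
        self._lock = threading.Lock()
        self.server_zones = {}  # stands in for the server's internal zone table
        self.entry_uuids = {}   # plugin-only metadata the server cannot hold

    def add_zone(self, name, uuid, records):
        # Both maps change under one lock, so readers never see half an update.
        with self._lock:
            self.server_zones[name] = list(records)
            self.entry_uuids[name] = uuid

    def remove_zone(self, name):
        with self._lock:
            self.server_zones.pop(name, None)
            self.entry_uuids.pop(name, None)

    def is_consistent(self):
        # The invariant: every zone has metadata and vice versa.
        with self._lock:
            return self.server_zones.keys() == self.entry_uuids.keys()


store = ShadowedZoneStore()
store.add_zone("example.", "uuid-1", ["example. 3600 IN NS ns.example."])
consistent_after_add = store.is_consistent()

# A code path that touches only one side (here: the server table) silently
# breaks the invariant; nothing crashes at this point.
store.server_zones.pop("example.")
consistent_after_bypass = store.is_consistent()

print(consistent_after_add, consistent_after_bypass)
```

In the toy model the inconsistency is detectable with a single comparison. In the real plugin the server-side half of the state is only partly visible, which is what makes this class of bug hard to catch.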

What do we do?

Options:

  • Keep going as we do now
  • Rework BIND internals to make it fully pluggable
  • Re-write the plugin (or its equivalent) for another DNS server
  • Attempt to move BIND<=>LDAP synchronization into a separate daemon which uses standard interfaces (Martin Basti has a prototype) and keep using LDAP as backend
  • Extend the idea of separate daemon and get rid of LDAP backend completely

Decision from DevConf 2016

  • Given the facts above, extending bind-dyndb-ldap beyond intermediate needs of FreeIPA project is not practical, causes maintenance issues and is thus discouraged.
  • In the long term, it would make sense to investigate moving BIND<=>LDAP synchronization into a separate daemon which uses standard interfaces, and to keep using LDAP as backend, to avoid the maintenance cost of the tightly integrated approach.
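
To make the "standard interfaces" idea more concrete, here is a minimal Python sketch of one thing such a daemon might do: render a forward zone stanza for named.conf from data it has read from LDAP, so that BIND's own configuration parser and checks are used instead of re-implemented ones. The function name and the dictionary keys are hypothetical, the prototype mentioned above may work quite differently, and how the result reaches BIND (for example an included file followed by a configuration reload) is left open.

```python
def render_forward_zone(entry):
    """Render a named.conf forward zone stanza from a plain dict.

    `entry` uses illustrative keys ("name", "policy", "forwarders");
    they are not the real LDAP attribute names.
    """
    name = entry["name"]
    if not name.endswith("."):
        name += "."  # write the zone name as an absolute name

    policy = entry.get("policy", "first")
    if policy not in ("first", "only"):
        raise ValueError(f"unsupported forward policy: {policy!r}")

    forwarders = entry["forwarders"]
    if not forwarders:
        raise ValueError("a forward zone needs at least one forwarder")

    forwarder_list = " ".join(f"{address};" for address in forwarders)
    return (
        f'zone "{name}" IN {{\n'
        f"        type forward;\n"
        f"        forward {policy};\n"
        f"        forwarders {{ {forwarder_list} }};\n"
        f"}};\n"
    )


stanza = render_forward_zone(
    {"name": "fwd.example", "policy": "only", "forwarders": ["192.0.2.1"]}
)
print(stanza, end="")
```

With the sample input, the output matches the forward zone configuration shown in the forward zones example. The point of the design is that all validation beyond these few sanity checks stays in BIND.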