This addresses part of https://github.com/heimdal/heimdal/issues/1214
to audit potential network leaks with [libdefaults] block_dns = yes.
NI_NUMERICHOST is _probably_ sufficient -- we probably won't see many
systems using NIS to look up service names by number if we fail to
specify NI_NUMERICSERV, and such systems probably require careful
auditing of their own. And I don't know of any way NI_NUMERICSCOPE
could trigger network leaks. But named scope ids are such a niche
option with IPv6 that setting it to forestall concerns can't hurt
much, and it makes reviewing easier if we just unconditionally flip
on all the numeric-only options.
This change has two parts:
1. Provide our own local implementation of numeric-only getaddrinfo
in auditdns.c used to audit for DNS leaks, rather than deferring
to dlsym(RTLD_NEXT, "getaddrinfo"), in terms of inet_pton.
To keep review and implementation simple, this is limited to
AI_NUMERICHOST _and_ AI_NUMERICSERV -- this requires that we
arrange to pass AI_NUMERICSERV in callers too.
2. Wherever we implement block_dns, set AI_NUMERICSERV in addition to
AI_NUMERICHOST as needed by the new auditdns.c getaddrinfo.
(In principle this might also avoid other network leaks -- POSIX
guarantees no name resolution service will be invoked, and gives
NIS+ as an example.)
One tiny semantic change to avoid tripping over the auditor:
kadmin(8) now uses the string "749" rather than the string
"kerberos-adm". (Currently we don't audit kadmin(8) for DNS leaks
but let's avoid leaving a rake to step on.) Every other caller I
found is already guaranteed to pass a numeric service rather than
named service to getaddrinfo.
fix https://github.com/heimdal/heimdal/issues/1212
If block_dns is set, call getaddrinfo with AI_NUMERICHOST set and
AI_CANONNAME clear.
Some paths may not have set AI_CANONNAME, but it's easier to audit
this way when the getaddrinfo prelude is uniform across call sites,
and the compiler can optimize it away.
If the master is unreachable for a while we can end up with expired
tickets that don't get refreshed, then ipropd-slave gets stuck until
it's manually restarted.
With this change it's possible to bootstrap a KDC using a client
certificate with a PKINIT SAN for iprop/fqdn. Given such a certificate
one could run ipropd-slave via kinit to pull down the initial copy of
the HDB, then start the KDC services using the HDBGET: keytab.
That should make bootstrapping new secondary KDCs very easy.
One could bootstrap the KDC with such a certificate using, e.g.,
Safeboot (https://github.com/osresearch/safeboot), enrolling the host as
a KDC.
Remove hdb_entry_ex and revert to the original design of hdb_entry (except with
an additional context member in hdb_entry which is managed by the free_entry
method in HDB).
Decorate HDB_entry with context and move free_entry callback into HDB structure
itself. Requires updating hdb_free_entry() signature to include HDB parameter.
A follow-up commit will consolidate hdb_entry_ex (which has a single hdb_entry
member) into hdb_entry.
If a DB does not already exist, ipropd-slave will use the compiled
default, which is not necessarily what is desired or configured in
`[kdc]`.
This change makes `hdb_default_db()` return the first dbanme in the
`[kdc]` configuration, falling back on `HDB_DEFAULT_DB`.
Also, this adds a `--database` option to `ipropd-slave`.
tests/kdc/check-iprop.in tends to wait for a log message then it reads a
status file. Well, we shouldn't write the log message before writing
the status file then!
Running ipropd-slave on a system whose hostname's realm is not the
requested realm breaks. Since the iprop client principal should really
be in the same realm as the master, we now force it after calling
krb5_sname_to_principal().
ipropd-master sends AYT messages often as a result of a possibly-
transient error, but if the slave responds to such an AYT with I_HAVE,
then the same code path that failed will be executed on the master, and
if the error wasn't transient then we'll loop hard. So don't send an
I_HAVE in response to an AYT.
Doing an fsync per-record when receiving the complete HDB is a performance
disaster. Among other things, if the HDB is very large, then one slave
receving a full HDB can cause other slaves to timeout and, if HDB write
activity is high enough to cause iprop log truncation, then also need full
syncs, which leads to a cycle of full syncs for all slaves until HDB write
activity drops.
Allowing the iprop log to be larger helps, but improving receive_everything()
performance helps even more.
Otherwise we risk storing a name with the referral (empty) realm name,
which will then cause various knock-on effects, such as thinking that
the start_realm is "", and failing to find matching credentials in the
ccache.
When the master's log has all entries from version 1 to now, and no
uber entry (legacy master), then new slaves will not pull version 1,
because their uber record has version 1.
The fix is to force the uber version to 0 always, and avoid adding a
truncate nop when doing a full prop. The uber record now records the
database version even in the absence of any other log entries so that
we know what to pull going forward.
- fix int/uint confusion and use unsigned integral types for time
- improve messages
- add --verbose option
- attempt transaction recovery in ipropd-master during idle times
- begin hardening daemons against dying at the slightest provocation
- better recovery from various errors
- daemons now restart automatically in most of the many error cases
where the daemons still die
We used to update the iprop log and HDB in different orders depending on
the kadm5 operation, which then led to various race conditions.
The iprop log now functions as a two-phase commit (with roll forward)
log for HDB changes. The log is auto-truncated, keeping the latest
entries that fit in a configurable maximum number of bytes (defaults to
50MB). See the log-max-size parameter description in krb5.conf(5).
The iprop log format and the protocol remain backwards-compatible with
earlier versions of Heimdal. This is NOT a flag-day; there is NO need
to update all the slaves at once with the master, though it is advisable
in general. Rolling upgrades and downgrades should work.
The sequence of updates is now (with HDB and log open and locked):
a) check that the HDB operation will succeed if attempted,
b) append to iprop log and fsync() it,
c) write to HDB (which should fsync()),
d) mark last log record committed (no fsync in this case).
Every kadm5 write operation recover transactions not yet confirmed as
committed, thus there can be at most one unconfirmed commit on a master
KDC.
Reads via kadm5_get_principal() also attempt to lock the log, and if
successful, recover unconfirmed transactions; readers must have write
access and must win any race to lock the iprop log.
The ipropd-master daemon also attempts to recover unconfirmed
transactions when idle.
The log now starts with a nop record whose payload records the offset of
the logical end of the log: the end of the last confirmed committed
transaction. This is kown as the "uber record". Its purpose is
two-fold: act as the confirmation of committed transactions, and provide
an O(1) method of finding the end of the log (i.e., without having to
traverse the entire log front to back).
Two-phase commit makes all kadm5 writes single-operation atomic
transactions (though some kadm5 operations, such as renames of
principals, and changes to principals' aliases, use multiple low-level
HDB write operations, but still all in one transaction). One can still
hold a lock on the HDB across many operations (e.g., by using the lock
command in a kadmin -l or calling kadm5_lock()) in order to push
multiple transactions in sequence, but this sequence will not be atomic
if the process or host crashes in the middle.
As before, HDB writes which do not go through the kadm5 API are excluded
from all of this, but there should be no such writes.
Lastly, the iprop-log(1) command is enhanced as follows:
- The dump, last-version, truncate, and replay sub-commands now have an
option to not lock the log. This is useful for inspecting a running
system's log file, especially on slave KDCs.
- The dump, last-version, truncate, and replay sub-commands now take an
optional iprop log file positional argument, so that they may be used
to inspect log files other than the running system's
configured/default log file.
Extensive code review and some re-writing for clarity by Viktor Dukhovni.
The following sequence of events results in slave B having a stale HDB:
- slave A connects to master, master dumps HDB for the slave
- kadm5 operations
- slave B connects to master, master sends previously dumped HDB
slave B won't discover any updates until the next transaction.
The fix is simple: the slave should immediately call ihave() after
receiving a complete HDB.
Tests that start daemons have to "wait" for them to start.
This commit makes Heimdal daemons prep to detach (when requested) by
forking early, then having the child signal readiness to the parent when
the child really is ready. The parent exits only which the child is
ready. This means that tests will no longer need to wait for daemons.
However, tests will still need a pidfile or such so they can stop the
daemons.
Note that the --detach options should not be used on OS X from launchd,
only from tests.
We turn on a few extra warnings and fix the fallout that occurs
when building with --enable-developer. Note that we get different
warnings on different machines and so this will be a work in
progress. So far, we have built on NetBSD/amd64 5.99.64 (which
uses gcc 4.5.3) and Ubuntu 10.04.3 LTS (which uses gcc 4.4.3).
Notably, we fixed
1. a lot of missing structure initialisers,
2. unchecked return values for functions that glibc
marks as __attribute__((warn-unused-result)),
3. made minor modifications to slc and asn1_compile
which can generate code which generates warnings,
and
4. a few stragglers here and there.
We turned off the extended warnings for many programs in appl/ as
they are nearing the end of their useful lifetime, e.g. rsh, rcp,
popper, ftp and telnet.
Interestingly, glibc's strncmp() macro needed to be worked around
whereas the function calls did not.
We have not yet tried this on 32 bit platforms, so there will be
a few more warnings when we do.
The ipropd_slave will log its status to /var/heimdal/ipropd-slave-status
if its connecting, up to date, or disconnected.
The master will now also confirm to slaves that are are in fact up to date
if they just restart, before there was no confirmation, the slave just didn't
get any deltas.