o implement add1() using 32 bit ints; this makes _krb5_n_fold()
about 5% faster on an amd64 platform. 64 bit ints yield a
further improvement, but we would need to test the platform
to see if they are natively supported. The gain should be
larger on big endian machines, since on little endian boxen
we have to byte swap.
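
For illustration, a minimal sketch of the word-at-a-time idea
follows. It is not the committed add1(): the signature, the
load_be32()/store_be32() helpers and the assumption that the length
is a multiple of 4 are all made up for illustration. The end-around
carry mirrors the ones'-complement addition that n-fold uses.

    /*
     * Sketch only, not the committed code: add two equal-length big
     * endian byte strings (a += b) one 32 bit word at a time instead
     * of one byte at a time, wrapping the final carry back around as
     * in ones'-complement addition.  Assumes len is a multiple of 4.
     */
    #include <stddef.h>
    #include <stdint.h>

    static uint32_t
    load_be32(const unsigned char *p)    /* byte swap cost on LE boxen */
    {
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] <<  8) |  (uint32_t)p[3];
    }

    static void
    store_be32(unsigned char *p, uint32_t v)
    {
        p[0] = (v >> 24) & 0xff; p[1] = (v >> 16) & 0xff;
        p[2] = (v >>  8) & 0xff; p[3] =  v        & 0xff;
    }

    static void
    add1(unsigned char *a, const unsigned char *b, size_t len)
    {
        uint64_t carry = 0;
        size_t i;

        /* walk from the least significant (last) word toward the front */
        for (i = len; i >= 4; i -= 4) {
            uint64_t sum = (uint64_t)load_be32(a + i - 4) +
                           (uint64_t)load_be32(b + i - 4) + carry;
            store_be32(a + i - 4, (uint32_t)sum);
            carry = sum >> 32;
        }
        /* wrap the carry that fell off the top back into the low end */
        for (i = len; carry && i >= 4; i -= 4) {
            uint64_t sum = (uint64_t)load_be32(a + i - 4) + carry;
            store_be32(a + i - 4, (uint32_t)sum);
            carry = sum >> 32;
        }
    }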
o fix two cases where a malloc(3)d pointer may be dereferenced
before we test that it is not NULL.
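
For reference, the pattern being fixed looks like the sketch below;
the function and names are illustrative, not the actual n-fold.c code.

    #include <errno.h>
    #include <stdlib.h>
    #include <string.h>

    /*
     * Illustrative only: check the malloc(3) result before the first
     * dereference, rather than after the pointer has already been used.
     */
    static int
    make_zeroed_buffer(unsigned char **out, size_t len)
    {
        unsigned char *buf = malloc(len);

        if (buf == NULL)            /* test first ... */
            return ENOMEM;
        memset(buf, 0, len);        /* ... and only then dereference */
        *out = buf;
        return 0;
    }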
All in lib/krb5/n-fold.c (illustrative sketches of the reworked
rr13() and its caller follow this list):
1. eliminate malloc/free from rr13(): it always allocates a
buffer of the same size and is called in a tight loop.
2. eliminate the memcpy(3) from rr13() by bouncing back and forth
between two buffers, buf1 and buf2, instead of performing the
calculation into a tmp buffer and memcpy(3)ing the result
back into buf.
3. eliminate code paths from rr13() that I can see will never
be taken but that the compiler, presumably, cannot prove away:
i. now that we're no longer using malloc(3), rr13()
cannot fail, so make it void and drop the if in
the calling routine that checked its error code. In
case you ask: yes, this made the tests run a little
faster.
ii. rr13() had code for being passed a number of bits
not divisible by 8, but _krb5_n_fold() only ever
passes a multiple of 8, so we eliminate this
conditional and the associated code.
4. make rr13() take two destination buffers and copy the result
into both of them; we use this to eliminate another memcpy(3)
from the calling routine. This appears to make it a bit faster
as well.
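
To illustrate points 1, 3 and 4, here is a minimal sketch of what
the reworked rotate-right-by-13-bits helper could look like. It is
not the committed code: the exact signature, the requirement that
len (in bits) is a non-zero multiple of 8 and the non-overlapping
buffers are assumptions taken from the description above.

    #include <stddef.h>

    /*
     * Sketch only: rotate the len-bit string in src right by 13 bits
     * and write the result into both dst1 and dst2.  With the
     * malloc(3) gone and len guaranteed to be a multiple of 8,
     * nothing can fail, so the function is void.
     */
    static void
    rr13(unsigned char *dst1, unsigned char *dst2,
         const unsigned char *src, size_t len /* in bits */)
    {
        const size_t bytes = len / 8;       /* len is a multiple of 8 */
        const size_t bits  = 13 % len;      /* rotation amount */
        size_t i;

        for (i = 0; i < bytes; i++) {
            /* position in src of the first bit of output byte i */
            size_t bb = (8 * i + len - bits) % len;
            size_t b1 = bb / 8;             /* byte holding that bit */
            size_t s1 = bb % 8;             /* bit offset within it */
            size_t b2 = (b1 + 1) % bytes;   /* next byte, wrapping */
            unsigned char out = (unsigned char)(src[b1] << s1);

            if (s1 != 0)
                out |= (unsigned char)(src[b2] >> (8 - s1));
            dst1[i] = out;
            dst2[i] = out;                  /* second copy, no memcpy(3) */
        }
    }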
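
And a sketch of the caller-side buffer bouncing from point 2, using
the rr13() sketched above: one rotated copy lands directly where the
caller wants it, the other becomes the source of the next rotation.
The function fold_rotations(), NFOLD_MAX and the loop shape are
hypothetical, not the actual _krb5_n_fold().

    #include <string.h>

    #define NFOLD_MAX 64                    /* illustrative size limit */

    /*
     * Illustrative caller: concatenate `rounds' (>= 1) successive
     * 13-bit rotations of `in' (len bytes each, len <= NFOLD_MAX) into
     * `out', which must hold rounds * len bytes.  The roles of buf1
     * and buf2 are swapped each pass instead of memcpy(3)ing data back.
     */
    static void
    fold_rotations(unsigned char *out, const unsigned char *in,
                   size_t len, unsigned rounds)
    {
        unsigned char buf1[NFOLD_MAX], buf2[NFOLD_MAX];
        unsigned char *src = buf1, *dst = buf2, *t;
        unsigned i;

        memcpy(src, in, len);               /* round 0 is the input itself */
        memcpy(out, in, len);
        for (i = 1; i < rounds; i++) {
            /* write the rotation both into out and into the next source */
            rr13(out + i * len, dst, src, len * 8);
            t = src; src = dst; dst = t;    /* swap instead of memcpy(3) */
        }
    }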