Dates!
No, not like going on a date.
Dates: the fifteenth of March; November 22, 1963; etc.
In an effort to reduce hand-rolled but otherwise-generic code in
kcgi, I recently set out to convert date handling to
use the system functions strftime(3),
mktime(3), etc.
Unfortunately, I quickly realised that the supported systems handle dates quite differently, and
ended up doing the opposite.
introduction
As it stood up to version 0.12.0,
kcgi's handling of dates mixed hand-rolled functions
for converting between epoch values and broken-down time, and system functions for formatting.
These are laid out in
datetime.c
for that version.
The regression tests that already existed were spotty and failed to cover any corner cases in
date handling.
I didn't choose to examine the date functions randomly: it was part of an ongoing process to
convert all kcgi kutil utility functions
to having a khttp prefix and, in doing so, to review the implementation and
correctness of said functions.
BSD.lv's new portability infrastructure has played no small part
in casting light in the areas where the code can use more clarity and consistency.
At heart, date handling needs to convert freely between two string representations and two
binary representations.
It's critically important that each transition is fully defined and correct, so I started by
replacing hand-rolled binary conversions with system functions.
removing hand-rolled conversions
The system functions to convert between binary representations are
timegm(3) and
gmtime(3).
These convert between UNIX epoch time (an integer of seconds before or after the start of 1970,
UTC) and broken-down time values, which separately represent day, month, year, hour, etc. as
integers.
Switching to these functions significantly reduced code size, or rather, off-loaded the
complexity to the C library.
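As a minimal sketch of the two system conversions working together, the helper below takes an epoch value through gmtime_r(3) and back through timegm(3). The name roundtrip_ok is hypothetical and not part of kcgi, and the sketch assumes a platform that provides the non-standard timegm(3), as the BSDs and glibc do.

```c
#define _DEFAULT_SOURCE /* assumption: needed to expose timegm(3) on glibc */
#include <assert.h>
#include <time.h>

/*
 * Convert an epoch value to broken-down UTC time and back again.
 * Returns 1 if the round trip reproduces the input, 0 otherwise.
 * Hypothetical helper for illustration only.
 */
int
roundtrip_ok(time_t epoch)
{
	struct tm tm;

	/* Epoch to broken-down time; NULL means the conversion failed. */
	if (gmtime_r(&epoch, &tm) == NULL)
		return 0;

	/* Broken-down time back to epoch, reading the fields as UTC. */
	return timegm(&tm) == epoch;
}
```

Calling roundtrip_ok(0) exercises the start of 1970; how far from there the round trip keeps working is exactly what differs between systems.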
My development platform for this was OpenBSD, which has
used clean 64-bit time_t values since a monumental effort in 2013.
The time_t type is used by systems to represent UNIX epoch.
Each component of broken-down time is represented by a standard int.
Neither of these is a fixed-width type.
I started with
khttp_epoch2tms(3) and
khttp_datetime2epoch(3),
which convert between these two forms.
While doing so, I also added regression tests for all of the converted functions.
These regression tests probed the full range of possible input values.
I committed the results and waited for our CI systems to test it on all other operating systems.
Result: immediate breakage.
The biggest breakage came from 32-bit time_t types on some systems, which I frankly
didn't expect to be a problem any more.
But it is.
The problem arises because
kcgi passes around explicitly-sized integers for the
UNIX epoch, specifically int64_t, which allows for more values than a 32-bit
time_t can handle.
When passing these values into the 32-bit systems'
gmtime(3), they were implicitly converted to the narrower type, so values outside its range were not preserved.
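One way to catch this before calling into the system is to test whether the 64-bit value survives the conversion. The sketch below is only an illustration: both function names are hypothetical, and it assumes time_t is a signed integer type, which is common but not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Return 1 if a 64-bit epoch value is unchanged after conversion to
 * the platform's time_t, 0 if the conversion would alter it.
 * Assumes time_t is a signed integer type.
 */
int
epoch_fits_time_t(int64_t epoch)
{
	time_t t = (time_t)epoch;

	return (int64_t)t == epoch;
}

/*
 * The same question asked of a fixed 32-bit range, to show what a
 * system with a 32-bit time_t would accept.
 */
int
epoch_fits_32(int64_t epoch)
{
	return epoch >= INT32_MIN && epoch <= INT32_MAX;
}
```

On a system with a 64-bit time_t the first check always passes; the second shows that anything past 2147483647 (early 2038) is out of reach for a 32-bit time_t.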
Then there were also some surprising results, such as converting from broken-down time with a
year before 1900 to an epoch value.
On FreeBSD, this inexplicably failed.
The API itself presented problems that simply slipped my mind: a 64-bit time_t
can represent more than a 32-bit int year can encode, so converting large times to
broken-down time failed.
These are documentation problems, as the broken-down time representation cannot change.
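The year limitation can be observed through the return value of gmtime_r(3), which returns NULL when it cannot fill in the result. The following is a rough sketch with a hypothetical helper name, not code from kcgi.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Extract the calendar year of an epoch value using the system's
 * gmtime_r(3).  Returns 1 and stores the year on success, or 0 if
 * the system could not convert the value, for instance because the
 * year does not fit in the int tm_year field.
 */
int
year_of_epoch(time_t epoch, int *year)
{
	struct tm tm;

	if (gmtime_r(&epoch, &tm) == NULL)
		return 0;

	/* tm_year counts years since 1900. */
	*year = tm.tm_year + 1900;
	return 1;
}
```

With a 64-bit time_t, a value near the top of the range describes a year far larger than an int can hold, so the call is expected to fail rather than return a wrapped year.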
I also needed to worry about representing (time_t)-1, which is both a legitimate
representation of one second before the UNIX epoch and the error return value for
timegm(3).
Confusing!
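A common trick for telling the two meanings apart is to plant a sentinel in tm_wday before the call and see whether it was overwritten. The sketch below assumes that timegm(3) fills in tm_wday on success and leaves it alone on failure. The C standard describes mktime(3) setting tm_wday on success, but timegm(3) is non-standard, so treat this as an assumption to verify per platform. The helper names are hypothetical.

```c
#define _DEFAULT_SOURCE /* assumption: needed to expose timegm(3) on glibc */
#include <assert.h>
#include <string.h>
#include <time.h>

/* Build a struct tm from calendar fields (month is 1-12). */
struct tm
utc_fields(int year, int mon, int mday, int hour, int min, int sec)
{
	struct tm tm;

	memset(&tm, 0, sizeof(tm));
	tm.tm_year = year - 1900;
	tm.tm_mon = mon - 1;
	tm.tm_mday = mday;
	tm.tm_hour = hour;
	tm.tm_min = min;
	tm.tm_sec = sec;
	return tm;
}

/*
 * Convert broken-down UTC time to an epoch value.
 * Returns 1 on success (even when the result is -1), 0 on error.
 */
int
tm_to_epoch(const struct tm *in, time_t *out)
{
	struct tm tm = *in;
	time_t t;

	tm.tm_wday = -1; /* sentinel: no valid weekday is negative */
	t = timegm(&tm);

	/* A -1 result with an untouched sentinel is treated as an error. */
	if (t == (time_t)-1 && tm.tm_wday == -1)
		return 0;

	*out = t;
	return 1;
}
```

With this in place, 1969-12-31 23:59:59 is reported as a successful conversion to -1 instead of being mistaken for a failure.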
In light of these issues, I quickly decided to change my approach and instead return to using
private copies of the conversion functions.
re-rolling conversions
I started by merging appropriately-licensed implementations of
timegm(3)
and
gmtime(3)
from
newlib that were small, easy to read, well-tested,
complete, and, more importantly, 64-bit safe.
Upon doing so, I was able to verify that all sane 64-bit input values were properly converting
to and from the given time values.
Using these imports also relieved the burden of pre-checking for (time_t)-1, since
they never returned an error.
For symmetry, I also added
khttp_datetime2epoch(3),
which mirrors
khttp_epoch2datetime(3)
in that it uses int64_t broken-down time components instead of int.
I then moved on to formatting functions.
re-rolling formatting
There are two formatted outputs handled by kcgi:
ISO 8601
and RFC 822 (modernised as
5322).
Prior to this effort, kcgi used the
strftime(3) function for the latter and
normal string handling for the former.
While the ISO 8601 date processing behaved equally well on all systems, there were some corner
cases for RFC 5322 formatting.
The first is negative years; the second, years with more than four digits.
The RFC is mostly silent on how these are handled, but it's safe to assume that we should handle
arbitrary dates in a sane way: allow negative years and use as many year digits as required.
I was then surprised that
strftime(3) truncated year values on some SunOS
systems, specifically Oracle Solaris.
Moreover, since it accepts a
struct tm, I knew that formatting was impossible for year values beyond the 32-bit
barrier.
Fortunately, fixing this is easy: since
khttp_epoch2datetime(3)
is able to convert into all the necessary date components, it only took two string table lookups
for weekday and month names, followed by regular string handling.
Solved with
khttp_epoch2str(3).
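The table-lookup approach can be sketched roughly as follows. This is not the kcgi implementation: the function name, its parameter list, the zero-padding of the year to four digits, and the trailing "GMT" zone are all assumptions made for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

static const char *const wdays[7] = {
	"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};

static const char *const months[12] = {
	"Jan", "Feb", "Mar", "Apr", "May", "Jun",
	"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/*
 * Format date components in an RFC 5322-like layout.
 * wday is 0-6 with Sunday as 0; mon is 1-12.
 * The year is 64-bit so that it may be negative or wider than four
 * digits.  Returns 1 on success, 0 on bad input or a short buffer.
 */
int
format_rfc5322(char *buf, size_t sz, int wday, int mday, int mon,
	int64_t year, int hour, int min, int sec)
{
	int n;

	if (wday < 0 || wday > 6 || mon < 1 || mon > 12)
		return 0;

	/* Precision .4 pads to four digits but never truncates. */
	n = snprintf(buf, sz,
	    "%s, %.2d %s %.4" PRId64 " %.2d:%.2d:%.2d GMT",
	    wdays[wday], mday, months[mon - 1], year, hour, min, sec);

	return n > 0 && (size_t)n < sz;
}
```

Because the year is passed as its own 64-bit argument instead of through struct tm, the 32-bit limit of tm_year never enters the picture.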
performance problems
When testing for corner-case dates, such as those with years needing more than 32 bits,
unexpected difficulties came from the conversion utility.
Specifically, when computing the seconds from the year, the code stepped through each year from
1970 or so, accumulating seconds.
For the valid year of 1 152 921 504 606 846 976, or 2^60, this computation would take
quite some time.
Fortunately, this code is easily optimised since the number of days in each 400-year block, with
1900 as a baseline, is fixed.
It was trivial to change the code to step only to a 400-year multiple, consume the remaining whole
400-year blocks with a single division, then compute the remainder.
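The idea can be sketched as below. The sketch counts days from a 1970 baseline instead of 1900, uses hypothetical function names, and is not the code that ended up in kcgi. It relies on the fact that any 400 consecutive Gregorian years contain 146097 days, and it assumes the caller keeps the year small enough that the day count fits in an int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Gregorian leap-year rule; only called with positive years here. */
static int
is_leap(int64_t y)
{
	return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/*
 * Days from 1970-01-01 to January 1 of the given year (negative for
 * earlier years).  Whole 400-year blocks are consumed with a single
 * division; at most 399 years are then stepped through one by one.
 */
int64_t
days_since_1970(int64_t year)
{
	int64_t delta = year - 1970;
	int64_t blocks = delta / 400;
	int64_t rem = delta % 400;
	int64_t days, y;

	/* Make the remainder non-negative for years before 1970. */
	if (rem < 0) {
		rem += 400;
		blocks--;
	}

	days = blocks * INT64_C(146097);

	/* The leap pattern repeats every 400 years, so 1970.. works. */
	for (y = 0; y < rem; y++)
		days += is_leap(1970 + y) ? 366 : 365;

	return days;
}
```

The loop now runs at most 399 times regardless of how distant the year is, where the year-by-year version would have needed one iteration per year.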
conclusion and future steps
kcgi is now able to handle all representable 64-bit
dates. A representable date is one that can convert between broken-down and epoch time without
integer truncation. Truncation can happen when converting from a 64-bit epoch to a 32-bit year (the year might roll
over) or from a 64-bit year to a 64-bit epoch (the epoch might roll over).
The results of this work appear in kcgi version
0.12.1.
Most of this work was in
datetime.c.
An important function that still needs adding is converting from formatted dates into
epoch values.
This is to prevent callers from using their system conversion functions, which may be limited in
the ways described above.
This won't be a difficult piece of code to write, and can wait for a future version.
Last but not least, it's important to remember that converting between UNIX epoch time and
broken-down time will always be a source of error.
Either the broken-down year will allow for more years than may be encoded in the epoch or vice
versa.
It's important that these functions specifically document the range of acceptable inputs!