As it stood up to version 0.12.0, kcgi's handling of dates mixed hand-rolled functions for converting between epoch values and broken-down time, and system functions for formatting. These are laid out in datetime.c for that version. The regression tests that already existed were spotty and failed to cover any corner cases in date handling.
I didn't choose to examine the date functions randomly: it was part of an ongoing process to convert all kcgi kutil utility functions to use a khttp prefix and, in doing so, to review the implementation and correctness of said functions.
BSD.lv's new portability infrastructure has played no small part in casting light on the areas where the code could use more clarity and consistency.
At heart, date handling needs to convert freely between two string representations and two binary representations.
It's critically important that each transition is fully defined and correct, so I started by replacing hand-rolled binary conversions with system functions.
The system functions to convert between binary representations are timegm(3) and gmtime(3). These convert between UNIX epoch time (an integer of seconds before or after the start of 1970, UTC) and broken-down time values, which separately represent day, month, year, hour, etc. as integers. Switching to these functions significantly reduced code size, or rather, off-loaded the complexity to the C library.
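For illustration, here is a minimal sketch of that round trip through the system functions. It assumes a platform that provides timegm(3) (an extension on the BSDs and glibc) and POSIX gmtime_r(3), the reentrant form of gmtime(3); sys_roundtrip is a hypothetical helper, not part of kcgi.

```c
#define _DEFAULT_SOURCE /* expose timegm(3) and gmtime_r(3) on glibc */
#include <stdint.h>
#include <time.h>

/*
 * Convert an epoch value to broken-down UTC time with gmtime_r(3), then
 * back again with timegm(3).  Returns 1 and stores the round-tripped
 * value in *out, or 0 if gmtime_r(3) could not convert the input.
 */
static int
sys_roundtrip(time_t epoch, time_t *out)
{
	struct tm tm;

	if (gmtime_r(&epoch, &tm) == NULL)
		return 0;
	*out = timegm(&tm);
	return 1;
}
```

On a system with a sane time_t, every value should come back unchanged; the trouble described below starts when the caller's values don't fit the system's types.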
My development platform for this was OpenBSD, which has used clean 64-bit time_t values since a monumental effort in 2013. The time_t type is what systems use to represent UNIX epoch time; broken-down time is represented by the standard struct tm. These are not fixed-width types.
I started with khttp_epoch2tms(3) and khttp_datetime2epoch(3), which convert between these two forms. While doing so, I also added regression tests for all of the converted functions. These regression tests probed the full range of possible input values. I committed the results and waited for our CI systems to test it on all other operating systems.
Result: immediate breakage.
The biggest breakage came from 32-bit time_t types on some systems, which I frankly didn't expect to be a problem any more. But it is.
The problem arises because kcgi passes around explicitly-sized integers for the UNIX epoch, specifically int64_t, which allows for more values than a 32-bit time_t can hold. When these values were passed into gmtime(3) on the 32-bit systems, they were narrowed to fit time_t and silently truncated.
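One way to avoid silently truncating is to check the value before handing it to the system. The sketch below is hypothetical (epoch_fits_time_t is not a kcgi function) and assumes time_t is a signed integer of either 32 or 64 bits, which covers the systems discussed here but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <time.h>

/*
 * Return 1 if the 64-bit epoch value v can be stored in this platform's
 * time_t without truncation, 0 otherwise.  Assumes time_t is a signed
 * 32- or 64-bit integer type.
 */
static int
epoch_fits_time_t(int64_t v)
{
	if (sizeof(time_t) >= sizeof(int64_t))
		return 1;
	return v >= INT32_MIN && v <= INT32_MAX;
}
```

On a 32-bit time_t, the first value to fail is 2147483648, which is 2038-01-19 03:14:08 UTC.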
There were also some surprising results, such as converting broken-down time with a year before 1900 to an epoch value: on FreeBSD, this inexplicably failed.
The API itself presented problems that had simply slipped my mind: a 64-bit epoch value can represent more years than the 32-bit int year of broken-down time can encode, so converting large times to broken-down time failed.
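A back-of-the-envelope check makes the mismatch concrete. This sketch divides by the mean Gregorian year (365.2425 days) to estimate how many years a 64-bit epoch spans, then asks whether the resulting year fits struct tm's tm_year, an int counting years since 1900. Both helpers are hypothetical and only approximate the year, which is enough to show the gap.

```c
#include <limits.h>
#include <stdint.h>

/* Mean Gregorian year: 365.2425 days of 86400 seconds each. */
#define SECS_PER_MEAN_YEAR INT64_C(31556952)

/* Approximate number of years spanned by an epoch value. */
static int64_t
approx_years(int64_t t)
{
	return t / SECS_PER_MEAN_YEAR;
}

/* tm_year is an int holding years since 1900; avoid overflow in the test. */
static int
year_fits_tm(int64_t year)
{
	return year >= (int64_t)INT_MIN + 1900 &&
	    year <= (int64_t)INT_MAX + 1900;
}
```

With a 32-bit int, tm_year tops out at year 2 147 485 547, while the largest 64-bit epoch lies roughly 292 billion years out, so year_fits_tm(1970 + approx_years(INT64_MAX)) is false.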
These are documentation problems, as the broken-down time representation cannot change.
I also needed to worry about representing (time_t)-1, which is both a legitimate representation of one second before the UNIX epoch and the error return value of timegm(3).
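When sticking with the system functions, the ambiguity can be worked around. The hypothetical checked_timegm below accepts (time_t)-1 only if that value decodes back to the same broken-down fields. It assumes a signed time_t and that timegm(3) normalises its argument on success, as the BSD and glibc implementations do.

```c
#define _DEFAULT_SOURCE /* expose timegm(3) and gmtime_r(3) on glibc */
#include <time.h>

/*
 * Wrap timegm(3) so that a legitimate result of (time_t)-1, that is,
 * 1969-12-31 23:59:59 UTC, is not mistaken for an error.  Returns 1 and
 * stores the result in *out on success, 0 on error.
 */
static int
checked_timegm(const struct tm *in, time_t *out)
{
	struct tm copy = *in, check;
	time_t t = timegm(&copy);

	if (t != (time_t)-1) {
		*out = t;
		return 1;
	}

	/* Ambiguous: does -1 really decode to the (normalised) input? */
	t = (time_t)-1;
	if (gmtime_r(&t, &check) == NULL)
		return 0;
	if (check.tm_year != copy.tm_year || check.tm_mon != copy.tm_mon ||
	    check.tm_mday != copy.tm_mday || check.tm_hour != copy.tm_hour ||
	    check.tm_min != copy.tm_min || check.tm_sec != copy.tm_sec)
		return 0;
	*out = t;
	return 1;
}
```

The extra gmtime_r(3) call only happens on the rare -1 result, so the common path costs nothing.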
In light of these issues, I quickly decided to change my approach and instead return to using private copies of the conversion functions.
I started by merging appropriately-licensed implementations of timegm(3) and gmtime(3) from newlib that were small, easy to read, well-tested, complete and, more importantly, 64-bit safe. Having done so, I was able to verify that all sane 64-bit input values were properly converted to and from the given time values.
Using these imports also relieved the burden of pre-checking for the ambiguous (time_t)-1 error value, as they never returned an error.
For symmetry, I also added a conversion that differs in that it converts the epoch into int64_t broken-down time components instead of a struct tm.
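As a sketch of what a 64-bit-safe conversion into int64_t components can look like, the code below uses Howard Hinnant's civil_from_days algorithm rather than newlib's code; struct dt64 and epoch_to_dt64 are hypothetical names, not kcgi's API. Because every field is int64_t and the algorithm works in fixed 400-year eras, it accepts any int64_t input without overflow.

```c
#include <stdint.h>

/* Broken-down UTC time with 64-bit fields (hypothetical type). */
struct dt64 {
	int64_t year;		/* proleptic Gregorian, e.g. 1970 */
	int64_t mon;		/* 1-12 */
	int64_t mday;		/* 1-31 */
	int64_t hour, min, sec;
};

/*
 * Convert any int64_t epoch value to broken-down UTC time without going
 * through the system's time_t.  Days are split with civil_from_days.
 */
static void
epoch_to_dt64(int64_t t, struct dt64 *dt)
{
	int64_t days = t / 86400, secs = t % 86400;

	if (secs < 0) {		/* floor toward negative infinity */
		secs += 86400;
		days--;
	}
	dt->hour = secs / 3600;
	dt->min = secs % 3600 / 60;
	dt->sec = secs % 60;

	int64_t z = days + 719468;	/* days since 0000-03-01 */
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;
	int64_t doe = z - era * 146097;	/* day of era, [0, 146096] */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100); /* [0, 365] */
	int64_t mp = (5 * doy + 2) / 153;	/* 0 = March ... 11 = February */

	dt->mday = doy - (153 * mp + 2) / 5 + 1;
	dt->mon = mp < 10 ? mp + 3 : mp - 9;
	dt->year = yoe + era * 400 + (dt->mon <= 2);
}
```

The year is counted from March so that the leap day falls at the end of the year, which keeps the month arithmetic free of special cases.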
I then moved on to formatting functions.
There are two formatted outputs handled by kcgi: ISO 8601 and RFC 822 (modernised as 5322). Prior to this effort, kcgi used the strftime(3) function for the latter and normal string handling for the former.
While ISO 8601 date formatting behaved equally well on all systems, RFC 5322 formatting had some corner cases: first, negative years; second, years with more than four digits. The RFC is mostly silent on how these are handled, but it's safe to assume that arbitrary dates should be handled in the sane way: negative years, and as many year digits as required.
I was then surprised to find that strftime(3) truncated year values on some SunOS systems, specifically Oracle Solaris.
Moreover, since strftime(3) accepts a struct tm, whose tm_year field is an int, formatting was impossible for year values beyond the 32-bit integer range.
Fortunately, fixing this was easy: since khttp_epoch2datetime(3) is able to convert into all the necessary date components, it only took two string table lookups, for weekday and month names, followed by regular string handling. Solved with khttp_epoch2str(3).
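A sketch of that approach, taking already-converted components: two lookup tables plus snprintf(3). The names format_rfc5322, wdays and mons are hypothetical, and the treatment of negative years (a leading minus sign followed by at least four digits) is an assumption, since the RFC does not cover it.

```c
#include <inttypes.h>
#include <stdio.h>

static const char *const wdays[7] = {
	"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"
};
static const char *const mons[12] = {
	"Jan", "Feb", "Mar", "Apr", "May", "Jun",
	"Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/*
 * Format UTC components as an RFC 5322 date such as
 * "Thu, 01 Jan 1970 00:00:00 GMT".  wday is 0-6 (Sunday first) and mon
 * is 1-12.  The year gets at least four digits, more if needed, and a
 * leading minus sign if negative.  Returns snprintf's result.
 */
static int
format_rfc5322(char *buf, size_t sz, int wday, int64_t mday, int mon,
    int64_t year, int64_t hour, int64_t min, int64_t sec)
{
	int ywidth = year < 0 ? 5 : 4;	/* the sign counts toward the width */

	return snprintf(buf, sz,
	    "%s, %02" PRId64 " %s %0*" PRId64 " %02" PRId64 ":%02" PRId64
	    ":%02" PRId64 " GMT",
	    wdays[wday], mday, mons[mon - 1], ywidth, year, hour, min, sec);
}
```

Because the year is printed from an int64_t rather than a struct tm, nothing here depends on the platform's strftime(3) or on the width of int.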
When testing corner-case dates, such as those with years needing more than 32 bits, unexpected difficulties came from the conversion utility. Specifically, when computing the seconds from the year, the code stepped through each year from 1970 or so, accumulating seconds. For the valid year of 1 152 921 504 606 846 976, or 2^60, this computation would take quite some time.
Fortunately, this code is easily optimised: counting from a 1900 baseline, every 400-year block of the Gregorian calendar contains the same number of days (146 097). It was trivial to change the code to skip over all whole 400-year blocks with a single division, then compute the remainder from the leftover years.
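The same idea in constant time can be sketched with Howard Hinnant's days_from_civil algorithm. This is a swapped-in technique, not kcgi's code: it counts 400-year eras from 0000-03-01 rather than from a 1900 baseline, and the function names are hypothetical. Whole eras are handled by one division; only the position within the era is computed arithmetically, so no loop over years remains.

```c
#include <stdint.h>

/*
 * Days from 1970-01-01 to the given proleptic Gregorian date (month
 * 1-12).  Each 400-year era has exactly 146097 days.
 */
static int64_t
days_from_civil(int64_t y, int64_t m, int64_t d)
{
	y -= m <= 2;		/* Jan/Feb count toward the previous year */
	int64_t era = (y >= 0 ? y : y - 399) / 400;
	int64_t yoe = y - era * 400;				/* [0, 399] */
	int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;	/* [0, 146096] */
	return era * 146097 + doe - 719468;
}

/* Seconds since the epoch; assumes the result fits in int64_t. */
static int64_t
datetime_to_epoch(int64_t y, int64_t m, int64_t d,
    int64_t hh, int64_t mm, int64_t ss)
{
	return days_from_civil(y, m, d) * 86400 + hh * 3600 + mm * 60 + ss;
}
```

For a year such as 2^60, the division yields the era count immediately, where the year-by-year loop would have needed on the order of 2^60 iterations.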
kcgi is now able to handle all representable 64-bit dates. A representable date is one that can convert between broken-down and epoch time without integer truncation. Truncation can occur, for example, when converting from a 64-bit epoch to a 32-bit year (the year might roll over) or from a 64-bit year to a 64-bit epoch (the epoch might roll over).
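The second case, a 64-bit year producing an epoch that rolls over, can be caught before the final multiplication. This is a hypothetical helper, not kcgi's API. It takes a day count relative to 1970-01-01 plus a second of the day and refuses results outside int64_t instead of letting them wrap.

```c
#include <stdint.h>

/*
 * Combine a day count (relative to 1970-01-01) and a second of the day
 * into epoch seconds.  Returns 1 and stores the result in *out on
 * success, 0 if it does not fit in int64_t.  secs_of_day must lie in
 * [0, 86399].
 */
static int
days_to_epoch_checked(int64_t days, int64_t secs_of_day, int64_t *out)
{
	if (days >= 0) {
		/* Require days * 86400 + secs_of_day <= INT64_MAX. */
		if (days > (INT64_MAX - secs_of_day) / 86400)
			return 0;
		*out = days * 86400 + secs_of_day;
		return 1;
	}

	/*
	 * Negative days: compute (days + 1) * 86400, which is <= 0, then
	 * step back by the remainder of the day, so the last partial day
	 * before INT64_MIN is still accepted.
	 */
	if (days < INT64_MIN / 86400 - 1)
		return 0;
	int64_t base = (days + 1) * 86400;
	int64_t back = 86400 - secs_of_day;	/* in [1, 86400] */
	if (base < INT64_MIN + back)
		return 0;
	*out = base - back;
	return 1;
}
```

Every check is done before the arithmetic it guards, so the function never relies on signed overflow, which is undefined in C.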
The results of this work were released in kcgi version 0.12.1. Most of this work was in datetime.c.
An important function that still needs adding is one that converts formatted dates into epoch values. This would keep callers from having to use their system conversion functions, which may be limited in the ways described above. This won't be a difficult piece of code to write, and can wait for a future version.
Last but not least, it's important to remember that converting between UNIX epoch time and broken-down time will always be a source of error. Either the broken-down year will allow for more years than may be encoded in the epoch or vice versa. It's important that these functions specifically document the range of acceptable inputs!