BCHS

Your software using the OpenBSD programming environment needs to be ported to other systems. What first step do you take?

    #if defined(__linux__)
    # define _GNU_SOURCE /* memmem */
    #endif
    #if defined(__NetBSD__)
    # define _OPENBSD_SOURCE /* reallocarray */
    #endif
    #if defined(__sun)
    # ifndef _XOPEN_SOURCE /* SunOS already defines this */
    #  define _XOPEN_SOURCE /* IllumOS for XPGx */
    # endif
    # define _XOPEN_SOURCE_EXTENDED 1 /* XPG4v2 */
    # ifndef __EXTENSIONS__ /* SunOS already defines this */
    #  define __EXTENSIONS__ /* reallocarray */
    # endif
    #endif

    #include <stdlib.h> /* realloc, reallocarray */
    #include <string.h> /* memmem */

    int
    main(void)
    {
    	char	*tmp;
    	size_t	 nm = 100, sz = 100;
    #if defined(__linux__)
    	/* FIXME: multiplication overflow test */
    	tmp = realloc(NULL, nm * sz);
    #else
    	tmp = reallocarray(NULL, nm, sz);
    #endif
    	if (tmp == NULL)
    		return 1;
    	return memmem(tmp, nm * sz, "foo", 3) == NULL;
    }

On second thought, maybe porting isn't a good idea… Run away! Run away!

The above toy program accounts for Linux, NetBSD, FreeBSD, OpenBSD, Mac OS X, Solaris, and IllumOS. The macro soup exists because memmem(3) is a GNU extension and reallocarray(3) (superfluous here, but used for example's sake) is an OpenBSD extension. Both of these functions are pretty… normal, which makes the complexity particularly disenchanting.

Obviously, adding this to each file, for each function, puts an impossible burden on developers. This is where a configuration script comes into play: it can feature-detect on the target system and output the correct macro soup into a shared header file. A compatibility source file, linked into the program, can provide missing functions.
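To make the idea concrete, here is a minimal sketch of one such compatibility function: an overflow-checked reallocarray(3) fallback, which also resolves the FIXME in the toy program above. The HAVE_REALLOCARRAY guard and the compat_reallocarray name are hypothetical: a configuration script would define the former when its test passes, and the distinct name only avoids clashing with a system prototype when the sketch is tried on its own. The overflow check is modelled on OpenBSD's own implementation, which only pays for a division when either operand is large enough for the product to overflow.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#ifndef HAVE_REALLOCARRAY /* hypothetical macro emitted into config.h */

/* sqrt(SIZE_MAX + 1): if both operands are below this, nmemb * size cannot overflow. */
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

void *
compat_reallocarray(void *optr, size_t nmemb, size_t size)
{
	/* Only divide when overflow is actually possible. */
	if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
	    nmemb > 0 && SIZE_MAX / nmemb < size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(optr, nmemb * size);
}

#endif /* !HAVE_REALLOCARRAY */
```

When the configure test passes, the guard compiles this away and the system's reallocarray(3) is used instead.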

The BSD.lv tools I wanted to port all used (and still use) oconfigure as their configuration script. It received the lion's share of attention in this portability work.

history

oconfigure began life as a simple portability shim. The script was originally forked from mandoc in 2014 for use in kcgi. Originally, it feature-tested for memmem(3) and strtonum(3), which weren't (and still aren't) available on Mac OS X, and provided compatible functions if these weren't detected. The original import is commit a5ff55c, where the file was simply called configure.

The script simply recorded whether each function existed, based on whether a small test program, compiled and run by the shell script during configuration, succeeded.

    #include <stdlib.h>

    int
    main(void)
    {
    	const char *ep;
    	int a = strtonum("20", 0, 30, &ep);
    	return(a != 20);
    }

Original test for strtonum(3). On OpenBSD, this would compile and run fine. On Mac OS X, the function would not be found, so the test would fail.

The results were written into a header file, config.h, which was in turn included by the project's sources. Below you can see an example. A HAVE_xxx define was emitted for each test that passed. Compatibility functions were all in compat.c, which also included config.h and wrapped each compatibility function in an #ifndef guard.

    #ifndef	CONFIG_H
    #define	CONFIG_H

    #define HAVE_MEMMEM
    #define HAVE_STRTONUM

    #if !defined(__BEGIN_DECLS)
    #  ifdef __cplusplus
    #  define __BEGIN_DECLS extern "C" {
    #  else
    #  define __BEGIN_DECLS
    #  endif
    #endif
    #if !defined(__END_DECLS)
    #  ifdef __cplusplus
    #  define __END_DECLS }
    #  else
    #  define __END_DECLS
    #  endif
    #endif

    #ifndef HAVE_STRTONUM
    extern long long strtonum
    	(const char *numstr, long long minval,
    	 long long maxval, const char **errstrp);
    #endif
    #ifndef HAVE_MEMMEM
    extern void *memmem
    	(const void *l, size_t l_len,
    	 const void *s, size_t s_len);
    #endif

    #endif /* CONFIG_H */

A config.h generated by the earliest versions of the configure script in kcgi. In this case, both functions were detected.

(Can you spot the portability issue in the above? On a system without memmem(3), nothing guarantees that sys/types.h, which defines size_t, has been included before this header, so the memmem(3) prototype may fail to compile.)

Sources would include the configuration header config.h and link against the compatibility sources.
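A compatibility source in this style might look like the sketch below: a naive memmem(3) compiled only when config.h does not define HAVE_MEMMEM. The function is named compat_memmem here (a hypothetical name) so that it cannot clash with a system declaration when tried on its own. It is a simple quadratic search, not a tuned one, and it includes <stddef.h> itself for size_t, which sidesteps the problem noted above.

```c
#include <stddef.h>
#include <string.h>

#ifndef HAVE_MEMMEM /* set by configure when the memmem(3) test passes */

/*
 * Return a pointer to the first occurrence of the s_len bytes at s
 * within the l_len bytes at l, or NULL if there is none.
 * An empty needle matches at the start (glibc's behaviour).
 */
void *
compat_memmem(const void *l, size_t l_len, const void *s, size_t s_len)
{
	const char	*cl = l, *cs = s;
	const char	*cur, *last;

	if (s_len == 0)
		return (void *)cl;
	if (l_len < s_len)
		return NULL;

	/* Last position at which the needle still fits. */
	last = cl + l_len - s_len;
	for (cur = cl; cur <= last; cur++)
		if (*cur == *cs && memcmp(cur, cs, s_len) == 0)
			return (void *)cur;
	return NULL;
}

#endif /* !HAVE_MEMMEM */
```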

By 2016, there were about a dozen tests in the script (you can see it in configure for this date), with roughly half of them providing a compatibility function if not found. I started to want this functionality for other systems, so I created a new repository called oconfigure to abstract the work.

First, I pulled in two newer features from mandoc: a site configuration script that could override test results, and the emission of Makefile configuration variables. To keep things simple, I then put all tests into a single file and all compatibility functions into another. This way, new projects needed only to copy configure, compats.c, and tests.c to use the shim.

At the time, and for some time after, the extra Makefile.configure file didn't do much, but compats.c, tests.c, and the logic of configure all grew significantly.

Prior to the porting effort, configure had a bit under three dozen feature tests, going so far as to provide the queue(3) and tree(3) macro sets.

progress

One of the biggest problems with portability when I started this adventure was the location of dependencies. Most of the tools used -lexpat, -lz, and -lsqlite3. However, -lsqlite3 sometimes needed -lpthread, sometimes not. More importantly, some portable functions existed on target platforms but required special libraries. For example, md5(3) exists on FreeBSD, but requires -lmd for linking. Then on IllumOS, all socket functions required -lsocket -lnsl.

Until then, it had been the porter's job to know which libraries and library paths were required and to pass them to the script. On these many new systems, I was the porter, so this got old real fast.

    % ./configure \
    > CFLAGS="-I/usr/local/include" \
    > LDFLAGS="-L/usr/local/lib" \
    > LDADD="-lsqlite3 -lz -lm -lpthread"
    % make

Porting required knowing the correct paths for all systems. This invocation was required for statically compiling on OpenBSD. Obviously this put a huge burden on the porter!

The first big change was to let the portability layer stand on its own, without porter intervention. To wit, I taught configure to test and export whether platform-specific libraries were required for functions in the portability shim. For example, on FreeBSD, configure set LDADD_MD5 in the generated Makefile.configure to the required library; other systems left it empty. Thus, a Makefile including Makefile.configure no longer needed a porter to pass the library during configuration: it simply added $(LDADD_MD5) where required.

The second big change was to take advantage of a utility in OpenBSD's base specifically designed for the use case of locating libraries: pkg-config(1). By using this, I no longer needed to supply the include and library paths for target systems: I would use pkg-config(1) to do it for me.

    LIBS_EXPAT != pkg-config --libs expat 2>/dev/null || echo "-lexpat"
    CFLAGS_EXPAT != pkg-config --cflags expat 2>/dev/null || echo ""
    CFLAGS += $(CFLAGS_EXPAT)
    LDADD += $(LIBS_EXPAT)

I no longer needed to anticipate the library locations on all platforms. pkg-config(1) is supported on most systems, and on others may be tackled as-needed.

Once these basic steps were accomplished, I no longer needed to pass system-specific flags each time I moved to another system. I was able to focus on the nitty-gritty of providing compatibility functions, macros, and handling diverse header layouts.

nitty-gritty

Once the configure script was smart enough to plumb its environment, next came handling discrepancies between systems. This section covers some of the major issues encountered, and isn't specific to the new systems involved in this porting effort.

This isn't a name and shame section. Working in a diverse environment is what it is. (On some systems, however, there can be… questionable design choices.)

strings

This was one of the first components managed by the script. I've read many arguments about why folks don't like strlcpy(3) and friends, but I also haven't read much code coming from those people. Anyway, I won't say anything more about this.

It's fairly simple to test whether string functions exist, though some systems require macro soup to expose these functions during compilation. On glibc systems, for example, it's often necessary to define _GNU_SOURCE.
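When the strlcpy(3) test fails, the compatibility file has to supply the function itself. A minimal sketch follows, under the hypothetical name compat_strlcpy. It follows the semantics documented by OpenBSD: copy at most dsize - 1 bytes, NUL-terminate whenever dsize is non-zero, and return strlen(src) so that callers can detect truncation.

```c
#include <stddef.h>
#include <string.h>

size_t
compat_strlcpy(char *dst, const char *src, size_t dsize)
{
	size_t	 slen = strlen(src);

	if (dsize > 0) {
		/* Copy what fits, leaving room for the terminator. */
		size_t	 n = slen < dsize - 1 ? slen : dsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}

	/* Length of the string we tried to create. */
	return slen;
}
```

A return value of at least sizeof(buf) means the copy was truncated.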

memory management

OpenBSD's better memory handling functions (e.g., reallocarray(3), explicit_bzero(3), etc.) are slowly making their way into other systems. There seems to be less contention over these than the string functions.

Unfortunately, including these extensions requires some macro soup. On Linux machines, either _GNU_SOURCE or _DEFAULT_SOURCE must be defined (for older and newer systems, respectively). However, if _XOPEN_SOURCE is also defined (for instance, for the endian functions), it conflicts with _DEFAULT_SOURCE. The solution is to define both _GNU_SOURCE and _DEFAULT_SOURCE. What a pain…
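explicit_bzero(3) is also easy to get subtly wrong in a compatibility file: a plain memset(3) on a buffer that is about to go out of scope may be removed by the optimiser as a dead store. One common fallback, sketched below under the hypothetical name compat_explicit_bzero, calls memset(3) through a volatile function pointer. This is a widely used heuristic, not a guarantee from the C standard.

```c
#include <stddef.h>
#include <string.h>

/*
 * The compiler must reload this pointer at every call, so it cannot
 * know which function runs and cannot drop the call as a dead store.
 */
static void *(*const volatile memset_ptr)(void *, int, size_t) = memset;

void
compat_explicit_bzero(void *buf, size_t len)
{
	memset_ptr(buf, 0, len);
}
```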

randomness

As it currently stands, only OpenBSD supports high-quality non-blocking random numbers with the arc4random(3) family. Users of other systems must either depend on low-quality numbers or system-specific measures.
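One such system-specific measure is reading /dev/urandom directly. The sketch below assumes that device is available (it is on the systems discussed here, though not, say, inside a bare chroot) and uses the hypothetical names urandom32 and compat_uniform. The second function borrows the rejection-sampling idea of OpenBSD's arc4random_uniform(3) to avoid modulo bias. Unlike arc4random(3), it can fail (here it simply aborts) and it opens the device on every call; a real fallback would likely cache the descriptor.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read 32 random bits from /dev/urandom, aborting on failure. */
static uint32_t
urandom32(void)
{
	FILE		*f;
	uint32_t	 v;

	if ((f = fopen("/dev/urandom", "rb")) == NULL)
		abort();
	if (fread(&v, sizeof(v), 1, f) != 1)
		abort();
	fclose(f);
	return v;
}

/*
 * Return a uniform value in [0, upper_bound), rejecting the few low
 * values that would otherwise make some results more likely.
 */
uint32_t
compat_uniform(uint32_t upper_bound)
{
	uint32_t	 r, min;

	if (upper_bound < 2)
		return 0;

	/* 2**32 % upper_bound, computed in 32-bit arithmetic. */
	min = -upper_bound % upper_bound;

	/* Each draw is rejected with probability below one half. */
	do
		r = urandom32();
	while (r < min);

	return r % upper_bound;
}
```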

endianness

While the traditional ntohs(3) family is fairly standard, the newer le32toh(3) style is more readable and also handles little-endian data, which ntohs(3) and friends, being fixed to network (big-endian) order, cannot. Unfortunately, this type of interface is very diverse and requires both macro soup and compatibility.

Mac OS X (Darwin) has its own set of byte-swapping functions in libkern/OSByteOrder.h. SunOS (Solaris, IllumOS) has its own in sys/byteorder.h. Neither is documented. FreeBSD has the correct functions in the wrong place (sys/endian.h rather than endian.h). All of these need to be detected and the proper macros set up to provide the sane names.

What's more, while the byte-swapping functions themselves exist, actually testing for endianness is an entirely different problem. OpenBSD provides BYTE_ORDER for a simple comparison against LITTLE_ENDIAN or BIG_ENDIAN, but other systems have their own variants, such as _BYTE_ORDER. Fortunately, most modern compilers emit a __BYTE_ORDER__ macro that can be used as a relatively safe fallback without relying on obscure system headers.
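The two halves of the problem can be sketched separately. Byte order can be detected at compile time from the compiler (GCC and Clang define __BYTE_ORDER__ along with __ORDER_LITTLE_ENDIAN__ and __ORDER_BIG_ENDIAN__), and a le32toh(3)-style conversion can be written without knowing the host order at all, by assembling the value from its bytes. The HOST_IS_LITTLE and compat_le32toh names are hypothetical, and a real shim would prefer the native macros wherever configure finds them.

```c
#include <stdint.h>
#include <string.h>

#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
# define HOST_IS_LITTLE (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
#else
# error "cannot determine byte order: configure should test for it"
#endif

/*
 * Convert a 32-bit value stored in little-endian order to host order.
 * Reading the bytes explicitly works on any host, at some cost in speed.
 */
uint32_t
compat_le32toh(uint32_t le)
{
	unsigned char	 b[4];

	memcpy(b, &le, sizeof(b));
	return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
	    (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}
```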

devices

The familiar minor(3) and related functions are all over the place on different systems. Fortunately, the functions themselves are named in the same way, so it's simply a matter of finding the correct header file.
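A sketch of that header selection follows. The mapping is an assumption based on common practice: glibc declares major(3) and friends in sys/sysmacros.h, SunOS provides them in sys/mkdev.h, and the BSDs and Mac OS X expose them through sys/types.h. A configure script would more robustly test for each header rather than key off the operating system, and dev_roundtrip is a hypothetical helper for illustration.

```c
#include <sys/types.h>
#if defined(__linux__)
# include <sys/sysmacros.h>	/* glibc: major(3), minor(3), makedev(3) */
#elif defined(__sun)
# include <sys/mkdev.h>		/* SunOS */
#endif				/* BSDs, Mac OS X: sys/types.h suffices */

/* Build a device number and check that it splits back apart. */
int
dev_roundtrip(unsigned int maj, unsigned int min)
{
	dev_t	 d = makedev(maj, min);

	return major(d) == maj && minor(d) == min;
}
```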

file system

The POSIX *at functions (e.g., mknodat(2)) are still not broadly supported. I anticipate this will get better, though. (Only recently did OpenBSD gain most of these!) Though these functions are standardised, or becoming standardised, they require some macro soup on systems (such as SunOS) that only expose them when specific feature macros are defined.

Unfortunately, these functions can't be safely emulated: pairing fchdir(2) with the target function (e.g., mknod(2)) leaves a race between the two calls, because the working directory is shared by every thread in the process.
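A naive emulation makes the window visible. The sketch below uses mkfifo(2) in place of mknod(2) so that it runs without privileges; naive_mkfifoat is a hypothetical name for illustration, not a proposed shim. If another thread changes directory between the marked calls, the FIFO is created in the wrong place.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * NOT a safe shim: emulate mkfifoat(2) by temporarily changing the
 * process-wide working directory.
 */
int
naive_mkfifoat(int dirfd, const char *path, mode_t mode)
{
	int	 cwd, rc, saved;

	/* Remember where we were so we can come back. */
	if ((cwd = open(".", O_RDONLY)) == -1)
		return -1;

	if (fchdir(dirfd) == -1) {
		saved = errno;
		close(cwd);
		errno = saved;
		return -1;
	}

	/* RACE: another thread may change directory right here. */
	rc = mkfifo(path, mode);
	saved = errno;

	/* Going back can fail too, stranding the whole process elsewhere. */
	if (fchdir(cwd) == -1) {
		saved = errno;
		rc = -1;
	}
	close(cwd);
	errno = saved;
	return rc;
}
```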

restricted operation

To any actual programmer, OpenBSD's restricted operation functions pledge(2) and unveil(2) are gifts from heaven. No other system's facilities even come close in terms of practical security.

The user-level restriction features, such as setresgid(2), seem to be slowly migrating to other systems. On earlier Mac OS X machines, these were entirely broken; since the functions were nonetheless present, detecting this required additional plumbing. Recent versions do not have this issue, so it remains simply a matter of portability.

hashes

OpenBSD has the simple md5(3) and sha2(3) header files for MD5 and SHA2 (e.g., SHA256) hashing. This is incredibly useful because one can use these powerful functions without needing to pull in external libraries such as OpenSSL or LibreSSL.

FreeBSD has the MD5 header but splits the SHA2 header into SHA256, etc. NetBSD has a single header for both but different variable types and slightly different function naming. (FreeBSD also requires linking to another library for its hash functions.) SunOS, which also needs the hash function library, almost has everything, but is missing several key functions (e.g., SHA256File).

There's no right way to do this—all using the same type, or different types, or different header files—but the disparity causes big headaches for programmers. In the end, it became easier to simply test for all functions and provide compatibility straight-up instead of testing for each variant and providing macro-soup to work between them.

passwords

OpenBSD has crypt_checkpass(3), which makes password hash generation and checking super easy. The fallback is the traditional crypt(3) interface, which is a nightmare. Unfortunately, this function requires a tremendous amount of macro goop to use properly. Linux requires _DEFAULT_SOURCE (actually _XOPEN_SOURCE, but defining both _GNU_SOURCE and _XOPEN_SOURCE pulls this in, while defining both _XOPEN_SOURCE and _GNU_SOURCE on newer glibcs will cause warnings). Many systems require a further -lcrypt, which is easy to test for.

Linux (glibc) further notes that this function may be deprecated and hand-waves a replacement. It does not have a manpage on some systems, but the function still exists.

The supported hash algorithms are where it gets interesting. OpenBSD, FreeBSD, and newer Linux support Blowfish. NetBSD and Solaris do not. IllumOS does (undocumented). If this weren't confusing enough, NetBSD's function behaves differently from the others: if it does not find the requested hash algorithm, it returns a magic string instead of NULL. It's a mess. The most portable choice is simply DES.

keeping it together…

Finding the necessary macro incantations proved immediately problematic, especially on Linux, where a feature test added for an older glibc would break on newer ones. This seemed to go on forever and required each change to be tested manually on many systems.

In the next section, I introduce BSD.lv's continuous integration that helped with this.