LordAro

Charles Pigott. Devops type.
Commonly found shouting at technology.

Locating libc

11 October, 2018

One of the advantages of most Linux distributions is that all the libraries and programs are installed to common locations in the filesystem: executable programs go in /bin, libraries in /lib, users' home directories under /home, and so on. This layout is defined by the Filesystem Hierarchy Standard (FHS), which most Linux distributions follow.

However, there's still a significant amount of variation within the FHS in where things go across different Linux distributions. /lib, /usr/lib and /usr/local/lib are the common directories. Moreover, when you introduce different architectures (e.g. 32-bit programs on a 64-bit OS), you get things like /lib32, /lib and /lib64. Obviously, which architecture goes in /lib depends on the distribution. Ubuntu also has /lib/i386-linux-gnu and /lib/x86_64-linux-gnu for certain libraries, and there's also /usr/libx32, whose purpose I've no idea about.

All this variation makes it a bit tricky to work out where the libraries are located in a (reasonably) cross-platform manner.

In this particular case, we're looking for the core C runtime library, libc.so. Given that nearly every compiled application on the computer will need to make a system call at some point (open a file, get a random number, etc.), and libc provides the wrappers for those calls, nearly every (dynamically linked) application depends on this library. And, typically enough, different distributions store it in different directories. Taking two distributions as examples, CentOS puts it under /usr/lib(64), while Ubuntu puts it under /usr/lib/x86_64-linux-gnu ...or /usr/libx32.

So, how do we find where our most important library is located in the filesystem?

Search path

The first thing you might think of is to just look in each of the directories on the library search path. The environment variables LIBRARY_PATH and LD_LIBRARY_PATH are no help here: the former is only consulted by GCC at link time, not when a program runs, and the latter isn't necessarily set anywhere and is merely a supplement to the default directories.
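A minimal sketch of why LD_LIBRARY_PATH alone won't do: it walks the variable's colon-separated entries and, on most systems, finds nothing at all, because the dynamic linker's default directories never appear there. The function name print_ld_library_path is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup(), setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print each directory listed in LD_LIBRARY_PATH and return how many there
 * were (or -1 if memory runs out). Returns 0 when the variable is unset or
 * empty, which is the usual case on a stock system. */
static int print_ld_library_path(void)
{
    const char *env = getenv("LD_LIBRARY_PATH");
    if (env == NULL || *env == '\0')
        return 0;

    /* strtok() modifies its input, so work on a private copy. */
    char *copy = strdup(env);
    if (copy == NULL)
        return -1;

    int count = 0;
    /* Note: strtok() skips empty entries such as the gap in "a::b". */
    for (char *dir = strtok(copy, ":"); dir != NULL; dir = strtok(NULL, ":")) {
        printf("%s\n", dir);
        count++;
    }
    free(copy);
    return count;
}
```

Even when it is set, this only tells you about the extra directories someone added, not where the distribution put libc.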

Okay, so we'll ask the dynamic linker's own configuration tool what the search path is: the output of ldconfig -v will include, among some other output, every directory it searches for libraries. The issue here is that these directories aren't listed in any sort of deterministic order, and they also include the search paths for other architectures, so you're just as likely to pick up the 32-bit libc.so as the 64-bit version. While we could test the architecture of each libc.so file we find in the resulting directories, it's also fairly expensive to spawn ldconfig and then parse its output, and tedious to do in C.
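For comparison, here is roughly what shelling out to ldconfig looks like in C. This is a minimal sketch that assumes directory lines in `ldconfig -v` output start with the path and terminate it with a ':', while library lines are tab-indented. The -N and -X flags (don't rebuild the cache, don't update links) are added here so the command has no side effects; the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* for popen()/pclose() */
#include <stdio.h>
#include <string.h>

/* Extract the directory from one line of `ldconfig -v` output.
 * Assumed format: directory lines start with '/' and end the path with ':'
 * (some versions append " (from ...)" afterwards); library lines are
 * tab-indented. Returns 1 and fills `out` for a directory line, else 0. */
static int ldconfig_dir_from_line(const char *line, char *out, size_t outlen)
{
    if (line[0] != '/')
        return 0;                 /* library entry, warning, blank line... */
    const char *colon = strchr(line, ':');
    if (colon == NULL)
        return 0;
    size_t len = (size_t)(colon - line);
    if (len + 1 > outlen)
        return 0;                 /* path doesn't fit in the caller's buffer */
    memcpy(out, line, len);
    out[len] = '\0';
    return 1;
}

/* Spawn ldconfig and print every directory it reports; returns the count,
 * or -1 if the pipe couldn't be opened. Note that ldconfig often lives in
 * /sbin, which may not be on a normal user's PATH. */
static int list_ldconfig_dirs(void)
{
    FILE *p = popen("ldconfig -v -N -X 2>/dev/null", "r");
    if (p == NULL)
        return -1;
    char line[4096], dir[4096];
    int count = 0;
    while (fgets(line, sizeof line, p) != NULL) {
        if (ldconfig_dir_from_line(line, dir, sizeof dir)) {
            printf("%s\n", dir);
            count++;
        }
    }
    pclose(p);
    return count;
}
```

Even once this works, you are left with directories for every architecture in no particular order, so each libc found would still need its ELF class checking.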

There must be a better way.

dlopen

dlopen is a nice function that allows a program to load libraries when they're required, rather than automatically when the program starts. This can be useful for truly dynamic plugin systems, or where architecture-specific libraries are available. Usefully, the handle dlopen returns can be queried for file information using dlinfo ...so long as dlopen can actually load the library. libc.so is often a "GNU ld script" for the compile-time linker: just a text file pointing to the actual libc binary (libc.so.6 on glibc), so it cannot be loaded by dlopen.
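A minimal sketch of the dlopen/dlinfo approach, using glibc's RTLD_DI_LINKMAP request to recover the path a library was actually loaded from. It assumes you already know the versioned name (libc.so.6), which sidesteps the linker-script problem; loaded_path is a hypothetical helper name.

```c
#define _GNU_SOURCE   /* dlinfo() and RTLD_DI_LINKMAP are GNU extensions */
#include <dlfcn.h>
#include <link.h>     /* struct link_map */
#include <stdio.h>

/* dlopen `soname`, then ask dlinfo() which file the dynamic linker actually
 * mapped. Returns 0 and copies the path into `out`, or -1 on failure. */
static int loaded_path(const char *soname, char *out, size_t outlen)
{
    void *handle = dlopen(soname, RTLD_LAZY);
    if (handle == NULL) {
        /* For a linker script such as libc.so, this is where it fails. */
        fprintf(stderr, "dlopen(%s): %s\n", soname, dlerror());
        return -1;
    }

    struct link_map *map = NULL;
    int rc = -1;
    if (dlinfo(handle, RTLD_DI_LINKMAP, &map) == 0 && map != NULL
        && map->l_name != NULL && map->l_name[0] != '\0') {
        /* Copy before dlclose(): l_name belongs to the loader. */
        snprintf(out, outlen, "%s", map->l_name);
        rc = 0;
    }
    dlclose(handle);
    return rc;
}
```

On glibc older than 2.34 this needs linking with -ldl. Passing "libc.so" instead is where things typically go wrong on a glibc system with development files installed: dlopen rejects the text linker script.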

This is still the solution that some languages use. Amazingly, they call dlopen on libc.so, parse the error message it returns to determine that the file is a linker script, then parse the script itself to find the location of the real libc library. This seems horrible, but Ruby FFI and GHC both use this method, so maybe it's fine?
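A rough sketch of the two parsing steps described above. It assumes glibc's error wording ("invalid ELF header") and a script of the usual `GROUP ( ... )` form; reading the script from disk is left out, and the helper names are hypothetical. This is an illustration of the idea, not the actual Ruby FFI or GHC code.

```c
#include <ctype.h>
#include <string.h>

/* Heuristic: does this dlerror() message suggest the file was a linker
 * script rather than an ELF object? The wording is glibc's assumption. */
static int looks_like_linker_script(const char *err)
{
    return err != NULL && strstr(err, "invalid ELF header") != NULL;
}

/* Copy the first file named inside GROUP ( ... ) into `out`.
 * Returns 0 on success, -1 if no usable GROUP entry was found.
 * Comments, INPUT() and nested AS_NEEDED() are deliberately not handled. */
static int first_group_entry(const char *script, char *out, size_t outlen)
{
    const char *p = strstr(script, "GROUP");
    if (p == NULL)
        return -1;
    p = strchr(p, '(');
    if (p == NULL)
        return -1;
    p++;                                   /* step past the '(' */
    while (isspace((unsigned char)*p))
        p++;
    size_t len = 0;                        /* token ends at space or ')' */
    while (p[len] != '\0' && p[len] != ')' && !isspace((unsigned char)p[len]))
        len++;
    if (len == 0 || len + 1 > outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}
```

Fed a typical Ubuntu x86-64 libc.so script, first_group_entry yields /lib/x86_64-linux-gnu/libc.so.6, which can then be handed back to dlopen.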

There must be a better way.

gnu/lib-names.h

There is a final option for getting the filename of the actual libc library, though it's not properly portable. glibc provides a header, gnu/lib-names.h, which contains constants (in the form of #defines) for various system libraries, such as libm.so, librt.so, libc.so, and the loader itself, ld.so; for example, LIBC_SO expands to "libc.so.6" on x86-64. It's not exactly POSIX-friendly, but there we go. Using this we can get the filename of the library. We're still lacking the directory, but now that we know the filename, it's not infeasible to search through all the library directories for the correct file.
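A minimal sketch of that search, assuming glibc (for gnu/lib-names.h and its LIBC_SO macro) and a hard-coded, x86-64-flavoured list of candidate directories drawn from the layouts above. The list and the find_libc name are illustrative assumptions, and the first match is not checked for architecture.

```c
#define _POSIX_C_SOURCE 200809L
#include <gnu/lib-names.h>   /* glibc-only: LIBC_SO, LIBM_SO, LD_SO, ... */
#include <stdio.h>
#include <unistd.h>          /* access() */

/* Candidate directories (an assumption, certainly not exhaustive).
 * 64-bit directories come first so a multilib /usr/lib is tried last. */
static const char *const candidate_dirs[] = {
    "/lib64", "/usr/lib64",
    "/lib/x86_64-linux-gnu", "/usr/lib/x86_64-linux-gnu",
    "/lib", "/usr/lib", "/usr/local/lib",
};

/* Look for LIBC_SO in each candidate directory. Returns 0 and writes the
 * full path of the first readable match into `out`, or -1 if none found. */
static int find_libc(char *out, size_t outlen)
{
    size_t ndirs = sizeof candidate_dirs / sizeof candidate_dirs[0];
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", candidate_dirs[i], LIBC_SO);
        if (n < 0 || (size_t)n >= outlen)
            continue;                    /* path truncated; skip it */
        if (access(out, R_OK) == 0)
            return 0;
    }
    return -1;
}
```

Putting the 64-bit directories first is a guess that suits Fedora/CentOS-style multilib layouts; on other systems the order (and the list itself) would need adjusting.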

So there we have it! I eventually went with the dlopen method, as the TCC library, which was what all this was for, can parse linker scripts anyway.