System dependencies are hard (so we made them easier)

by Misty De Méo, 25 October 2023


In an ideal world, building a Rust package would involve nothing more than cargo build: all your dependencies would come straight from Cargo, nothing would come from the outside world, and it would all Just Work. Of course, the ideal world doesn't exist; that's why we built cargo-dist, after all.

In short: we made installing system dependencies in CI as easy as installing Cargo dependencies, using a Cargo-like syntax that's easy to write and easy to read. Once you've installed those dependencies, we can tell you what your software used and what your users will need to run it. Here's an example:

```toml
[workspace.metadata.dist.dependencies.homebrew]
cmake = { targets = ["x86_64-apple-darwin"] }
libcue = { version = "2.2.1", targets = ["x86_64-apple-darwin"] }

[workspace.metadata.dist.dependencies.apt]
cmake = '*'
libcue-dev = { version = "2.2.1-2" }

[workspace.metadata.dist.dependencies.chocolatey]
lftp = '*'
cmake = '3.27.6'
```

Installing dependencies

We've always known users would want the ability to install extra system dependencies for their builds; it's been a widely requested feature. In fact, in our very first blog post, installing custom dependencies was how we demonstrated that you can make custom edits to our CI scripts. Well, make custom edits no more: the headlining feature of cargo-dist 0.4.0 is the ability to install extra packages using the system package manager. We've started with support for three package managers that are preinstalled in the runners provided by GitHub CI: apt (Linux), Homebrew (macOS), and Chocolatey (Windows). We plan to add support for other package managers in the future.

While we can't abstract away every detail of the package manager, I wanted to settle on a solution that abstracts away the actual invocation so that users only have to focus on packages. I chose a familiar syntax modelled after Cargo's own dependency syntax. Within the metadata's dist.dependencies table, you create one section per package manager, then specify dependencies one line per package. Let's take a look at that example again:

```toml
[workspace.metadata.dist.dependencies.homebrew]
cmake = { targets = ["x86_64-apple-darwin"] }
libcue = { version = "2.2.1", targets = ["x86_64-apple-darwin"] }

[workspace.metadata.dist.dependencies.apt]
cmake = '*'
libcue-dev = { version = "2.2.1-2" }

[workspace.metadata.dist.dependencies.chocolatey]
lftp = '*'
cmake = '3.27.6'
```

While these specifications require the user to know which packages exist in the package managers they wish to use, they don't require the user to know how to use those package managers; using essentially Cargo's own syntax, users can let cargo-dist handle the hard work of actually installing the packages. For package managers that support installing specific versions of packages, we also let you specify them. Not all package managers support specifying versions: Homebrew currently ignores these values, though in a later version we may use them to notify you if you receive a version different from the one you expected.
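To make the "we handle the invocation" idea concrete, here's a minimal Rust sketch of how one entry from these tables could be turned into an install command. It's an illustration under stated assumptions, not cargo-dist's code: the `Manager` and `DepSpec` types and the `install_command` function are hypothetical, an empty `targets` list stands for "every target", and the exact flags cargo-dist passes may differ.

```rust
// Hypothetical model of one dependency entry, e.g. `cmake = '*'` or
// `libcue = { version = "2.2.1", targets = ["x86_64-apple-darwin"] }`.
// This is a sketch, not cargo-dist's implementation.

#[derive(Clone, Copy)]
enum Manager {
    Apt,
    Homebrew,
    Chocolatey,
}

struct DepSpec<'a> {
    name: &'a str,
    version: Option<&'a str>, // None corresponds to '*'
    targets: &'a [&'a str],   // empty means "install on every target"
}

/// Returns the shell command that would install `spec` on `target`,
/// or None if the entry is restricted to other targets.
fn install_command(manager: Manager, spec: &DepSpec, target: &str) -> Option<String> {
    if !spec.targets.is_empty() && !spec.targets.contains(&target) {
        return None;
    }
    let cmd = match (manager, spec.version) {
        (Manager::Apt, Some(v)) => format!("sudo apt-get install -y {}={}", spec.name, v),
        (Manager::Apt, None) => format!("sudo apt-get install -y {}", spec.name),
        // Homebrew can't pin a version at install time, so it is ignored here.
        (Manager::Homebrew, _) => format!("brew install {}", spec.name),
        (Manager::Chocolatey, Some(v)) => {
            format!("choco install -y {} --version {}", spec.name, v)
        }
        (Manager::Chocolatey, None) => format!("choco install -y {}", spec.name),
    };
    Some(cmd)
}

fn main() {
    let libcue = DepSpec {
        name: "libcue",
        version: Some("2.2.1"),
        targets: &["x86_64-apple-darwin"],
    };
    // Installed on the listed target, skipped on any other.
    println!("{:?}", install_command(Manager::Homebrew, &libcue, "x86_64-apple-darwin"));
    println!("{:?}", install_command(Manager::Homebrew, &libcue, "aarch64-apple-darwin"));

    let libcue_dev = DepSpec { name: "libcue-dev", version: Some("2.2.1-2"), targets: &[] };
    println!("{:?}", install_command(Manager::Apt, &libcue_dev, "x86_64-unknown-linux-gnu"));

    let lftp = DepSpec { name: "lftp", version: None, targets: &[] };
    println!("{:?}", install_command(Manager::Chocolatey, &lftp, "x86_64-pc-windows-msvc"));
}
```

The point of the design is that the user only ever writes the table; the per-manager quirks (apt's `name=version`, Chocolatey's `--version`, Homebrew's lack of pinning) live in one place.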

Finding out what your app uses

There are two ways to include dependencies when producing a binary: static linking, in which a dependency is copied into the final binary, and dynamic linking, in which the binary references an external copy of the dependency and loads it from that external location at runtime. Rust apps usually statically link their Cargo dependencies when the binary is built, which is why Rust binaries are typically entirely self-contained. As Rust developers, it's easy to assume we can always rely on that. It's a beautiful world where we never have to worry about resolving our apps' dependencies at runtime... and it all falls apart the instant we dynamically link to any non-Rust dependency, from the C standard library to libraries written in C or other languages.

But if installing your dependencies can be tricky, determining what your binary links against can be downright difficult. Each platform has its own tools, which work differently from one another and can't be used across platforms. You could keep machines running every OS your binaries target, remember how to use ldd and otool and dumpbin, and check manually on every release, but that's surely too much work and far too error-prone. cargo-dist, save me!

But why would you want to know what your package links against? Since they're not a part of your binary, any libraries you've dynamically linked against have to be present on the system where the binary's being used in order to run. It's easy to accidentally produce binaries your end users can't run because they rely on libraries they don't have installed.

But really, what it comes down to isn't just libraries; it's packages. Libraries come from somewhere, and knowing which packages your libraries come from is key to knowing what your user needs in order to run your software. That's where cargo-dist's output goes from being "ldd but cross-platform" to "ldd but better". Here's a real (abbreviated) example from a tool that links against many Linux libraries:

```text
│ Category           ┆ Libraries                                │
│ System             ┆ /lib/x86_64-linux-gnu/ (libdbus-1-3)     │
│                    ┆ /usr/lib/x86_64-linux-gnu/ (libdatrie1)  │
│                    ┆ /lib/x86_64-linux-gnu/ (libc6)           │
│                    ┆ /lib/x86_64-linux-gnu/ (libgcc-s1)       │
│ Homebrew           ┆                                          │
│ Public (unmanaged) ┆                                          │
│ Frameworks         ┆                                          │
│ Other              ┆                                          │
```

Not only do we know the paths and names of the libraries the binary linked against, but we know exactly which packages they came from, by name. Asking the user "do you have /usr/lib/x86_64-linux-gnu/ installed?" is nearly impossible to answer: most users aren't going to think about that or be able to answer it. But if you ask "do you have libdatrie1 installed?", that's much easier; the user can look it up themselves in their package manager, and if it's missing, the package name gives them an actionable step to get what they need to run the binary.
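The post doesn't spell out how cargo-dist maps a library path to its package on Linux. On Debian-based systems, one way to ask is `dpkg -S <path>`, which prints the owning package in the form `package:arch: path`. The hypothetical helper below pulls the package name out of such a line; it is a sketch of that one technique, not cargo-dist's code, and the sample line is illustrative.

```rust
/// Hypothetical helper: extracts the owning package from one line of
/// `dpkg -S <path>` output, e.g.
/// "zlib1g:amd64: /lib/x86_64-linux-gnu/libz.so.1" -> "zlib1g".
fn package_from_dpkg_line(line: &str) -> Option<&str> {
    // Diversion notices are informational, not ownership records.
    if line.starts_with("diversion by") {
        return None;
    }
    // The package list and the path are separated by the first ": ".
    let (packages, path) = line.split_once(": ")?;
    if path.trim().is_empty() {
        return None;
    }
    // Several packages may be listed as "a, b"; take the first one.
    let first = packages.split(", ").next()?;
    // Multi-arch packages carry a ":arch" suffix, which isn't part of the name.
    let name = first.split(':').next()?;
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}

fn main() {
    let line = "zlib1g:amd64: /lib/x86_64-linux-gnu/libz.so.1";
    println!("{:?}", package_from_dpkg_line(line));
}
```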

As the above table shows, I also chose to categorize these dependencies into several different groups. For power users, they provide extra information about where exactly a library comes from and how you may think about it.

  • System libraries are libraries provided by the OS itself. On Linux, they're provided by the operating system's package manager, such as apt on Debian-based operating systems. On macOS, these are the libraries baked into the base OS install; since the locations these are installed to are immutable, it's guaranteed that these files always come from the OS base install.
  • Homebrew libraries are libraries installed by the Homebrew package manager on macOS. These are distinct from system libraries since Homebrew is an over-the-top package manager: it doesn't come with the OS, and it installs packages in locations separate from the ones the OS uses for its own libraries.
  • Public (unmanaged) is a catch-all for any libraries installed in locations the compiler checks by default but that weren't installed by the package manager.
  • Frameworks are a special kind of library available only on macOS. Your app is most likely to ship against frameworks provided by the operating system, which are installed in /System.
  • Other is a catch-all category for anything we couldn't categorize as one of the above.
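As a rough illustration of how paths could be sorted into these categories, here's a path-prefix heuristic in Rust. The prefixes and the `categorize` function are my assumptions, not cargo-dist's rules. A path alone can't always decide the category: on Linux the post defines "System" by which package manager provided a library, and on Intel Macs Homebrew also symlinks into /usr/local/lib, so a real implementation needs more than string matching.

```rust
/// The categories from the linkage report.
#[derive(Debug, PartialEq)]
enum Category {
    System,
    Homebrew,
    PublicUnmanaged,
    Frameworks,
    Other,
}

/// Hypothetical path-prefix heuristic; checks run from most to least specific.
fn categorize(path: &str) -> Category {
    if path.contains(".framework/") {
        // e.g. /System/Library/Frameworks/Foo.framework/Versions/A/Foo
        Category::Frameworks
    } else if path.starts_with("/opt/homebrew/")
        || path.starts_with("/usr/local/Cellar/")
        || path.starts_with("/usr/local/opt/")
    {
        Category::Homebrew
    } else if path.starts_with("/usr/lib/")
        || path.starts_with("/System/")
        || path.starts_with("/lib/")
        || path.starts_with("/lib64/")
    {
        Category::System
    } else if path.starts_with("/usr/local/lib/") {
        // Searched by default, but not owned by any package manager.
        Category::PublicUnmanaged
    } else {
        Category::Other
    }
}

fn main() {
    for path in [
        "/usr/lib/libSystem.B.dylib",
        "/opt/homebrew/opt/libcue/lib/libcue.2.dylib",
        "/usr/local/lib/libfoo.dylib",
    ] {
        println!("{:<45} {:?}", path, categorize(path));
    }
}
```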

Of course, that still requires manual action from your user. What if it were all automated for you? If you're targeting Homebrew distribution, we can do just that. Using the information from the linkage generator, we not only print what we found your binary links against; we also generate dependency statements in the package manager installers cargo-dist creates.

For example, here's a package that was built with cmake and libcue as dependencies. During the build, we calculated the following linkage:

```text
│ Category           ┆ Libraries                                            │
│ System             ┆ /usr/lib/libiconv.2.dylib                            │
│                    ┆ /usr/lib/libSystem.B.dylib                           │
│ Homebrew           ┆ /opt/homebrew/opt/libcue/lib/libcue.2.dylib (libcue) │
│ Public (unmanaged) ┆                                                      │
│ Frameworks         ┆                                                      │
│ Other              ┆                                                      │
```

Based on that, we know that your users will need the Homebrew-provided libcue library at runtime, but not cmake. We then use that information when building your Homebrew installer, and inject the extra depends_on line into it:

```ruby
class Cue2Ccd < Formula
  if Hardware::CPU.type == :arm
    url ""
    sha256 "aa8dd79628fd009af4e4d4662b77fedef17ec3ea2fb8089a271df80a480310f4"
  else
    url ""
  end
  version "0.1.0"
  depends_on "libcue"
  # ... (remainder of the formula omitted)
end
```

Tada! You didn't have to tell cargo-dist what dependencies to include in your Homebrew formula; we figured it out for you, and added it to your package definition. Now when your user brew installs your package, they'll get that dependency automatically.
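The step from linkage report to formula can be sketched in a few lines. This hypothetical `depends_on_lines` function (not cargo-dist's actual code) takes the Homebrew package names found in the linkage, removes duplicates, sorts them so the formula stays stable between releases, and renders one `depends_on` line per package. Build-only tools such as cmake never appear in the linkage, so they never reach the formula.

```rust
use std::collections::BTreeSet;

/// Renders `depends_on` lines for a Homebrew formula from the Homebrew
/// packages found in a binary's linkage. A BTreeSet both deduplicates and
/// sorts, so the generated formula doesn't churn between releases.
fn depends_on_lines<'a>(homebrew_packages: impl IntoIterator<Item = &'a str>) -> String {
    let unique: BTreeSet<&str> = homebrew_packages.into_iter().collect();
    unique
        .into_iter()
        .map(|pkg| format!("  depends_on \"{}\"\n", pkg))
        .collect()
}

fn main() {
    // Homebrew packages as they might come out of the linkage report;
    // two libraries from the same package yield a single dependency.
    let linked = vec!["libcue", "libcue"];
    print!("{}", depends_on_lines(linked));
}
```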

We only support this kind of package manager integration with Homebrew at the moment, but we plan to expand it to other installer types in the future. I'd love to hear from interested users about how they'd like this to work with shell installers and other ecosystems.

Installing packages, the hard stuff

While using the packages we installed via apt and Chocolatey is straightforward, Homebrew presents us with a few extra challenges. Homebrew's dependencies aren't always available in paths that the compiler searches by default:

  • On Apple Silicon Macs, Homebrew is installed in /opt/homebrew; the compiler has to be instructed to find anything here, even if it's on the user's PATH.
  • Some dependencies are private (Homebrew calls these "keg-only"), which means they aren't linked into Homebrew's normal lib path. This includes versioned packages and packages that would shadow key software that comes with the OS.

Internally, Homebrew has its own tooling to handle this when building software; its "superenv" build environment injects all of the dependencies for a build and sets numerous environment variables to ensure that builds can locate them. It covers a lot of what we need; if only we could use it!

I first turned to brew bundle exec, a command I wrote several years ago to solve a very similar problem for Ruby builds at GitHub. brew bundle is an alternate frontend for Homebrew that provides a Bundler- or Cargo-like method of installing a set of packages defined in a manifest named Brewfile. brew bundle exec is an extension which wraps any command passed to it in Homebrew's own superenv environment, using the packages defined in the Brewfile as the dependency list to use.
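For reference, a Brewfile is just a list of entries such as `brew "cmake"`, one per line. Here's a hypothetical helper that renders one from the package names in the `homebrew` table; whether cargo-dist generates its Brewfile this way is an assumption on my part.

```rust
/// Renders a Brewfile listing each Homebrew dependency on its own line,
/// using Homebrew Bundle's `brew "name"` entry syntax.
fn render_brewfile(packages: &[&str]) -> String {
    packages.iter().map(|p| format!("brew \"{}\"\n", p)).collect()
}

fn main() {
    let brewfile = render_brewfile(&["cmake", "libcue"]);
    // Saved as ./Brewfile, this is what `brew bundle` installs from and
    // what `brew bundle exec` uses as the dependency list for superenv.
    print!("{}", brewfile);
}
```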

The first revision of cargo-dist's Homebrew integration directly wrapped cargo build with brew bundle exec. It seemed to be working well, at least until we hit the rough waters of cross-compiling. Since GitHub doesn't provide non-premium M1 Mac runners, we run all of our aarch64-apple-darwin builds on an x86_64 runner, and I quickly discovered that wrapping cargo build with the brew bundle exec from an Intel copy of Homebrew injected too many flags, causing our native code builds to fail. Back to the drawing board.

My next attempt took a slightly more nuanced approach: we'd capture the environment produced by brew bundle exec, but then pick and choose the parts of it we wanted. Effectively, we've produced our own version of superenv that provides just what we need. We set several different things:

  • We seed RUSTFLAGS with -L flags pointing to the library directories of every dependency we have. The linker uses these to locate the libraries it links against during the build.
  • We add the bin and sbin directories for every dependency to the front of the PATH, ensuring that they'll be found by any Rust build tools and preferred over versions that may happen to come with the OS.
  • We set cmake and pkg-config-related environment variables, ensuring that builds which use cmake and pkg-config to locate dependencies can find them.
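Here's a sketch of assembling that environment from a list of Homebrew package prefixes. The post only says "cmake and pkg-config-related" variables, so the specific names `PKG_CONFIG_PATH` and `CMAKE_PREFIX_PATH` are my assumptions (both are standard variables those tools read), as are the hypothetical `build_env` function and the `lib`, `bin`, `sbin`, and `lib/pkgconfig` subdirectory layout.

```rust
use std::collections::BTreeMap;

/// Builds a reduced, superenv-like environment from Homebrew package
/// prefixes such as "/opt/homebrew/opt/libcue". Variable names beyond
/// RUSTFLAGS and PATH are assumptions for illustration.
fn build_env(prefixes: &[&str], inherited_path: &str) -> BTreeMap<&'static str, String> {
    let mut env = BTreeMap::new();

    // One -L flag per dependency so rustc passes its lib dir to the linker.
    let rustflags: Vec<String> = prefixes.iter().map(|p| format!("-L{}/lib", p)).collect();
    env.insert("RUSTFLAGS", rustflags.join(" "));

    // Each dependency's bin and sbin go in front of the inherited PATH,
    // so they win over any same-named tools that ship with the OS.
    let mut path: Vec<String> = Vec::new();
    for p in prefixes {
        path.push(format!("{}/bin", p));
        path.push(format!("{}/sbin", p));
    }
    if !inherited_path.is_empty() {
        path.push(inherited_path.to_string());
    }
    env.insert("PATH", path.join(":"));

    // pkg-config searches PKG_CONFIG_PATH; CMake's find_* commands search
    // the install prefixes listed in CMAKE_PREFIX_PATH.
    let pkg_config: Vec<String> = prefixes
        .iter()
        .map(|p| format!("{}/lib/pkgconfig", p))
        .collect();
    env.insert("PKG_CONFIG_PATH", pkg_config.join(":"));
    env.insert("CMAKE_PREFIX_PATH", prefixes.join(":"));

    env
}

fn main() {
    let env = build_env(&["/opt/homebrew/opt/libcue"], "/usr/bin:/bin");
    for (key, value) in &env {
        println!("{}={}", key, value);
    }
}
```

Using the `opt/<name>` prefix rather than Homebrew's shared lib directory is what makes keg-only packages visible, since those are never linked into the shared location.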

Like many things in build tools land, this didn't take many lines of code to actually write, but the process of knowing what exactly to do and how to do it took a long time. I spent this time in the buildsystem mines so that you don't have to.

A deeper dive into linkage

On a technical level, each platform handles linkage differently, which constrains both what information we can access and how cross-platform our implementation can be. I wanted cargo dist build and cargo dist linkage to work on as many platforms as possible while also providing as much useful information as possible. In every case, we're parsing the binary that cargo dist build produces to get this information, but the details of how that works, and how much information we get, differ depending on the platform we're building for.

macOS provided the simplest resolution because of a helpful detail of its dynamic linker's implementation: every library is recorded by the full path at which it was located at build time. This is also the first path the dynamic linker will use to try to locate the library when the binary is run. From our perspective, this is most useful because it unambiguously tells us exactly which library the linker chose; and, because that path is recorded in the binary itself, we can examine the binary on any platform, not just a Mac, and get the same information. It's also helpful since Homebrew's directory structure exposes the name of the package that contains a library: from the filename /opt/homebrew/opt/libcue/lib/libcue.2.dylib, we can tell that the library was a) installed by Homebrew, and b) comes from a package named libcue, without needing access to the machine on which the package was built. We use the mach-object crate to parse binaries to retrieve this information, and don't restrict which system the command is run on.
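Because the recorded path carries the package name, recovering it is plain string handling. Here's a minimal sketch, assuming the standard Homebrew prefixes (/opt/homebrew on Apple Silicon, /usr/local on Intel) and the `opt/<name>` and `Cellar/<name>/<version>` layouts; the `homebrew_package` function is hypothetical, and reading the paths out of the Mach-O file (which cargo-dist does with mach-object) is left out.

```rust
/// Given a library path recorded in a Mach-O binary, returns the Homebrew
/// package it belongs to, if the path follows Homebrew's layout.
fn homebrew_package(path: &str) -> Option<&str> {
    const PREFIXES: [&str; 4] = [
        "/opt/homebrew/opt/",
        "/opt/homebrew/Cellar/",
        "/usr/local/opt/",
        "/usr/local/Cellar/",
    ];
    for &prefix in PREFIXES.iter() {
        if let Some(rest) = path.strip_prefix(prefix) {
            // The first path component after the prefix is the package name.
            return rest.split('/').next().filter(|name| !name.is_empty());
        }
    }
    None
}

fn main() {
    let path = "/opt/homebrew/opt/libcue/lib/libcue.2.dylib";
    println!("{:?}", homebrew_package(path));
}
```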

Linux is slightly more complex because the binary records the names of the libraries it was linked against, but not their paths. Examining just the resulting binary, we can learn the name of each library it links against, but not where that library came from. The standard Linux ldd program can give us that information, but in a much more complex way than the macOS equivalent. It gives us the more detailed output we want (/lib/x86_64-linux-gnu/), but with a catch: it fetches that information by actually executing the dynamic linker and printing what the dynamic linker found at the time you ran ldd, not at the time the binary was built. If we want to know which packages were used when your program was built, we have the extra constraint of needing to run ldd in an environment as similar as possible to the one in which it was built. It also, naturally, means that we can't check this information on any platform other than Linux.
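For the Linux side, here's a minimal parser for ldd's usual `name => path (address)` lines. It's a sketch of one approach, not cargo-dist's code: it skips entries without `=>` (such as the vDSO and the dynamic loader itself) and records `not found` entries with no path. The `LddEntry` type and `parse_ldd` function are hypothetical, and the sample output in `main` is illustrative.

```rust
/// One resolved entry from `ldd` output.
#[derive(Debug, PartialEq)]
struct LddEntry<'a> {
    name: &'a str,
    path: Option<&'a str>, // None when ldd printed "not found"
}

/// Parses lines such as
///   "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)"
/// Lines without " => " (the vDSO, the loader itself) are skipped.
fn parse_ldd(output: &str) -> Vec<LddEntry<'_>> {
    output
        .lines()
        .filter_map(|line| {
            let (name, rest) = line.trim().split_once(" => ")?;
            let rest = rest.trim();
            let path = if rest.starts_with("not found") {
                None
            } else {
                // Drop the trailing " (0x...)" load address, if present.
                Some(rest.split(" (").next().unwrap_or(rest).trim())
            };
            Some(LddEntry { name, path })
        })
        .collect()
}

fn main() {
    let sample = "\tlinux-vdso.so.1 (0x00007ffc)\n\
                  \tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f00)\n\
                  \tlibfoo.so.1 => not found\n";
    for entry in parse_ldd(sample) {
        println!("{:?}", entry);
    }
}
```

Since this only parses text, it can be tested anywhere; the constraint from the paragraph above still applies to producing that text, which has to happen on Linux, as close to the build environment as possible.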

The state of Windows tooling is similar to Linux's, but with one extra twist. Windows dynamic linkage also records library names without paths, but we were unable to find a convenient ldd-like tool to inspect the state of the dynamic linker and easily print out which libraries a program uses, with their paths. (Know of one? Please let me know!) As a result, the current version of the linkage reporter prints the names of the libraries a Windows program was linked against, but not their paths or where they come from. To get this metadata, we use the goblin crate to parse the binary.

Wrapping up

cargo-dist 0.4.0 is available now with all of these new tools. I hope it helps bring you joy and comfort in your development life. Please give it a try, and let us know what you think!