An App By Any Other Name

by Gankra, 23 July 2024

Names are hard, so changing them should be easy!

Unfortunately backwards compatibility is a bitch, and just renaming something blindly has... Consequences.

It's definitely a coincidence that when I changed my own name, the first initial stayed the same. I definitely didn't precisely pick a new name that would keep my email address still correct. I'm also definitely not about to change my name in an actually breaking way that is going to make my life hell.

God who am I kidding I once announced a step in my transition to work with an email titled

[nightly breaking change] pronouns he/she/they -> she/they

Anyway hi my name is Aria Desires now. Let's talk about how renaming an application is a pain in the neck and how cargo-dist tries to deal with it for you, while I distract myself from the looming email apocalypse.

What Could Go Wrong?

SO MUCH. EVERYTHING.

If you rename your application and your user updates, how likely are they to actually notice that the oldtool is now cooltool? Actually, can they even update your tool anymore?? Are they trapped on a doomed timeline where your app never got its sickrad new name??? SAY IT AIN'T SO!

In general, the more your tool gets used, the more likely it is that there are too many things that assume your application has a particular name. Either you're gonna break something (which is maybe fine!) or you're stuck with your name forever.

My favourite example of "we can never change the binary name" is every game built on the Source Engine being hl2.exe (short for Half-Life 2). The funniest reason to not change that is that GPU drivers will actually look at the name of the executable and enable optimizations/fixes for that particular game/engine. Yes, even your ""hardware"" can depend on the name of your app!

But not all is lost, because there is a tried and true solution to this problem: just ship both names! Aliases as far as the eye can see! There's lots of levels where you could want to establish an alias for your application, but to keep this article only Kinda Too Long we're going to focus on just the last mile: having two "copies" of one "executable" with different names.

That is, you want to be able to type both oldtool and cooltool in the terminal and have both work. Easy right? RIGHT?

Well I sure hope it is, because cargo-dist has a bin-aliases setting for exactly this, and it would be really unfortunate if it was an incredibly platform-specific and installer-specific problem. Well, unfortunate for us. You just get to kick back and relax while we do it for you.

Often the solution here is symlinks, but there are actually 4 different solutions: copies, symlinks, hardlinks, and scripts. All of these options are viable, but they have different strengths and weaknesses, so the "right" one depends on your constraints.

Copies

Just to set a baseline, let's look at the solution we can always fall back to, but everyone will be disappointed in us for using: straight up copy-pasting the binary with different names.

The big merit of this approach is that it has very predictable and portable behaviour. You can always copy a file, and then you have two totally independent files that don't care about each other.

But this is obviously inefficient, right? Like, there's only one binary, but we're storing (or god forbid, transferring) two copies of it! But that's also something that compression algorithms should be really good at dealing with, and even RAM gets compressed these days!

Alas, compression ain't magic, so there's probably still gonna be a ton of waste if you do this. Also people will generally be annoyed if they notice you doing this. We'll keep this in our pocket, but let's try to avoid doing this.
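
To put a rough number on "compression ain't magic", here's a minimal sketch that compresses one blob and then two back-to-back copies of it with zlib. The random bytes are only a stand-in for a binary (real executables compress better), and duplicate_cost is a made-up name for illustration. The point is just that DEFLATE only looks back 32 KiB, so a second copy sitting a megabyte away gets no help from the first:

```python
import os
import zlib


def duplicate_cost(size=1 << 20):
    """Return compressed size of two copies divided by compressed size of one."""
    # Random bytes are a stand-in for a binary; this is an assumption for the demo.
    blob = os.urandom(size)
    one_copy = len(zlib.compress(blob))
    two_copies = len(zlib.compress(blob + blob))
    return two_copies / one_copy


if __name__ == "__main__":
    print(f"two copies cost {duplicate_cost():.2f}x one copy after zlib")
```

Compressors with long-range matching can do much better here, so treat this as an illustration of the worst case for one common format, not a law of nature.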

Symlinks

Everybody on a unixy platform loves a symlink. It's basically a text file saying "paths that point at this actually go to this other path". For our purposes there are two notable properties of a symlink that we'll return to later:

  • It's Just Paths: because it operates on the level of paths, you don't have to worry about there being multiple filesystems or things getting overwritten. You can even symlink dirs, and have relative symlinks! As long as the target path exists, it will continue to resolve. If the target path doesn't, it won't.
  • It's Not A Redirect: if you make a symlink at /whatever/mylink pointing at /wherever/myexe, invoking it via the symlink will tell "myexe" that it's called "mylink"
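
Both properties can be poked at with a small sketch, assuming a unixy system with /bin/sh and a temp directory you're allowed to execute from. The file names myexe and mylink mirror the example above, and symlink_demo is a made-up helper, not anything cargo-dist ships:

```python
import os
import subprocess
import tempfile


def symlink_demo():
    """Return (name seen via target, name seen via symlink, link dangles after delete)."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "myexe")
        link = os.path.join(tmp, "mylink")

        # A tiny "executable" that reports the name it was invoked under.
        with open(target, "w") as f:
            f.write('#!/bin/sh\nbasename "$0"\n')
        os.chmod(target, 0o755)

        # A relative symlink: it stores the path "myexe", not the file itself.
        os.symlink("myexe", link)

        via_target = subprocess.run(
            [target], capture_output=True, text=True, check=True
        ).stdout.strip()
        via_link = subprocess.run(
            [link], capture_output=True, text=True, check=True
        ).stdout.strip()

        # Remove the target: the link still exists but no longer resolves.
        os.remove(target)
        dangling = os.path.islink(link) and not os.path.exists(link)

        return via_target, via_link, dangling


if __name__ == "__main__":
    print(symlink_demo())
```

The second value is the interesting one: the same file, run through the link, believes it's called mylink.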

Unfortunately symlinks have one absolutely fatal flaw: they don't exist on Windows!

Okay, okay, they exist on Windows, but not in a way that's useful to generic tools. The problem is that Microsoft decided that symlinks should require admin to create, so if you're writing anything that's intended to operate without admin, symlinks are going to be a pain in the neck!

But! They made a big announcement that they loved symlinks and were making them easier! Except all they did was add a setting to Windows that allows your account to make symlinks without admin. A setting that requires an admin to enable.

This is still fairly useless to anything that's trying to run on "some random Windows machine". But hey, nice that you can use symlinks a bit easier for your own personal workflows.

So anyway symlinks are great, and indeed that's what we use on unixy platforms! Specifically we use this when we're generating a shell installer or Homebrew installer for your application.

But we'll need another option available if we care about Windows (we always care about Windows (WSL isn't Windows))...

Hardlinks

Hardlinks are the lower-level version of a symlink. It's filesystem reference counting! You've essentially copied the file but explicitly told the filesystem "hey these two files should always have the exact same bytes so you can get the best compression wins of your life".

The resulting semantics are an awkward blend of "copies" and "symlinks". People will say "inodes" a bunch and you'll convince yourself you totally understand but then it will wildly blow up on you because suddenly you're forced to grapple with semantic distinctions you otherwise never had to worry about.

If you do operations which "rename" a hardlinked file, the hardlink will be unaffected and both files will still work. This makes the hardlink feel like a second copy of the file. But if you do operations which "modify" the bytes of the file, both files will change. This makes the hardlink feel like a symlink.

Thankfully, modifying a binary's bytes once you've deployed it to a system is a weird thing to do, so you can mostly just think about it like a copy, right? Right?

So hey how much do you think about whether you're using mv or cp? In a lot of cases these operations are pretty interchangeable, and you'll just type whichever comes to mind first... and unfortunately they do completely different things when the target is a hardlinked file! This is because mv is supposed to "rename" files, while cp is supposed to "modify" bytes!
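
Here's a minimal sketch of that split, using Python stand-ins for the two commands: opening the existing file for writing plays the role of cp onto an existing target, and os.replace plays the role of mv. The names oldtool, cooltool, and new-build are just for illustration:

```python
import os
import tempfile


def _write(path, text):
    with open(path, "w") as f:
        f.write(text)


def _read(path):
    with open(path) as f:
        return f.read()


def hardlink_demo():
    """Return (same inode?, contents after cp-style write, contents after mv-style rename)."""
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "oldtool")
        cool = os.path.join(tmp, "cooltool")
        new_build = os.path.join(tmp, "new-build")

        _write(old, "v1")
        os.link(old, cool)  # cooltool is now a second name for the same file
        same_inode = os.stat(old).st_ino == os.stat(cool).st_ino

        # cp-style: overwrite the bytes of the existing file in place.
        # Both names see the change, because there is only one file.
        _write(old, "v2")
        after_cp = (_read(old), _read(cool))

        # mv-style: rename a fresh file over one of the names.
        # Only that name moves; the other name keeps the old bytes.
        _write(new_build, "v3")
        os.replace(new_build, old)
        after_mv = (_read(old), _read(cool))

        return same_inode, after_cp, after_mv


if __name__ == "__main__":
    print(hardlink_demo())
```

After the rename, oldtool and cooltool have quietly become two unrelated files, which is exactly the kind of surprise the paragraph above is warning about.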

The distinction between these two commands is in fact really important when building a robust installer, so in some sense this is "business as usual", but sometimes people just manually mess around with binaries and it's certainly Exciting to add this extra chaos roulette.

Oh also if you try to create a hardlink across filesystems it will just fail, because it's a filesystem-local trick! Pretty sure that if you try to copy or move the hardlink across a filesystem something somewhere will get what you mean and just copy the bytes to make a separate file, so that's good, but also another way the hardlink can "randomly" have totally different behaviour based on two directories randomly being on separate drives, or one being a tmpfs, or...
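
One common way to cope with that failure mode is "try the hardlink, fall back to a copy". The sketch below is a hypothetical helper (link_or_copy is a made-up name, and this is not a description of what cargo-dist does), assuming you'd rather end up with a wasteful copy than no alias at all:

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hypothetical helper: make dst a hardlink to src, or a plain copy if linking fails."""
    try:
        os.link(src, dst)
        return "hardlink"
    except FileExistsError:
        # Don't silently clobber something that is already there.
        raise
    except OSError:
        # For example EXDEV when src and dst live on different filesystems,
        # or a filesystem that doesn't support hardlinks at all.
        shutil.copy2(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "cooltool")
        dst = os.path.join(tmp, "oldtool")
        with open(src, "w") as f:
            f.write("pretend this is a binary")
        print(link_or_copy(src, dst))
```

The return value tells the caller which one it got, since the two outcomes behave differently the next time someone overwrites the binary.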

But so anyway none of this matters and hardlinks absolutely rule: because they actually exist on Windows!

So of course, this is the solution cargo-dist actually uses on Windows. Specifically when we're generating a PowerShell installer for your application.

For cargo-dist specifically, a lot of the corners of hardlinks also don't actually matter:

  • We want the file and the hardlink to exist in the same directory, and we can just make the hardlink after we copy the file, so we don't care about cross-filesystem shenanigans.
  • We already need to hardlink each file individually, so the fact that you can't hardlink dirs doesn't matter.
  • We already need to care about mv vs cp because we want to be transactional and need to support binaries updating themselves (essentially overwriting them while they're executing).
  • Whether we use a hardlink or symlink, someone randomly poking at the binaries manually is liable to do something confusing so we can't worry about it too much.

Scripts

Ok here's the fun one where we get to talk about several extremely cursed things that are extremely in production for very important tools.

When we say "scripts" here we essentially mean the moral equivalent of making your alias "binary" actually be a shell script that calls the actual binary. At first blush this just seems like a more complicated symlink that's somehow even less portable and more of a maintenance burden, and, it is, BUT it also unlocks the ability to do arbitrary crimes to resolve arbitrary problems.

Today's top 2 script crime enjoyers are:

  • GNU coreutils
  • npm

GNU Coreutils and the True Name Nemesis

Hey remember when we singled out this property of symlinks for later?

It's Not A Redirect: if you make a symlink at /whatever/mylink pointing at /wherever/myexe, invoking it via the symlink will tell "myexe" that it's called "mylink"

It's later.

A surprising number of tools really like to play the "what's the name of my own binary?" game. Probably the most famous example of this is that clang can have completely different behaviours based on whether it thinks its name is clang, clang++, gcc, or so on. The idea here is that a compiler is a big and complicated thing, and if you implement a C++ compiler you're most of the way to a C compiler anyway, so let's just put them both in the same binary and pretend it's two binaries, switching the behaviour based on what the user seems to want, as inferred from the name.

It's extremely cute, you've gotta love it.

So what does this have to do with GNU coreutils? Actually what even is GNU coreutils?

Well, it's an implementation of all the basic unixy commands that are so ubiquitous that they feel like they're just builtin to the system: cp, rm, ls, chmod, ln, and many more. In true unixy fashion, these are all separate binaries that do just their "one thing".

Except, hey, all of these commands need to handle all the low-level details of I/O and error handling and CLI parsing and... huh there's probably a lot of common code between them, isn't there? So you probably want them to share a codebase, but once you compile them all out into separate binaries, there's going to be a ton of code that's duplicated into every one!

It's the same problem that clang has, but instead of a few really big binaries, it's 50 really small binaries. And indeed, the GNU coreutils has the same solution, just check out their --enable-single-binary build flag!

Actually hmm wait what does it say?

The install behavior is determined by the option --enable-single-binary=symlinks or --enable-single-binary=shebangs (the default). With the symlinks option, you can't make a second symlink to any program because that will change the name of the called program, which is used by coreutils to determine the desired program. The shebangs option doesn't suffer from this problem

Wait so they support symlinks, but it's buggy and not the default?

See the problem here is that if you're relying on symlinks to secretly make one binary behave like two different binaries, the whole thing is a house of cards that is waiting to topple over if someone completely innocently symlinks your symlinks! As in, if someone symlinks ls as hewwo, then suddenly the only information that the program was supposed to behave like ls is completely lost!

This is why the actual default way to do a single-binary GNU coreutils is with scripts. Specifically, instead of creating a symlink for each program, it generates these absolutely beautiful single-line script files that pass in which program we're trying to run:

#!/usr/bin/coreutils --coreutils-prog-shebang=ls

When I first saw this I genuinely wasn't even sure what to call it. It's a pure shebang file, with no actual body! It's not a shell script it's just... a coreutils script? (Apparently the proper name for shebang files is "interpreter scripts", so, yeah coreutils script it is.)

Although everything has exciting tradeoffs, especially when you're coreutils, this certainly makes the whole thing more robust to symlinks! Yay for scripts!
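
To make the difference concrete, here's a toy multi-call "binary" sketched in Python. It is not how coreutils is implemented: the applets are fake, and --prog-shebang= is a made-up flag standing in for the real one quoted above. The one real-world fact it leans on is that when the kernel runs an interpreter script, it invokes the interpreter with the shebang argument first, then the path of the script, then whatever the user typed:

```python
import os

# Fake applets, purely for illustration.
APPLETS = {
    "ls": lambda args: "listing " + " ".join(args),
    "rm": lambda args: "removing " + " ".join(args),
}

# Hypothetical flag name, standing in for the real coreutils one.
SHEBANG_FLAG = "--prog-shebang="


def resolve(argv):
    """Return (program, args) for a multi-call binary invoked with argv."""
    if len(argv) > 1 and argv[1].startswith(SHEBANG_FLAG):
        # Launched via an interpreter script, so argv looks like:
        #   [binary, flag, path-of-script, user args...]
        # The program name comes from the flag, not from any path.
        return argv[1][len(SHEBANG_FLAG):], argv[3:]
    # Otherwise fall back to the name we were invoked under.
    return os.path.basename(argv[0]), argv[1:]


def run(argv):
    program, args = resolve(argv)
    applet = APPLETS.get(program)
    if applet is None:
        return f"unknown program {program!r}"
    return applet(args)


if __name__ == "__main__":
    # Invoked through a symlink named ls: works.
    print(run(["/usr/bin/ls", "a"]))
    # Invoked through a symlink-to-the-symlink named hewwo: the name is lost.
    print(run(["/home/me/hewwo", "a"]))
    # Invoked through a symlink named hewwo pointing at the shebang file: still ls.
    print(run(["/usr/bin/multitool", "--prog-shebang=ls", "/home/me/hewwo", "a"]))
```

The paths and the multitool name are invented for the example. What matters is the third call: the script path says hewwo, but nobody looks at it.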

An earlier version of this article almost asserted that rustup worked the same way, but as of this writing it does in fact use bare hardlinks -- if you have rustup installed try out ln -s ~/.cargo/bin/cargo rustc and see what ./rustc --version spits out! 😸

npm and the What Do You Mean You Can't Just Execute A JavaScript

So fun fact: the npm package manager is, completely reasonably, written in JavaScript. This isn't a format that's natively understood by my computer, and they don't do any of the "turn some JS into a standalone portable executable" stuff that everyone loves these days.

Instead, we return to our good old friend the shebang file to make it runnable. Here's what the npm on my PATH actually contains on my Ubuntu install:

#!/usr/bin/env node
require('../lib/cli.js')(process)

Great! It's just like a shell script, but it's a JS script, because we say to evaluate it with node, the JS interpreter.

But wait, shebang files aren't a thing on Windows, and npm works on Windows too... what's npm there?

$ where.exe npm
C:\Users\gankra\AppData\Roaming\npm\npm
C:\Users\gankra\AppData\Roaming\npm\npm.cmd
C:\Program Files\nodejs\npm
C:\Program Files\nodejs\npm.cmd

Oh dear. Let's peek into the first one, which is presumably the main one...

#!/bin/sh
basedir=$(dirname "$(echo "$0" | sed -e 's,\\,/,g')")
case `uname` in
    *CYGWIN*|*MINGW*|*MSYS*)
        if command -v cygpath > /dev/null 2>&1; then
...

OH DEAR. RIGHT YES. There are like 3 different things that try to make Windows pretend to be Linux. Plus WSL can shove your Windows PATH into a Linux VM.

And you need the cmd version for actual for reals Windows support. Wow uh, proxy scripts are great but I'm suddenly very sympathetic to rustup just using hardlinks and calling it a day. The folks who maintain these npm scripts are absolute heroes.

Wait actually what does npm install -g @axodotdev/axolotlsay do?

$ where.exe axolotlsay
C:\Users\ninte\AppData\Roaming\npm\axolotlsay
C:\Users\ninte\AppData\Roaming\npm\axolotlsay.cmd

Wow neat, npm will create exactly the same scripts for all your JS CLI apps!

This behaviour keys off of the bin key in a package.json, and, fun fact you can actually use that field to ask npm to make script-based aliases for a command!

{
  "bin": {
    "oldtool": "bin/oldtool.js",
    "cooltool": "bin/oldtool.js"
  }
}

We are intimately familiar with this behaviour because it's something cargo-dist takes care of for you when you say you want a binary alias and you turn on our support for generating and publishing npm installers.

Ok Let's Call It There

Genuinely this article could go on and on. Remember we just narrowed the problem down to binary names. There's so many other kinds of names to potentially change, like... my... own...

Ah crap I still need to file that paperwork and ruin my email address.

Welp, hopefully this was as distracting to you as it was for me, I gotta go!

Have fun renaming all your binaries???