Shared Memory and Hard Links

We've had support for symlinks for a very long time because, surprise surprise, people use symlinks.

That means we support loading them when building images, which entails transferring files from a filesystem on one system into another, and we also support creating them within our own filesystem, TFS.

What's actually surprising is that out of all the software people have been running on Nanos, no one ever explicitly asked for hard link support, or even knew they needed it, until recently. Even then the request was several layers indirect - that is to say, the user was asking for support for a libc call that we already kind of supported, just not fully.

First off, though - what are some of the differences between hard links and symlinks anyway, and why do many people rely mostly on symlinks?

Differences Between Hard Links and Symlinks

There are quite a few differences between hard links and symlinks.

  • Hard links point at inodes while symlinks point at path names.

    An inode (short for index node) stores a file's metadata and the actual location of its data. A hard link points directly at an inode, whereas a symlink merely stores the path name of its target file. It's a subtle but important difference, and you can see it directly with stat(2) in the sketch right after this list.

  • Hard links can't cross filesystems but symlinks can.

    Even on your local dev env you probably have multiple mount points for various filesystems, such as tmpfs. The reason hard links can't span multiple filesystems is that they point at inodes, and inode numbers are only unique within a single filesystem; trying it fails with EXDEV ("Invalid cross-device link"). This is probably one of the core reasons symlinks are so heavily used.

  • Hard links can't point at directories but symlinks can.

    I have a handful of symlinks pointing at Dropbox folders in my home directory. Dropbox has folders I share with other people and holds stuff I don't want to accidentally delete. It's quite ok if I accidentally kill one of those symlinks, though, as I know the contents are actually stored elsewhere. This raises the question of why hard links can't point at directories. You might think it's because of the prior point, but it's actually something different, and it's something you probably use every single day. Every directory comes with two hard links premade: one is the current directory, '.', and the other is the parent directory, '..'. Both of these are hard links.

    When you create a new directory you can see, in the column right before the username, that the hard link count on that folder starts at 2 and increases by one each time a subdirectory is created:

    ➜  ~ mkdir testing
    ➜  ~ ls -ld testing
    drwxr-xr-x@ 2 eyberg  staff  64 Dec 12 07:56 testing
    ➜  ~ mkdir -p testing/bob
    ➜  ~ ls -ld testing
    drwxr-xr-x@ 3 eyberg  staff  96 Dec 12 07:56 testing
    ➜  ~ ls -ld testing/bob
    drwxr-xr-x@ 2 eyberg  staff  64 Dec 12 07:56 testing/bob
    

    Keep in mind that your typical Linux directory structure is a hierarchical tree, and a hard link to a directory could potentially loop back in on itself, turning the tree into a cycle. That's why hard links can't point at directories.

  • Hard links still point at data and don't dangle if the original file is deleted.

    Remember how I mentioned earlier that it's ok if I accidentally delete a symlink in my home directory because the original content won't be deleted? That's by design, but if I instead delete the underlying folder and its contents, the symlink remains in a 'dangling' state. A hard link is different: if we create a hard link to a file and delete the original file, the contents are still there, because the data never really went away - the filesystem only frees the inode when its link count drops to zero. For example:

    ➜  testing echo "hello bob" > testfile
    ➜  ln testfile newfile
    ➜  rm testfile
    ➜  cat newfile
    hello bob
    

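To see the inode-level difference directly, here's a minimal C sketch - the file names "target", "hardlink", and "symlink" are made up for illustration. The two hard-linked names report the same inode number and a link count of 2, while lstat(2) on the symlink shows a separate inode of its own:

    /* build: cc demo.c */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        struct stat st;

        int fd = open("target", O_CREAT | O_WRONLY, 0644);
        if (fd < 0)
            return 1;
        close(fd);

        link("target", "hardlink");   /* second name for the same inode */
        symlink("target", "symlink"); /* new inode storing a path name */

        stat("target", &st);
        printf("target:   inode %llu, %llu links\n",
               (unsigned long long)st.st_ino, (unsigned long long)st.st_nlink);

        stat("hardlink", &st);        /* same st_ino, st_nlink is now 2 */
        printf("hardlink: inode %llu, %llu links\n",
               (unsigned long long)st.st_ino, (unsigned long long)st.st_nlink);

        lstat("symlink", &st);        /* lstat() stats the link itself */
        printf("symlink:  inode %llu, %llu links\n",
               (unsigned long long)st.st_ino, (unsigned long long)st.st_nlink);

        unlink("target");
        unlink("hardlink");
        unlink("symlink");
        return 0;
    }
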
The inability to link to directories or across filesystems is a rather large dealbreaker for many use cases, so you'll find symlinks a lot more prevalent. However, hard links still have their uses and show up in many types of applications, including various backup tools. Now we'll also find out why we ended up adding them to Nanos.

SEM_OPEN and SHM_OPEN

sem_open(3) and shm_open(3) are different library functions with different use cases, but both work with shared memory: sem_open creates a named semaphore backed by shared memory, while shm_open creates an arbitrary shared memory object. sem_init can be used to create an unnamed semaphore instead; unnamed semaphores are faster, but named semaphores are typically used for IPC, since different processes need a common way to refer to the semaphore when one process creates it and a different one later opens it.
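
As a minimal sketch of what the named-semaphore API looks like in practice - assuming a POSIX system, with "/demo-sem" as a made-up name - consider:

    /* build: cc demo.c -pthread */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>

    int main(void)
    {
        /* Create the semaphore if it doesn't exist, otherwise open the
           existing one - this create-or-open behavior is what the
           temporary-file-plus-link trick below implements. */
        sem_t *sem = sem_open("/demo-sem", O_CREAT, 0644, 1);
        if (sem == SEM_FAILED) {
            perror("sem_open");
            return 1;
        }

        sem_wait(sem);           /* acquire */
        puts("in the critical section");
        sem_post(sem);           /* release */

        sem_close(sem);
        sem_unlink("/demo-sem"); /* remove the name */
        return 0;
    }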

The original user had asked about sem_open, specifically. Keep in mind this is a libc call that is typically used for IPC among several processes - something unikernels don't support, which explains why we hadn't really addressed it. There are still valid reasons to use semaphores on a unikernel, which is multi-threaded but single-process. Some people might wonder: why not just use a mutex? A mutex is typically used when you have a single resource you want to protect; it "locks" and "unlocks" access to that resource, and the thread that locks it is the one that unlocks it. A semaphore, by contrast, fits a consumer/producer pattern, where one thread signals another that data is ready, or a counting pattern, where you have multiple instances of a resource you'd like to hand out - see the sketch below.
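
Here's a minimal producer/consumer sketch using unnamed semaphores - single-process and multi-threaded, matching the unikernel case; the buffer size and item count are arbitrary:

    /* build: cc demo.c -pthread */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>

    #define SLOTS 4
    #define ITEMS 8

    static int buf[SLOTS];
    static int in_pos, out_pos;
    static sem_t empty, full;  /* counting semaphores */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *producer(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITEMS; i++) {
            sem_wait(&empty);  /* wait for a free slot */
            pthread_mutex_lock(&lock);
            buf[in_pos] = i;
            in_pos = (in_pos + 1) % SLOTS;
            pthread_mutex_unlock(&lock);
            sem_post(&full);   /* signal the consumer that data is ready */
        }
        return NULL;
    }

    static void *consumer(void *arg)
    {
        (void)arg;
        for (int i = 0; i < ITEMS; i++) {
            sem_wait(&full);   /* wait for an item */
            pthread_mutex_lock(&lock);
            int v = buf[out_pos];
            out_pos = (out_pos + 1) % SLOTS;
            pthread_mutex_unlock(&lock);
            sem_post(&empty);  /* hand the slot back */
            printf("consumed %d\n", v);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        sem_init(&empty, 0, SLOTS); /* all slots free to start */
        sem_init(&full, 0, 0);      /* nothing produced yet */
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        sem_destroy(&empty);
        sem_destroy(&full);
        return 0;
    }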

Back to sem_open. Internally, sem_open uses shared memory, essentially wrapping shm_open and mmap into its own function.
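
Roughly speaking - this is a sketch of the idea, not glibc's actual code, and "/demo-shm" is an illustrative name - the combination being wrapped looks like this:

    /* build: cc demo.c -pthread (older glibc also needs -lrt) */
    #include <fcntl.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = shm_open("/demo-shm", O_CREAT | O_RDWR, 0644);
        if (fd < 0) {
            perror("shm_open");
            return 1;
        }
        ftruncate(fd, sizeof(sem_t)); /* size the shared object */

        sem_t *sem = mmap(NULL, sizeof(sem_t), PROT_READ | PROT_WRITE,
                          MAP_SHARED, fd, 0);
        if (sem == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        sem_init(sem, 1, 1); /* pshared=1: the sem_t lives in shared memory */
        sem_wait(sem);
        puts("holding the shared-memory semaphore");
        sem_post(sem);

        sem_destroy(sem);
        munmap(sem, sizeof(sem_t));
        close(fd);
        shm_unlink("/demo-shm");
        return 0;
    }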

We originally implemented shared memory because of memfd_create, via our shmem and tmpfs klibs. So we already had some basic support for shm_open, but sem_open was failing because of the missing link call.

Getting to the meat of the question: why does sem_open create a temporary file and then link it into place instead of just opening the final file directly? There are a few reasons. First, it avoids a race between multiple callers, since sem_open must either create a new semaphore or open an existing one. Second, it provides defense against an attacker who might know the name of the file being created: link fails if the target name already exists. Finally, the semaphore persists after the process that created it exits, until sem_unlink is called, so this is a clean way to create a file under those constraints.
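
Here's a simplified sketch of that create-then-link pattern - an illustration of the idea rather than the actual glibc implementation, with made-up paths:

    /* build: cc demo.c */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Atomically create-or-open a file at final_path. */
    static int create_or_open(const char *final_path)
    {
        char tmp[] = "/dev/shm/sem.tmpXXXXXX";
        int fd = mkstemp(tmp); /* private temp file, invisible to racers */
        if (fd < 0)
            return -1;

        /* ... initialize the semaphore contents in the temp file ... */

        if (link(tmp, final_path) == 0) {
            unlink(tmp); /* we won the race: final_path now exists */
            return fd;
        }
        close(fd);
        unlink(tmp);
        if (errno == EEXIST) /* someone else (or an attacker) created the
                                name first - just open the existing file */
            return open(final_path, O_RDWR);
        return -1;
    }

    int main(void)
    {
        int fd = create_or_open("/dev/shm/sem.demo");
        if (fd >= 0) {
            puts("got the semaphore file");
            close(fd);
        }
        return 0;
    }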

What are the takeaways here?

First - it's really interesting that we have been around for years, with users and customers running a very wide range of workloads, and only now is someone finding the need for an otherwise perfectly ordinary syscall we didn't support.

Second - there are typically multiple ways to skin a cat, and it's extremely common for us to peel apart several layers of the onion to figure out what actually needs to be done and why something should be supported.

Nanos is constantly evolving and is entirely driven by our customers and users. Perhaps we'll write about something you find next time.
