Getting data from Koji

When Pungi is configured to get packages from a Koji tag, it needs a way to access the actual RPM files.

Historically, this required the storage used by Koji to be directly available on the host where Pungi was running. This was usually achieved by using NFS for the Koji volume, and mounting it on the compose host.

The compose could be created directly on the same volume. In that case the packages would be hardlinked, significantly reducing space consumption.

The compose could also be created on different storage, in which case the packages would need to be either copied over or symlinked. With symlinks, anything that accesses the compose (e.g. a download server) must also mount the Koji volume in the same location.

There is also a risk with symlinks that the package in Koji can change (for example by being re-signed), which would invalidate composes linking to it.

Using Koji without direct mount

It is now possible to run a compose from a Koji tag without direct access to Koji storage.

Pungi can download the packages over HTTP, store them in a local cache, and consume them from there.

The local cache has a similar structure to the Koji volume.

When Pungi needs a package, it starts from the package's path on the Koji volume and replaces the topdir with the cache location. If that file exists, it is used. If it does not exist, it is downloaded from Koji (by replacing the topdir with topurl).

Koji path                            /mnt/koji/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
Koji URL    https://kojipkgs.fedoraproject.org/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
Local path                  /mnt/compose/cache/packages/foo/1/1.fc38/data/signed/abcdef/noarch/foo-1-1.fc38.noarch.rpm
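
The path translation described above can be sketched in Python as follows. This is an illustration, not Pungi's actual code: the function name translate_koji_path and its parameters are hypothetical, and the topdir, cache location, and topurl values in the usage note are inferred from the example paths.

```python
import os


def translate_koji_path(koji_path, topdir, cache_dir, topurl):
    """Return (cache_path, url) for a file located under the Koji topdir.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    rel = os.path.relpath(koji_path, topdir)
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        raise ValueError(f"{koji_path} is not under {topdir}")
    # Same relative path, different prefix for the cache and for the URL.
    cache_path = os.path.join(cache_dir, rel)
    url = topurl.rstrip("/") + "/" + rel.replace(os.sep, "/")
    return cache_path, url
```

Called with topdir /mnt/koji, cache directory /mnt/compose/cache, and topurl https://kojipkgs.fedoraproject.org, this yields the local path and Koji URL shown in the example above.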

The packages can be hardlinked from this cache directory.

Cleanup

While the approach above allows each RPM to be downloaded only once, it will eventually result in the Koji volume being mirrored locally. Most of those packages, however, will no longer be needed.

There is a script pungi-cache-cleanup that can help with that. It can find and remove files from the cache that are no longer needed.

A file is no longer needed if it has a single link (meaning it is only in the cache, not in any compose) and its mtime is older than a given threshold.
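
The two criteria can be expressed as a short Python sketch. It is not the pungi-cache-cleanup implementation; the function name find_stale_files and its parameters are hypothetical, and it only lists candidates rather than deleting them.

```python
import os
import time


def find_stale_files(cache_dir, max_age_seconds, now=None):
    """List cache files with a single link and an mtime older than the threshold.

    Hypothetical helper written for illustration only.
    """
    now = time.time() if now is None else now
    stale = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            # st_nlink == 1: no compose holds a hardlink to this file.
            if st.st_nlink == 1 and now - st.st_mtime > max_age_seconds:
                stale.append(path)
    return sorted(stale)
```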

It doesn’t make sense to delete files that are hardlinked in an existing compose as it would not save any space anyway.

The mtime check is meant to preserve files that are downloaded but not actually used in a compose, like a subpackage that is not included in any variant. Every time its existence in the local cache is checked, the mtime is updated.
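
A minimal sketch of an existence check that also refreshes the mtime, assuming the refresh is done with a plain utime call. The function name cached_file_exists is hypothetical and Pungi may implement this differently.

```python
import os


def cached_file_exists(path):
    """Return True if the cached file exists, refreshing its mtime as a side effect.

    Hypothetical helper written for illustration only.
    """
    try:
        # With no explicit times, os.utime sets atime and mtime to the current time.
        os.utime(path)
    except FileNotFoundError:
        return False
    return True
```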

Race conditions?

It should be safe to have multiple compose hosts share the same storage volume for generated composes and local cache.

If a cache file is accessed and it exists, there’s no risk of a race condition.

If two composes need the same file at the same time and it is not present yet, one of them will take a lock on it and start downloading. The other will wait until the download is finished.

The lock is only valid for a set amount of time (5 minutes) to avoid issues where the downloading process is killed in a way that blocks it from releasing the lock.
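
The expiring lock can be illustrated with a lock file whose age decides whether it is still valid. This is a sketch under stated assumptions, not Pungi's locking code: the lock file approach, the names try_acquire and release, and the use of the lock file's mtime as the acquisition time are all hypothetical. Only the 5 minute limit comes from the text above.

```python
import os
import time

LOCK_TIMEOUT = 5 * 60  # seconds; the 5 minute limit described above


def try_acquire(lock_path, timeout=LOCK_TIMEOUT, now=None):
    """Return True if the lock was taken (or stolen), False if it is still held."""
    now = time.time() if now is None else now
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        try:
            age = now - os.stat(lock_path).st_mtime
        except FileNotFoundError:
            return False  # released in the meantime; the caller can retry
        if age <= timeout:
            return False  # holder is still within its time limit
        # Expired lock: steal it by restarting its clock. This step is not
        # atomic, so two waiters could both steal; the worst case is a
        # duplicated download.
        os.utime(lock_path, (now, now))
        return True
    os.close(fd)
    return True


def release(lock_path):
    """Remove the lock file, ignoring the case where it is already gone."""
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass
```

A waiting process would call try_acquire in a loop with a short sleep until it either obtains the lock or finds that the cached file has appeared.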

If the file is large and the network is slow, the limit may not be enough to finish downloading. In that case the second process will steal the lock while the first process is still downloading. This will result in the same file being downloaded twice.

When the first process finishes the download, it will put the file into the local cache location. When the second process finishes, it will atomically replace that file. Since both downloads have identical content, the replacement changes nothing for anyone reading the cache.

If the first compose already managed to hardlink the file before it gets replaced, there will be two copies of the file present locally.
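
The effect of an atomic replacement on an existing hardlink can be reproduced with a small Python sketch. It assumes the download is written to a temporary file in the same directory and then moved into place with os.replace; the function name store_in_cache is hypothetical and Pungi's actual mechanism may differ.

```python
import os
import tempfile


def store_in_cache(data, cache_path):
    """Write data to a temporary file and atomically move it to cache_path.

    Hypothetical helper written for illustration only.
    """
    directory = os.path.dirname(cache_path)
    os.makedirs(directory, exist_ok=True)
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem and is atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, cache_path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

If a compose hardlinks cache_path after the first call and a second call then replaces it, the compose keeps the old inode while the cache points at a new one. Both files then have a link count of 1, which is the "two copies" situation described above.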

Integrity checking

There is minimal integrity checking. RPM packages belonging to real builds are checked against the checksum provided by the Koji hub.
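
A checksum comparison of this kind could look like the following sketch. The function name checksum_matches is hypothetical, and the text does not say which algorithm or field the Koji hub provides, so the algorithm is left as a parameter and the sha256 default is only a placeholder.

```python
import hashlib


def checksum_matches(path, expected_hex, algorithm="sha256", chunk_size=1024 * 1024):
    """Return True if the file's digest equals the expected hex string.

    Hypothetical helper; the algorithm actually used is an assumption.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read in chunks so large RPMs are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```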

There is no checking for scratch builds or any images.