How to upgrade builders¶
This article explains how to upgrade the Copr builders images in
AWS (x86_64 and aarch64),
LibVirt/OpenStack (x86_64 and ppc64le),
IBM Cloud (s390x),
This HOWTO is useful for upgrading images to a newer Fedora release, or for just updating all the packages contained within the builder images. This image “refreshing” also significantly speeds up the following VM startup times, and fixes bugs (when a builder virtual machine is started the image, only a limited subset of packages is always automatically updated).
Keep amending this page if you find something not matching reality or expectations.
Requirements¶
ssh access to staging backend box
ssh access to one of our x86_64 and ppc64le hypervisors, e.g.
copr@vmhost-x86-copr01.rdu-cc.fedoraproject.org
andcopr@vmhost-p08-copr01.rdu-cc.fedoraproject.org
ssh access to
batcave01.iad2.fedoraproject.org
, and sudo access therebe in FAS group
aws-copr
, so you can access AWS login link properlyIBM Cloud API token assigned to the Fedora Copr team (see team’s Bitwaarden)
Prepare AWS source images¶
You need to find proper (official) ami-*
Fedora image IDs, bound to
your desired VM location. You can e.g. go to Fedora Cloud Page and search
for AWS
images. There are different buttons for x86_64 and aarch64
architectures. Click the List AWS EC2 region button.
Do not launch any instance, only find an AMI ID
(e.g. ami-0c830793775595d4b
) for our region - we are using
N.Virginia option, aka us-east-1
, but we should move to
us-west-*
soon.
Then ssh to root@copr-be-dev.cloud.fedoraproject.org
, and su - resalloc
,
and execute for x86_64
arch:
$ copr-resalloc-aws-new-x86_64 \
--initial-preparation --create-snapshot-image --debug \
--name copr-builder-image-x86_64 \
--instance-type=c7i.xlarge \
--ami <ami_ID>
...
* Image ID: ami-0ebce709a474af685
...
And then also for aarch64
. Note that we need an additional volume that will
be later inherited by all machines instatiated from the snapshot ami image (so
we don’t need yet another additional volume when starting builders):
$ copr-resalloc-aws-new-aarch64 \
--initial-preparation --create-snapshot-image --debug \
--additional-volume-size 160 \
--name copr-builder-image-aarch64 \
--instance-type=c7g.xlarge \
--ami <ami_ID>
...
* Image ID: ami-0942a35ec3999e00d
...
Continue fixing the scripts/playbooks/fedora till you succeed like that ^^. Repeat the previous steps.
The remaining step is to configure copr_builder_images.aws.{aarch64,x86_64}
options in Ansible git repo, in file inventory/group_vars/copr_dev_aws
and reprovision the copr-be-dev
instance, see Testing.
Prepare libvirt source images¶
We prepare the LibVirt images directly on our hypervisors. We start with the official Fedora images as the “base images”, and just modify them (this is easier for us compared to generating images from scratch).
The Power9 architecture (note: Power9+ is required by Enterprise Linux 9+) is
currently hosted in the OSU Open Source Lab. That is an OpenStack-based
cloud, but no special actions are needed — we are able to share the same
.qcow2
image generate for our other ppc64le
hypervisors. When
uploading the image (see below), the image is as well automatically uploaded to
the OSUOSL OpenStack.
Find source images¶
The first thing you need to figure out is what image should you use and where to get it.
The Cloud Base image for x86_64 can be obtained on Fedora Cloud page. Pick
the one with .qcow2
extension. The ppc64le and aarch64 images can be found
on the Alternate Architectures page. Don’t confuse PPC64LE with PPC64.
If neither that url provides the expected cloud image version (yet), there
should exist at least a “compose” version in Koji compose directory listing,
look for latest-Fedora-Cloud-<VERSION>/compose/Cloud/<ARCH>/images
directory.
Image preparation¶
We can not prepare the images cross-arch, and we need to prepare one image for every supported architecture (on an appropriate hypervisor). So in turn we need to repeat the instructions for each architecture we have hypervisors for (currently x86_64 and ppc64le).
All the hypervisors in Copr build system are appropriately configured, so it doesn’t matter which of the hypervisors is chosen (only the architecture must match).
Our hypervisors have overcommitted RAM and disk space a lot (otherwise it wouldn’t be possible to start so many builders on each hypervisor in parallel). The good thing about that is that we can anytime temporarily spawn one or more VMs for the purpose of generating the builder image.
So let’s try to generate the image from the given official Fedora Cloud image on one of the x86_64 hypervisors:
$ ssh copr@vmhost-x86-copr02.rdu-cc.fedoraproject.org
[copr@vmhost-x86-copr02 ~][PROD]$ copr-image https://download.fedoraproject.org/pub/fedora/linux/releases/34/Cloud/x86_64/images/Fedora-Cloud-Base-34-1.2.x86_64.qcow2
... SNIP ...
++ date -I
+ qemu-img convert -f qcow2 /tmp/wip-image-hi1jK.qcow2 -c -O qcow2 -o compat=0.10 /tmp/copr-eimg-G6yZpG/eimg-fixed-2021-05-24.qcow2
+ cleanup
+ rm -rf /tmp/wip-image-hi1jK.qcow2
This long running task (several minutes) can fail. If so, please fix the script, and re-run. Once the script finishes correctly (see above the output, and final eimg-fixed*.qcow file), upload the image to all hypervisors:
[copr@vmhost-x86-copr02 ~][PROD]$ /home/copr/provision/upload-qcow2-images /tmp/copr-eimg-G6yZpG/eimg-fixed-2021-05-24.qcow2
... SNIP ...
uploaded images copr-builder-20210524_085845
Test that the image spawns correctly:
$ ssh root@copr-be-dev.cloud.fedoraproject.org
Last login: Fri Jun 14 12:16:48 2019 from 77.92.220.242
# use a different spawning image for hypervisors, set the "VOLUMES.x86_64"
# to 'copr-builder-20210524_085845'".
[root@copr-be-dev ~][STG]# vim /var/lib/resallocserver/provision/libvirt-new
# use a different image for the OSUOSL OpenStack. Set the
# `resalloc-openstack-new --image` argument to
# 'copr-builder-20210524_085845'.
[root@copr-be-dev ~][STG]# vim /var/lib/resallocserver/resalloc_provision/osuosl-vm
# delete current VMs to start spawning new ones
[root@copr-be-dev ~][STG]# su - resalloc
Last login: Fri Jun 14 12:43:16 UTC 2019 on pts/0
[resalloc@copr-be-dev ~][STG]$ resalloc-maint resource-delete --all
# wait a minute or so for the new VMs
[resalloc@copr-be-dev ~][STG]$ resalloc-maint resource-list |grep copr_hv_ |grep STARTING
30784 - copr_hv_x86_64_02_dev_00030784_20210524_090406 pool=copr_hv_x86_64_02_dev tags= status=STARTING releases=0 ticket=NULL
[resalloc@copr-be-dev ~][STG]$ tail -f /var/log/resallocserver/hooks/030784_alloc
... SNIP ...
DEBUG:root:Cleaning up ...
2620:52:3:1:dead:beef:cafe:c141
DEBUG:root:cleanup 50_shut_down_vm_destroy
... SNIP ...
If the log doesn’t look good, you’ll have to start over again (perhaps fix
spawner playbooks, or the copr-image
script). But if you see the VM IP
address (can be an IPv6 one), you are mostly done:
[resalloc@copr-be-dev ~][STG]$ resalloc-maint resource-list | grep 00145
145 - aarch64_01_dev_00000145_20190614_124441 pool=aarch64_01_dev tags=aarch64 status=UP
For copr_builder_images.osuosl.ppc64le
we will use the same buidler image as
for hypervisor ppc64le.
Prepare the IBM Cloud images¶
For IBM Cloud we prepare a qcow2
, s390x
image. This is very similar to
the LibVirt case above — notable
difference is that we don’t have a native hypervisor to run the scripts on.
Fortunately, the Z Architecture virtual machines we start in IBM Cloud give us a possibility to run the scripting directly on the VMs (nested virt support). So we use Copr Backend machine as a hop-box — to work on one of our builder machines:
$ ssh root@copr-be-dev.cloud.fedoraproject.org
# su - resalloc
$ copr-prepare-s390x-image-builder
... takes one s390x builder ...
... installs additional packages ...
... does some preparation, and says ...
Now you can start the work on the machine:
$ ssh root@165.192.137.98
...
So we can switch to the builder machine:
$ ssh root@165.192.137.98
Now, find a qcow2
image we’ll be updating, take a look at the
Alternate Architectures page. At this moment you want the s390x
Architecture category, and Fedora Cloud qcow2. Being on the remote VM,
start with:
$ copr-image https://download.fedoraproject.org/pub/fedora-secondary/releases/35/Cloud/s390x/images/Fedora-Cloud-Base-35-1.2.s390x.qcow2
...
+ qemu-img convert -f qcow2 /tmp/wip-image-HkgkS.qcow2 -c -O qcow2 -o compat=0.10 /tmp/root-eimg-BlS5FJ/eimg-fixed-2022-01-19.qcow2
...
From the output you see the generated image eimg-fixed-2022-01-19.qcow2
—
that needs to be uploaded to IBM Cloud now, under our community account.
Unfortunately, we can not _easily_ do this from Fedora machine directly as
ibmcloud tool is not FLOSS. That’s why we have prepared container image for
uploading, pushed to quay.io service as
quay.io/praiskup/ibmcloud-cli
:
$ qcow_image=/tmp/root-eimg-BlS5FJ/eimg-fixed-2022-01-19.qcow2
$ podman_image=quay.io/praiskup/ibmcloud-cli
$ export IBMCLOUD_API_KEY=.... # find in Bitwarden
$ podman run -e IBMCLOUD_API_KEY --rm -ti -v $qcow_image:/image.qcow2:z $podman_image upload-image
....
+ ibmcloud login -r jp-tok
....
Uploaded image "r022-8509865b-0347-4a00-bbfe-bb6df1c5a384"
("copr-builder-image-s390x-20220119-142944")
Note the image ID somewhere, will be used in Ansible inventory, as
copr_builder_images.ibm_cloud.s390x
value. You can test that the new image
starts well on copr-be-dev
, by:
# su - resalloc
$ RESALLOC_NAME=copr_ic_s390x_us_east_dev \
/var/lib/resallocserver/resalloc_provision/ibm-cloud-vm \
create test-machine
… but note that the first start takes some time, till the image is properly populated! So if the script timeouts on ssh, please re-try.
When prepared, don’t forget to drop the VM we used for the image preparation:
$ resalloc ticket-close <your_id>
Testing¶
If the images for all supported architectures are updated (according to previous
sections), the staging copr instance is basically ready for testing. Update
the Ansible git repo for all the changes in playbooks above, and also update
the copr_builder_images
option in inventory/group_vars/copr_dev_aws
so
it points to correct image names. and increment the
copr_builder_fedora_version
number. Once the changes are pushed upstream,
you should re-provision the backend configuration from batcave:
$ ssh batcave01.iad2.fedoraproject.org
$ sudo rbac-playbook \
-l copr-be-dev.aws.fedoraproject.org groups/copr-backend.yml \
-t provision_config
You might well want to stop here for now, and try to test for a week or so that the devel instance behaves sanely. If not, consider running Running Sanity tests against local dev instance (or at least try to build several packages there).
You can try to kill all the old currently unused builders, and check the spawner log what is happening:
[copr@copr-be-dev ~][STG]$ resalloc-maint resource-delete --unused
Production¶
There is a substantially less work for production instance. You just need to
equivalently update the production configuration file
./inventory/group_vars/copr_aws
, so the copr_builder_images
config
points to the same image names as development instance does. And re-run
playbook from batcave:
$ sudo rbac-playbook \
-l copr-be.aws.fedoraproject.org groups/copr-backend.yml \
-t provision_config
Optionally, when you need to propagate the new images quickly, you can terminate the old but currently unused builders by:
$ su - resalloc
$ resalloc-maint resource-delete --unused
Cleanup¶
When everything is up and running the new version, do not forget to delete the old AMIs and associated snapshots from AWS.