How to upgrade persistent instances (OpenStack)

Warning

This document is specific to OpenStack and is outdated. For Amazon AWS, see the up-to-date version of this document.

This article describes how to upgrade persistent instances (e.g. copr-fe-dev) to a new Fedora version.

Requirements

Find source image

For OpenStack, the image registry is in the OpenStack images dashboard. By default, only the project images are shown; to see all of them, click the Public button.

Search for the Fedora-Cloud-Base-* images of the target Fedora release. If the image name has a timestamp suffix, it is a beta version. Prefer images with a numbered minor version.

The goal in this step is just to find an image name.
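The image list can also be filtered from the command line. The sketch below reads image names on stdin and keeps only names with a short numbered compose suffix (e.g. Fedora-Cloud-Base-30-1.2.x86_64), dropping timestamped names. The function name `release_images` is hypothetical, and the "one to three digits, dot, digits" rule is an assumption about how release names look, not an official naming specification.

```shell
# Hypothetical helper: filter image names (one per line on stdin) down to
# release builds of Fedora $1, e.g. "Fedora-Cloud-Base-30-1.2.x86_64".
# Timestamped names such as "Fedora-Cloud-Base-30-20190425.n.0.x86_64"
# do not match, because the suffix must start with 1-3 digits and a dot.
release_images() {
    release="$1"
    grep -E "^Fedora-Cloud-Base-${release}-[0-9]{1,3}[.][0-9]+[.]x86_64\$"
}
```

With the OpenStack RC file loaded (see Backup the old instance), it could be used as `openstack image list --public -f value -c Name | release_images 30`.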

Update the image in playbooks

Once the new image name is known, make sure it is set in vars/global.yml, e.g.:

fedora30_x86_64: Fedora-Cloud-Base-30-1.2.x86_64

Then edit the host vars for the instance:

vim inventory/host_vars/<instance>.fedorainfracloud.org
# e.g.
vim inventory/host_vars/copr-dist-git-dev.fedorainfracloud.org

And configure it to use the new image:

image: "{{ fedora30_x86_64 }}"

That is all that needs to be changed in the Ansible repository. Commit and push the changes.
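If you prefer not to edit the host vars by hand, the change can be scripted. This is a minimal sketch, assuming the host_vars file already has a top-level `image:` line; the function name `set_host_image` is hypothetical.

```shell
# Hypothetical helper: point a host_vars file at a new image variable.
#   $1 = path to inventory/host_vars/<instance>.fedorainfracloud.org
#   $2 = variable name from vars/global.yml, e.g. fedora30_x86_64
set_host_image() {
    file="$1"
    var="$2"
    # Refuse to guess where the key belongs if it is not there yet.
    grep -q '^image:' "$file" || { echo "no 'image:' line in $file" >&2; return 1; }
    # Rewrite the whole line to: image: "{{ <var> }}"
    sed -i "s/^image:.*/image: \"{{ ${var} }}\"/" "$file"
}
```

For example: `set_host_image inventory/host_vars/copr-dist-git-dev.fedorainfracloud.org fedora30_x86_64`, then `git commit -a` and `git push` as usual.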

Backup the old instance

This part is done with the openstack client on your computer. First, download an RC file for the persistent tenant: open the Fedora Infra OpenStack dashboard, switch to the Access & Security section, then API Access, and click Download OpenStack RC File.

Load the OpenStack settings:

source ~/Downloads/persistent-openrc.sh

Back up the old instance by renaming it:

openstack server set --name <old_name>_backup "<id>"
# e.g.
openstack server set --name copr-dist-git-dev_backup "85260b5b-7f61-4398-8d05-xxxxxxxxxxxx"
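The rename can be wrapped in a small function that first checks that the RC file has been sourced. This is a sketch, assuming the RC file exports `OS_AUTH_URL` (standard OpenStack RC files do); the function name `backup_instance` is hypothetical.

```shell
# Hypothetical helper: rename instance $2 (ID) to "$1_backup".
#   $1 = current instance name, e.g. copr-dist-git-dev
#   $2 = instance ID
backup_instance() {
    old_name="$1"
    id="$2"
    # Without the RC file loaded, the openstack client has no credentials.
    if [ -z "${OS_AUTH_URL:-}" ]; then
        echo "OpenStack RC file not sourced (OS_AUTH_URL is empty)" >&2
        return 1
    fi
    openstack server set --name "${old_name}_backup" "$id"
}
```

Usage: `backup_instance copr-dist-git-dev "85260b5b-7f61-4398-8d05-xxxxxxxxxxxx"`.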

Warning

backend - You have to terminate existing resalloc resources. See Terminate resalloc resources.

Warning

backend - You have to terminate the OpenStack builder VMs. See Terminate OpenStack VMs.

Finally, shut down the instance to avoid storage inconsistency and other possible problems:

$ ssh root@<old_name>.fedorainfracloud.org
[root@copr-dist-git-dev ~][STG]# shutdown -h now

Once the instance is halted, detach the volume from the old instance:

openstack server remove volume "<instance_id>" "<volume_id>"
# e.g.
openstack server remove volume "52d97d72-5915-45c0-b223-xxxxxxxxxxxx" "9e2b4c55-9ec3-4508-af46-xxxxxxxxxxxx"
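To avoid detaching the volume while the old instance is still shutting down, the detach can wait until OpenStack reports the instance as SHUTOFF. This is a sketch; the function name `detach_volume_when_off`, the default of 30 attempts, and the 10-second polling interval are arbitrary assumptions.

```shell
# Hypothetical helper: wait for SHUTOFF, then detach the volume.
#   $1 = instance ID, $2 = volume ID, $3 = max attempts (default 30)
detach_volume_when_off() {
    instance_id="$1"
    volume_id="$2"
    tries="${3:-30}"
    status=""
    while [ "$tries" -gt 0 ]; do
        status="$(openstack server show -f value -c status "$instance_id")"
        if [ "$status" = "SHUTOFF" ]; then
            # Return the exit status of the detach itself.
            openstack server remove volume "$instance_id" "$volume_id"
            return
        fi
        tries=$((tries - 1))
        sleep 10
    done
    echo "instance $instance_id is not SHUTOFF (last status: ${status:-unknown})" >&2
    return 1
}
```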

Provision new instance from scratch

On batcave01, run the playbook to provision the instance. For dev, see

https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-dev-machines

and for production, see

https://docs.pagure.org/copr.copr/how_to_release_copr.html#upgrade-production-machines

Note

The playbook may hang longer than expected while waiting for the new instance to boot. See Initial boot hangs waiting for entropy.

Get it working

The playbook from the previous section will most likely not succeed on the first run. Debug and fix the issues it reports; if required, adjust the playbook and re-run it until it passes. You will most likely also need to attach the volume detached from the old instance to the new one in the OpenStack instances dashboard.
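The volume can also be attached with the openstack client instead of the dashboard. This sketch attaches the volume and then prints its attachments so you can confirm the result; the function name `attach_volume` is hypothetical, and mounting the volume inside the instance is not covered here.

```shell
# Hypothetical helper: attach volume $2 to server $1 (name or ID)
# and show the volume's attachments afterwards.
attach_volume() {
    server="$1"
    volume_id="$2"
    openstack server add volume "$server" "$volume_id" &&
        openstack volume show -f value -c attachments "$volume_id"
}
```

For example: `attach_volume copr-dist-git-dev "9e2b4c55-9ec3-4508-af46-xxxxxxxxxxxx"`.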

Note

frontend - It will most likely be necessary to manually upgrade the database. See Upgrade the database.

Note

backend - Copr backend requires an outdated version of python3-novaclient. See Downgrade python novaclient.

Terminate the old instance

Once the new instance is successfully provisioned and working as expected, terminate the old backup instance.

Open the OpenStack instances dashboard, switch the current project to persistent, and find the instance that you want to terminate. Make sure it is the right one! Don’t mistake e.g. a production instance for a dev one. Then look at the Actions column and click the More button. In the dropdown menu, use the Terminate instance button.
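The same can be done from the command line with the persistent RC file loaded. To guard against deleting the wrong machine, this sketch refuses any name that does not end in `_backup` (the suffix used in Backup the old instance). The function name `delete_backup_instance` is hypothetical.

```shell
# Hypothetical helper: delete the renamed backup instance by name.
delete_backup_instance() {
    name="$1"
    case "$name" in
        *_backup) ;;
        *) echo "refusing to delete '$name': not a *_backup instance" >&2
           return 1 ;;
    esac
    # Fail early if the instance does not exist or the name is ambiguous.
    openstack server show -f value -c id "$name" >/dev/null || return 1
    openstack server delete "$name"
}
```

Usage: `delete_backup_instance copr-dist-git-dev_backup`.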

Final steps

Don’t forget to announce on the Fedora devel and Copr devel mailing lists, and also on #fedora-buildsys, that everything should be working again.

Close the infrastructure ticket.

Troubleshooting

Initial boot hangs waiting for entropy

Because of a known infrastructure issue (Fedora infrastructure issue #7966), the initial boot of an instance in OpenStack hangs while waiting for entropy. It seems that it can’t be fixed properly, so we need to work around it: go to the OpenStack instances dashboard, open the instance details, switch to the Console tab, and type random characters into it. This resumes the boot process.

Private IP addresses

Most of the communication within the Copr stack happens on public interfaces via hostnames, with one exception. Communication between backend and keygen is done on a private network behind a firewall, through IP addresses that change when a fresh instance is spawned.

After updating a copr-keygen (or its dev) instance, change its IP address in inventory/group_vars/copr_dev:

keygen_host: "172.XX.XX.XX"
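The new private address can be read from `openstack server show -c addresses <instance>`. Updating the variable can then be scripted as below; this is a sketch, assuming `keygen_host` sits on a single top-level line of the group_vars file. The function name `set_keygen_host` is hypothetical.

```shell
# Hypothetical helper: set keygen_host in group_vars file $1 to IPv4 $2.
set_keygen_host() {
    file="$1"
    ip="$2"
    # Basic sanity check so a typo does not end up in the inventory.
    printf '%s\n' "$ip" | grep -Eq '^[0-9]+([.][0-9]+){3}$' ||
        { echo "not an IPv4 address: $ip" >&2; return 1; }
    # Keep the key and spacing, replace only the quoted value.
    sed -i -E "s/^(keygen_host:[[:space:]]*).*/\1\"${ip}\"/" "$file"
}
```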

After updating a copr-backend (or its dev) instance, change the configuration in inventory/group_vars/copr_keygen (or its dev counterpart) and update the iptables rules:

custom_rules: [ ... ]

Note that two addresses need to be updated; both are the backend’s.

Run the provision playbooks for copr-backend and copr-keygen to propagate the changes to the respective instances.

Terminate resalloc resources

It is easiest to close all resalloc tickets; otherwise, dangling VMs will prevent the backend from starting new ones.

Edit the /etc/resallocserver/pools.yaml file and, in every pool section, set:

max: 0
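Editing every pool section by hand is error-prone, so the change can be made with sed. This is a sketch, assuming each pool section has an indented `max: <number>` line; keys such as `max_prealloc` are not touched because the pattern requires `max:` exactly. The function name `zero_pool_max` is hypothetical, and a copy of the original file is kept as `pools.yaml.bak`.

```shell
# Hypothetical helper: set every indented "max: N" in pools.yaml to 0.
zero_pool_max() {
    file="${1:-/etc/resallocserver/pools.yaml}"
    sed -i.bak -E 's/^([[:space:]]+max:)[[:space:]]*[0-9]+/\1 0/' "$file"
}
```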

Then delete all current resources:

su - resalloc
resalloc-maint resource-delete --all

Terminate OpenStack VMs

Make sure you terminate all OpenStack builders allocated by copr-backend.service:

# systemctl stop copr-backend   # ensure that no new builders are allocated
# su - copr

# drop the builders from the DB (-r: skip "del" when no keys match)
$ redis-cli --scan --pattern 'copr:backend:vm_instance:hset::Copr_builder_*' | xargs -r redis-cli del

# shut down all the VMs that are not in the DB
$ cleanup_vm_nova.py
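To confirm that no builder records are left in Redis, the keys matching the same pattern can be counted. This is a sketch; the function name `remaining_builders` and the `REDIS_CLI` override variable are hypothetical conveniences, not something Copr itself reads.

```shell
# Hypothetical helper: count builder keys still present in Redis.
# REDIS_CLI lets you point at another client binary; defaults to redis-cli.
remaining_builders() {
    "${REDIS_CLI:-redis-cli}" --scan \
        --pattern 'copr:backend:vm_instance:hset::Copr_builder_*' | wc -l
}
```

After the cleanup above, `remaining_builders` should print 0.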

Downgrade python novaclient

The backend depends on python3-novaclient in the prehistoric version 3.3.1. This version is no longer supported, and the spec file had to be customized to build and install only the python3 package. The epoch has also been bumped so that it doesn’t get replaced by a newer version. Install this package from the Copr project (even on the production instance):

dnf copr enable @copr/novaclient
dnf install python3-novaclient-2:3.3.1
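Because the step is intentionally manual, it helps to verify the result afterwards. This sketch queries the installed epoch and version with rpm and compares them with 2:3.3.1; the function name `check_novaclient` is hypothetical. It only checks, it does not install anything, so it stays in line with the note below.

```shell
# Hypothetical helper: succeed only if python3-novaclient 2:3.3.1 is installed.
check_novaclient() {
    evr="$(rpm -q --qf '%{EPOCH}:%{VERSION}\n' python3-novaclient)" || return 1
    if [ "$evr" != "2:3.3.1" ]; then
        echo "unexpected python3-novaclient: $evr" >&2
        return 1
    fi
}
```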

Note

Please do not automate this step in the playbook, so that it forces us to deal with the situation properly.

Upgrade the database

When upgrading to a distribution that provides a new major version of PostgreSQL server, manual intervention is required.
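To tell whether this applies, compare the major version recorded in the data directory's PG_VERSION file with the major version of the installed server. This is a sketch, assuming the default Fedora data directory /var/lib/pgsql/data and that `postgres --version` prints the version as its last field; the function name `pg_needs_upgrade` is hypothetical.

```shell
# Hypothetical helper: print "upgrade needed" or "up to date".
#   $1 = data directory (default /var/lib/pgsql/data)
#   $2 = server version string (default: taken from `postgres --version`)
pg_needs_upgrade() {
    datadir="${1:-/var/lib/pgsql/data}"
    server_version="${2:-}"
    data_major="$(cat "$datadir/PG_VERSION")" || return 1
    [ -n "$server_version" ] ||
        server_version="$(postgres --version | awk '{print $NF}')"
    # PostgreSQL >= 10: major is "X" of "X.Y"; 9.x: major is "9.Y" of "9.Y.Z".
    case "$server_version" in
        9.*) server_major="$(echo "$server_version" | cut -d. -f1,2)" ;;
        *)   server_major="${server_version%%.*}" ;;
    esac
    if [ "$data_major" = "$server_major" ]; then
        echo "up to date"
    else
        echo "upgrade needed"
    fi
}
```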

Upgrade the database:

[root@copr-fe-dev ~][STG]# dnf install postgresql-upgrade
[root@copr-fe-dev ~][STG]# postgresql-setup --upgrade

And rebuild indexes:

[root@copr-fe-dev ~][STG]# su postgres
bash-5.0$ cd
bash-5.0$ reindexdb --all