How to manage RAID 10 on copr-backend
There are currently six AWS EBS sc1 volumes used for hosting Copr Backend build
results. Four disks form one 24T raid10 array, and two more disks form a 16T
raid1. These two arrays are used as “physical volumes” for the copr-backend-data
LVM volume group, and we have a single logical volume on it with the same name,
copr-backend-data (ext4 formatted, mounted as /var/lib/copr/public_html).
Everything is configured so the machine starts on its own and mounts everything
correctly. We just need to take a look at /proc/mdstat
from time to time.
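A quick way to eyeball the whole stack (a sketch; verify the md device name in /proc/mdstat first, md127 is just the name used by the commands later in this document):

cat /proc/mdstat                        # state and sync progress of all md arrays
mdadm --detail /dev/md127               # members, state and events of one array
pvs && vgs && lvs                       # the LVM layers on top of the arrays
df -h /var/lib/copr/public_html         # the mounted ext4 filesystem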
Manually checking/stopping checks
Commands needed:
echo idle > /sys/block/md127/md/sync_action    # abort a running consistency check
echo check > /sys/block/md127/md/sync_action   # start a new consistency check
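To watch a running check (assuming the same md127 device):

watch -n 30 cat /proc/mdstat                 # overall progress and estimated finish time
cat /sys/block/md127/md/mismatch_cnt         # mismatches found by the last completed check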
Detaching volume
It’s not safe to just force-detach the volumes in AWS EC2; that could cause data corruption. Since there are several layers (volumes -> RAID -> LVM -> ext4), we need to tear them down in the reverse order before detaching.
1. Stop apache, copr-backend, cron jobs, etc.
2. Unmount the filesystem:
   umount /var/lib/copr/public_html
3. Disable the volume group:
   vgchange -a n copr-backend-data
4. Stop the RAID arrays (both of them; see /proc/mdstat for their device names):
   mdadm --stop /dev/md127
5. Now you can detach the volumes from the instance in EC2 (see the quick check below).
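Before detaching in the EC2 console, it’s worth double-checking that nothing uses the devices anymore; a minimal check could be:

findmnt /var/lib/copr/public_html             # should print nothing once unmounted
lvs -o lv_name,lv_active copr-backend-data    # the logical volume should no longer be active
cat /proc/mdstat                              # stopped arrays disappear from the list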
Attaching volume
1. Attach the volumes in AWS EC2.
2. Start the RAID arrays and the volume group:
   mdadm --assemble --scan
   In case --assemble --scan doesn’t reconstruct the array, it is OK to add the volumes manually:
   mdadm /dev/md127 --add /dev/nvme2n1p1
3. Mount the /dev/disk/by-label/copr-repo volume (a full example sequence follows this list).
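Putting it together, the attach sequence is roughly the following (a sketch assuming the device names and mount point used elsewhere in this document; vgchange -a y is the counterpart of the deactivation done while detaching):

mdadm --assemble --scan                                          # reassemble the arrays from the attached volumes
vgchange -a y copr-backend-data                                  # re-activate the volume group
mount /dev/disk/by-label/copr-repo /var/lib/copr/public_html     # mount the ext4 filesystem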
There’s an Ansible configuration for this, and a list of volumes.
Adding more space
1. Create two gp3 volumes in EC2 of the same size and type, and tag them with
   FedoraGroup: copr, CoprInstance: production, CoprPurpose: infrastructure.
   Attach them to a freshly started temporary instance (we don’t want to overload
   the production backend’s I/O with the initial RAID sync). Make sure the
   instance type has enough EBS throughput to perform the initial sync quickly
   enough.
2. Always partition the disks with a single partition on them, otherwise the
   kernel might have trouble auto-assembling the disk arrays:
   cfdisk /dev/nvmeXn1
   cfdisk /dev/nvmeYn1
3. Create the raid1 array on both the new partitions:
   $ mdadm --create --name=raid-be-03 --verbose /dev/mdXYZ --level=1 --raid-devices=2 /dev/nvmeXn1p1 /dev/nvmeYn1p1
4. Wait till the new empty array is synchronized (may take hours or days, note we
   sync 2x16T). Check the details with mdadm -Db /dev/md/raid-be-03. See the tips
   below on how to make the sync speed unlimited with sysctl.
   Note: In case the disk is marked “readonly”, you might need the
   mdadm --readwrite /dev/md/raid-be-03 command.
5. Place the new raid1 array into the volume group as a new physical volume
   (vgextend runs pvcreate automatically):
   $ vgextend copr-backend-data /dev/md/raid-be-03
6. Extend the logical volume to span all the free space:
   $ lvextend -l +100%FREE /dev/copr-backend-data/copr-backend-data
7. Resize the underlying ext4 filesystem (takes 15 minutes and more!); a sanity
   check example follows this list:
   $ resize2fs /dev/copr-backend-data/copr-backend-data
8. Switch the volume types from gp3 to sc1; we don’t need the power of gp3 for
   backend purposes.
9. Modify the https://github.com/fedora-copr/ansible-fedora-copr group vars
   referencing the set(s) of volume IDs.
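Once the resize finishes, a quick sanity check of every layer (a sketch, using the names from the steps above) might look like:

mdadm --detail /dev/md/raid-be-03         # the new array is clean and in sync
pvs                                       # the new physical volume is part of copr-backend-data
vgs copr-backend-data                     # the volume group grew by the new capacity
lvs copr-backend-data                     # the logical volume spans the free space
df -h /var/lib/copr/public_html           # ext4 reports the new size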
Other tips
Note the sysctl dev.raid.speed_limit_max (in KB/s); it might affect (limit) the
initial sync speed, the periodic RAID checks, and potentially a RAID rebuild.
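For example (the value below is only illustrative, pick whatever the instance’s EBS throughput can sustain):

sysctl dev.raid.speed_limit_max                  # current per-array limit in KB/s
sysctl -w dev.raid.speed_limit_max=2000000       # effectively uncap the sync speed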
While trying to do a fast rsync, we experimented with a very large instance type
(c5d.18xlarge, 144GB RAM) and with vm.vfs_cache_pressure=2, to keep as many
inodes and dentries in the kernel caches as possible (see slabtop; we eventually
had 60M of inodes cached, 28M inodes and 15T synced in 6.5 hours). We also
decreased dirty_ratio and dirty_background_ratio to get more frequent writeback,
given the large RAM.
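A sketch of the tuning described above (the dirty_* values are only illustrative, not the exact values we used):

sysctl -w vm.vfs_cache_pressure=2          # keep dentries/inodes in the kernel caches much longer
sysctl -w vm.dirty_ratio=5                 # illustrative: force writeback earlier on a large-RAM machine
sysctl -w vm.dirty_background_ratio=2      # illustrative: start background writeback sooner
slabtop -o | head -20                      # one-shot view of dentry/inode cache usage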