How to manage RAID 10 on copr-backend¶
There are currently four AWS EBS sc1 volumes (4x12T, 144MB/s per volume) forming
a RAID 10 array. On top of this is an LVM volume group named
copr-backend-data (24T, and we can add more space in the future).
Everything is configured so the machine starts and mounts everything
correctly. We just need to keep an eye on
/proc/mdstat from time to time.
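A quick way to keep an eye on it, as a sketch; the `check_mdstat` helper is a hypothetical name, and md127 is the array device used throughout this document:

```shell
# /proc/mdstat lists every md array, its members, and an [UU...] status
# field where "_" marks a failed or missing member.
cat /proc/mdstat 2>/dev/null || true

# Cron-friendly check: print DEGRADED when any array shows a failed member.
check_mdstat() {
    # reads mdstat-formatted text on stdin, prints OK or DEGRADED
    if grep -q '\[[U_]*_[U_]*\]'; then echo DEGRADED; else echo OK; fi
}
check_mdstat < /proc/mdstat 2>/dev/null || true
```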
Manually checking/stopping checks¶
To stop a running check:
echo idle > /sys/block/md127/md/sync_action
To start a check manually:
echo check > /sys/block/md127/md/sync_action
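To see what the array is currently doing and how far it has gotten, the md sysfs files next to sync_action can be read directly (assuming md127, as above):

```shell
# Current action of the array: idle, check, resync, recover, ...
cat /sys/block/md127/md/sync_action

# Progress of a running check/resync, as "sectors done / total sectors"
# (reads "none" when nothing is running).
cat /sys/block/md127/md/sync_completed
```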
Detaching and re-attaching volumes¶
It’s not safe to just force-detach a volume in AWS EC2; that could cause data corruption. Since there are several layers (volumes -> RAID -> LVM -> ext4), we need to tear them down in reverse order before detaching.
1. Disable the volume group:
vgchange -a n copr-backend-data
2. Stop the RAID:
mdadm --stop /dev/md127
3. Now you can detach the volumes in AWS EC2.
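The whole teardown can be sketched as one sequence; note that the ext4 filesystem must be unmounted first, and the mountpoint below is a hypothetical example (check `findmnt` for the real one):

```shell
# Tear the stack down in reverse order (ext4 -> LVM -> md):
umount /var/lib/copr-be          # 1. unmount the filesystem (example path)
vgchange -a n copr-backend-data  # 2. deactivate the LVM volume group
mdadm --stop /dev/md127          # 3. stop the md array
# Only now is it safe to detach the EBS volumes in the EC2 console.
```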
To re-attach:
1. Attach the volumes in AWS EC2.
2. Start the RAID and the volume group:
mdadm --assemble --scan
In case --assemble --scan doesn’t reconstruct the array, it is OK to add the volumes manually:
mdadm /dev/md127 --add /dev/nvme2n1p1
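Bringing the rest of the stack back up can be sketched as follows; the logical-volume path is not recorded in this document, so the mount line is a placeholder:

```shell
mdadm --assemble --scan          # reassemble the array from member superblocks
vgchange -a y copr-backend-data  # reactivate the volume group
lvs copr-backend-data            # verify the logical volume(s) are visible
# mount the filesystem again (exact LV name depends on the setup):
# mount /dev/copr-backend-data/<lv> /the/mountpoint
```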
There’s an Ansible configuration for this, and a list of volumes.
Note the sysctl
dev.raid.speed_limit_max (in KB/s); this might affect
(limit) the initial sync speed, periodic RAID checks, and potentially the RAID
rebuild speed.
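For example, the current limits can be inspected and temporarily raised like this (the value below is an illustrative number, not a recommendation from this document):

```shell
# Inspect the per-array rate limits (KiB/s); speed_limit_min is the
# guaranteed minimum, speed_limit_max the cap.
sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max

# Temporarily raise the cap for a planned resync/check (example value):
sysctl -w dev.raid.speed_limit_max=1000000
```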
While trying to do a fast rsync, we experimented with a very large instance type
(c5d.18xlarge, 144GB RAM) and with vm.vfs_cache_pressure=2, to keep as many
inodes and dentries in the kernel caches as possible (see
slabtop; we eventually had 60M
dentries and 28M inodes cached, and 15T synced in 6.5 hours). We had also decreased
vm.dirty_background_ratio to get more frequent write-back,
considering the large RAM.
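The knobs mentioned above can be set like this; the dirty_background_ratio value is an example, since the document does not record the exact figure that was used:

```shell
sysctl -w vm.vfs_cache_pressure=2      # strongly prefer keeping dentries/inodes cached
sysctl -w vm.dirty_background_ratio=2  # example: start background write-back earlier
slabtop -o | head -20                  # one-shot view of the biggest slab caches
```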