[CentOS-virt] lvm cache + qemu-kvm stops working after about 20GB of writes
Sandro Bonazzola
sbonazzo at redhat.com
Mon Apr 10 08:08:21 UTC 2017
Adding Paolo and Miroslav.

On Sat, Apr 8, 2017 at 4:49 PM, Richard Landsman - Rimote
<richard at rimote.nl> wrote:

> Hello,
>
> I would really appreciate some help/guidance with this problem. First of
> all, sorry for the long message. I would file a bug, but I do not know
> whether it is my fault, dm-cache, qemu, or (probably) a combination of
> both. And I can imagine some of you have this setup up and running
> without problems (or maybe you think it works, just like I did, but it
> does not):
>
> PROBLEM
>
> LVM cache writeback stops working as expected after a while with a
> qemu-kvm VM. A 100% working setup would be the holy grail in my
> opinion... and the performance of KVM/qemu is great in the beginning, I
> must say.
>
> DESCRIPTION
>
> When using software RAID 1 (2x HDD) + software RAID 1 (2x SSD) and
> creating a cached LV out of them, the VM initially performs great (at
> least 40,000 IOPS on 4k random read/write)! But then, after a while (and
> a lot of random IO, ca. 10-20 GB), it effectively turns into a
> writethrough cache although there is much space left on the cached LV.
>
> When working as expected, on the KVM host all writes go to the SSDs:
>
> iostat -x -m 2
>
> Device:  rrqm/s  wrqm/s   r/s     w/s        rMB/s  wMB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sda      0.00    324.50   0.00    22.00      0.00   14.94     1390.57   1.90      86.39  0.00     86.39    5.32   11.70
> sdb      0.00    324.50   0.00    22.00      0.00   14.94     1390.57   2.03      92.45  0.00     92.45    5.48   12.05
> sdc      0.00    3932.00  0.00    *2191.50*  0.00   *270.07*  252.39    37.83     17.55  0.00     17.55    0.36   *78.05*
> sdd      0.00    3932.00  0.00    *2197.50*  0.00   *271.01*  252.57    38.96     18.14  0.00     18.14    0.36   *78.95*
>
> When not working as expected, on the KVM host all writes go through the
> SSDs on to the HDDs (effectively disabling writeback, so it becomes a
> writethrough):
>
> Device:  rrqm/s  wrqm/s   r/s     w/s        rMB/s  wMB/s     avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sda      0.00    7.00     234.50  *173.50*   0.92   *1.95*    14.38     29.27     71.27  111.89   16.37    2.45   *100.00*
> sdb      0.00    3.50     212.00  *177.50*   0.83   *1.95*    14.60     35.58     91.24  143.00   29.42    2.57   *100.10*
> sdc      2.50    0.00     566.00  *199.00*   2.69   0.78      9.28      0.08      0.11   0.13     0.04     0.10   *7.70*
> sdd      1.50    0.00     76.00   *199.00*   0.65   0.78      10.66     0.02      0.07   0.16     0.04     0.07   *1.85*
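To see which of the two states the machine is in without guessing from
iostat alone, it can help to watch the cache's dirty-block counter while
the workload runs. A rough sketch, not part of the original report: it
assumes the cl/cachedlv naming used further below and that your lvm2
build exposes the cache_* report fields.

  # dirty vs. used cache blocks, refreshed every 2 seconds
  watch -n 2 'lvs -o lv_name,cache_dirty_blocks,cache_used_blocks,cache_total_blocks cl/cachedlv'

  # alternatively, the raw dm-cache status line also carries a dirty-block
  # count (see the dm-cache kernel documentation for the field order)
  dmsetup status cl-cachedlv

If the dirty count stays high while iostat shows the HDDs saturated, that
matches the writethrough-like behaviour described above.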
> Stuff I've checked/tried:
>
> - The data in the cached LV has not even exceeded half of the space, so
>   this should not happen. It even happens when only 20% of cachedata is
>   used.
> - It seems to be triggered most of the time when the Cpy%Sync column of
>   `lvs -a` is about 30%. But this is not always the case!
> - Changing the cache policy from smq to cleaner, waiting (check that the
>   flush is ready with lvs -a) and then switching back to smq seems to
>   help *sometimes*! But not always... (a scripted version of this flush
>   is sketched further below)
>
>   lvchange --cachepolicy cleaner /dev/mapper/XXX-cachedlv
>   lvs -a
>   lvchange --cachepolicy smq /dev/mapper/XXX-cachedlv
>
> - *When mounting the LV inside the host, this does not seem to happen!!*
>   So it looks like a qemu-kvm / dm-cache combination issue. The only
>   difference is that inside the host I do mkfs instead of LVM inside the
>   VM (so it could be an LVM-inside-VM on top of LVM-on-the-KVM-host
>   problem too? Small chance, probably, because for the first 10-20 GB it
>   works great!)
> - Tried disabling SELinux, upgrading to the newest kernels (elrepo ml
>   and lt), played around with the dirty-cache tunables such as
>   /proc/sys/vm/dirty_writeback_centisecs,
>   /proc/sys/vm/dirty_expire_centisecs and /proc/sys/vm/dirty_ratio, the
>   migration threshold of dmsetup, and other probably unimportant knobs
>   like vm.dirty_bytes.
> - When in the "slow state", the system's kworkers are excessively using
>   IO (10-20 MB per kworker process). This seems to be the writeback
>   process (Cpy%Sync) because the cache wants to flush to the HDDs. But
>   the strange thing is that after a full sync (0% left), the disk may
>   become slow again after a few MBs of data. A reboot sometimes helps.
> - Have tried iothreads, virtio-scsi, the vcpu driver setting on the
>   virtio-scsi controller, cache settings, disk schedulers etc. Nothing
>   helped.
> - The new Samsung 950 PRO SSDs have HPA enabled (30%!!); I have an AMD
>   FX(tm)-8350 and 16 GB RAM.
>
> It feels like the LVM cache has a threshold (about 20 GB of dirty data)
> and then stops allowing the qemu-kvm process to use writeback caching
> (use inside the host does not seem to have this limitation). It starts
> flushing, but only to a certain point. After a few MBs of data it is
> right back in the slow spot again. The only solution is waiting for a
> long time (independent of Cpy%Sync) or sometimes changing the cache
> policy and forcing a flush. For me this prevents production use of this
> system. But it is so promising, so I hope somebody can help.
>
> Desired state: doing the FIO test (described in the REPRODUCE section)
> repeatedly should keep being fast until the cached LV is more or less
> full. If syncing back to disk causes this degradation, it should
> actually flush fully within a reasonable time and give the opportunity
> to write fast again up to a given threshold. It now seems like a
> one-time-use cache that only uses a fraction of the SSD and is
> useless/very unstable afterwards.
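The cleaner/smq flush trick from the list above is easy to script. A
minimal sketch, with the assumptions spelled out: the cache LV is
cl/cachedlv as in the setup below, and the cache_dirty_blocks report
field is available in your lvm2 build (the report itself watches the
Cpy%Sync column of `lvs -a` instead).

  #!/bin/bash
  # Ask dm-cache to write all dirty blocks back to the HDDs, then restore
  # the smq policy. Sketch only; adjust the LV name to your setup.
  LV=cl/cachedlv

  lvchange --cachepolicy cleaner "$LV"

  # Poll until no dirty cache blocks remain.
  while :; do
      dirty=$(lvs --noheadings -o cache_dirty_blocks "$LV" | tr -d ' ')
      echo "dirty cache blocks: ${dirty:-unknown}"
      [ "$dirty" = "0" ] && break
      sleep 5
  done

  lvchange --cachepolicy smq "$LV"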
> REPRODUCE
>
> 1. Install the newest CentOS 7 on software RAID 1 HDDs with LVM. Keep a
> lot of space for the LVM cache (no /home)! So make the VG as large as
> possible during anaconda partitioning.
>
> 2. Once installed and booted into the system, install qemu-kvm:
>
> yum install -y centos-release-qemu-ev
> yum install -y qemu-kvm-ev libvirt bridge-utils net-tools
> # disable ksm (probably not important / needed)
> systemctl disable ksm
> systemctl disable ksmtuned
>
> 3. Create the LVM cache:
>
> # set some variables and create a RAID 1 array with the two SSDs
> VGBASE= && ssddevice1=/dev/sdX1 && ssddevice2=/dev/sdX1 && \
> hddraiddevice=/dev/mdXXX && ssdraiddevice=/dev/mdXXX && \
> mdadm --create --verbose ${ssdraiddevice} --level=mirror --bitmap=none \
>   --raid-devices=2 ${ssddevice1} ${ssddevice2}
>
> # create the PV and extend the VG
> pvcreate ${ssdraiddevice} && vgextend ${VGBASE} ${ssdraiddevice}
>
> # create the slow LV on the HDDs (use the maximum space left if you want)
> pvdisplay ${hddraiddevice}
> lvcreate -lXXXX -n cachedlv ${VGBASE} ${hddraiddevice}
>
> # create the meta and data LVs: for testing purposes I keep about 20 GB
> # of the SSD for an uncached LV, to rule out that it is the SSD itself
> lvcreate -l XX -n testssd ${VGBASE} ${ssdraiddevice}
>
> # the rest can be used as cache data/metadata
> pvdisplay ${ssdraiddevice}
> # about 1/1000 of the space you have left on the SSD for the meta
> # (minimum of 4)
> lvcreate -l X -n cachemeta ${VGBASE} ${ssdraiddevice}
> # the rest can be used as cache data
> lvcreate -l XXX -n cachedata ${VGBASE} ${ssdraiddevice}
>
> # convert/combine the pools so cachedlv is actually cached
> lvconvert --type cache-pool --cachemode writeback \
>   --poolmetadata ${VGBASE}/cachemeta ${VGBASE}/cachedata
> lvconvert --type cache --cachepool ${VGBASE}/cachedata ${VGBASE}/cachedlv
>
> # my system now looks like this (the VG is called cl, the installer default)
> [root at localhost ~]# lvs -a
>   LV                VG Attr       LSize   Pool        Origin
>   [cachedata]       cl Cwi---C---  97.66g
>   [cachedata_cdata] cl Cwi-ao----  97.66g
>   [cachedata_cmeta] cl ewi-ao---- 100.00m
>   cachedlv          cl Cwi-aoC---   1.75t [cachedata] [cachedlv_corig]
>   [cachedlv_corig]  cl owi-aoC---   1.75t
>   [lvol0_pmspare]   cl ewi------- 100.00m
>   root              cl -wi-ao----  46.56g
>   swap              cl -wi-ao----  14.96g
>   testssd           cl -wi-a-----  45.47g
>
> [root at localhost ~]# lsblk
> NAME                      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
> sdd                         8:48   0   163G  0 disk
> └─sdd1                      8:49   0   163G  0 part
>   └─md128                   9:128  0 162.9G  0 raid1
>     ├─cl-cachedata_cmeta  253:4    0   100M  0 lvm
>     │ └─cl-cachedlv       253:6    0   1.8T  0 lvm
>     ├─cl-testssd          253:2    0  45.5G  0 lvm
>     └─cl-cachedata_cdata  253:3    0  97.7G  0 lvm
>       └─cl-cachedlv       253:6    0   1.8T  0 lvm
> sdb                         8:16   0   1.8T  0 disk
> ├─sdb2                      8:18   0   1.8T  0 part
> │ └─md127                   9:127  0   1.8T  0 raid1
> │   ├─cl-swap             253:1    0    15G  0 lvm   [SWAP]
> │   ├─cl-root             253:0    0  46.6G  0 lvm   /
> │   └─cl-cachedlv_corig   253:5    0   1.8T  0 lvm
> │     └─cl-cachedlv       253:6    0   1.8T  0 lvm
> └─sdb1                      8:17   0   954M  0 part
>   └─md126                   9:126  0   954M  0 raid1 /boot
> sdc                         8:32   0   163G  0 disk
> └─sdc1                      8:33   0   163G  0 part
>   └─md128                   9:128  0 162.9G  0 raid1
>     ├─cl-cachedata_cmeta  253:4    0   100M  0 lvm
>     │ └─cl-cachedlv       253:6    0   1.8T  0 lvm
>     ├─cl-testssd          253:2    0  45.5G  0 lvm
>     └─cl-cachedata_cdata  253:3    0  97.7G  0 lvm
>       └─cl-cachedlv       253:6    0   1.8T  0 lvm
> sda                         8:0    0   1.8T  0 disk
> ├─sda2                      8:2    0   1.8T  0 part
> │ └─md127                   9:127  0   1.8T  0 raid1
> │   ├─cl-swap             253:1    0    15G  0 lvm   [SWAP]
> │   ├─cl-root             253:0    0  46.6G  0 lvm   /
> │   └─cl-cachedlv_corig   253:5    0   1.8T  0 lvm
> │     └─cl-cachedlv       253:6    0   1.8T  0 lvm
> └─sda1                      8:1    0   954M  0 part
>   └─md126                   9:126  0   954M  0 raid1 /boot
>
> # now create the VM
> wget http://ftp.tudelft.nl/centos.org/6/isos/x86_64/CentOS-6.9-x86_64-minimal.iso -P /home/
> DISK=/dev/mapper/XXXX-cachedlv
>
> # watch out: my network setup uses a custom bridge/network in the
> # following command. Please replace it with what you normally use.
> virt-install -n CentOS1 -r 12000 --os-variant=centos6.7 --vcpus 7 \
>   --disk path=${DISK},cache=none,bus=virtio \
>   --network bridge=pubbr,model=virtio \
>   --cdrom /home/CentOS-6.9-x86_64-minimal.iso \
>   --graphics vnc,port=5998,listen=0.0.0.0 --cpu host
>
> # now connect from a client PC to qemu
> virt-viewer --connect=qemu+ssh://root@192.168.0.XXX/system --name CentOS1
>
> Install everything on the single vda disc with LVM (I use the defaults
> in anaconda, but remove the large /home to prevent the SSD from being
> overused).
>
> After install and reboot, log in to the VM and run
>
> yum install epel-release -y && yum install screen fio htop -y
>
> and then run the disk test:
>
> fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
>   --name=test --filename=test --bs=4k --iodepth=64 --size=4G \
>   --readwrite=randrw --rwmixread=75
>
> Then *keep repeating* but *change the filename* attribute so it does not
> use the same blocks over and over again (see the loop sketched just
> below).
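If you want to script the "keep repeating with a fresh filename" step, a
small loop like the one below does it. The fio arguments are the ones
from the report; the loop and the test1..test5 file names are just an
illustration. Five 4 GB files add up to roughly the ~20 GB of writes
after which the slowdown is reported to appear.

  # run the same 4k random r/w test on a fresh 4 GB file each time, so new
  # blocks are allocated instead of re-hitting already-cached ones
  for i in 1 2 3 4 5; do
      fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
          --name=test$i --filename=test$i --bs=4k --iodepth=64 --size=4G \
          --readwrite=randrw --rwmixread=75
  done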
> In the beginning the performance is great!! Wow, very impressive:
> 150 MB/s of 4k random r/w (close to bare metal, about 20-30% loss). But
> after a few runs (usually about 4 or 5, always changing the filename but
> not overfilling the FS), it drops to about 10 MB/s.
>
> Normal / in the beginning:
>
>   read : io=3073.2MB, bw=183085KB/s, *iops=45771*, runt= 17188msec
>   write: io=1022.1MB, bw=60940KB/s, *iops=15235*, runt= 17188msec
>
> But then:
>
>   read : io=3073.2MB, bw=183085KB/s, *iops=2904*, runt= 17188msec
>   write: io=1022.1MB, bw=60940KB/s, *iops=1751*, runt= 17188msec
>
> or even worse, up to the point that it is actually the HDD that is being
> written to (about 500 IOPS).
>
> P.S. When a test is/was slow, that means it is on the HDDs. So even when
> the problem is fixed (sometimes just by waiting), that specific file
> will keep being slow when redoing the test until it is promoted to the
> LVM cache (which takes a lot of reads, I think). And once on the SSD it
> sometimes keeps being fast, although a new test file will be slow. So I
> really recommend changing the test file all the time when trying to see
> if a change in speed has occurred.
>
> --
> Kind regards,
>
> Richard Landsman
> http://rimote.nl
>
> T: +31 (0)50 - 763 04 07
> (Mon-Fri 9:00 to 18:00)
>
> 24/7 in case of outages:
> +31 (0)6 - 4388 7949
> @RimoteSaS (Twitter service notices/security updates)

--
SANDRO BONAZZOLA
ASSOCIATE MANAGER, SOFTWARE ENGINEERING, EMEA ENG VIRTUALIZATION R&D
Red Hat EMEA <https://www.redhat.com/>
TRIED. TESTED. TRUSTED. <https://redhat.com/trusted>