[CentOS] pvmove speed

Wed Feb 13 07:07:10 UTC 2008
William L. Maltby <CentOS4Bill at triad.rr.com>

On Tue, 2008-02-12 at 22:24 -0700, Joseph L. Casale wrote:
> >But I really have a hunch that it is just a lot of I/O wait time due to
> >either metadata maintenance and checkpointing and/or I/O failures, which
> >have very long timeouts before failure is recognized and *then*
> >alternate block assignment and mapping is done.
> 
> One of the original arrays just needs to be rebuilt with more members; there are no errors, but I believe you are right about simple I/O wait time.
> 
> Going from sdd to sde:
> 
> # iostat -d -m -x
> Linux 2.6.18-53.1.6.el5 (host)  02/12/2008
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sdd               0.74     0.00  1.52 42.72     0.11     1.75    86.41     0.50   11.40   5.75  25.43
> sde               0.00     0.82  0.28  1.04     0.00     0.11   177.52     0.13   98.71  53.55   7.09
> 
> Not very impressive :) Two different SATA II based arrays on an LSI controller, 5% complete in ~7 hours == a week to complete! I ran this command from an ssh session from my workstation (that was clearly a dumb move). Given the robustness of the pvmove command that I have gleaned from reading, if the session bails, how much time am I likely to lose by restarting? Are the checkpoints frequent?
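
The "week" figure holds up as a rough linear extrapolation of the numbers quoted above (5% in ~7 hours). A minimal sketch of that arithmetic follows; the function name eta_hours is made up for illustration, and it assumes the copy rate stays constant, which it may well not.

```shell
#!/bin/sh
# Rough estimate only: assumes progress is linear over the whole move.
# Arguments: percent complete so far, hours elapsed so far.
# eta_hours is a hypothetical helper name, not part of LVM.
eta_hours() {
    awk -v pct="$1" -v hrs="$2" 'BEGIN {
        if (pct <= 0) { print "percent must be > 0"; exit 1 }
        total = hrs * 100 / pct          # projected total run time in hours
        printf "total=%.1fh remaining=%.1fh days=%.1f\n",
               total, total - hrs, total / 24
    }'
}

eta_hours 5 7    # prints: total=140.0h remaining=133.0h days=5.8
```

So a little under six days in total if nothing changes, which is close enough to "a week".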

Beyond my ken on the checkpoint frequency. Never had to use them. I'm in
a situation where I can start 'em up and walk away. My best thought is
to read the description of it in the man page and make a best-guess
about letting it run or not.

Sorry I can't offer more, but I'd be spewing FUD if I tried!

I suggest that with an estimated 1 week completion, you can't lose much
by killing it and restarting. Other checkpoints I've used in the past
have *very* low overhead and easily justify their use.

I would anticipate this to be the same. IIRC from the man page
description, it is essentially just marking completed portions and
updating metadata to reflect the new status. With such a straightforward
process, restart should be almost instantaneous with very low loss of
time.
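
If I read pvmove(8) right, running pvmove with no PV arguments restarts an interrupted move from its last checkpoint, and -b runs the move in the background, which should make it less dependent on the ssh session. A hedged sketch of both steps is below. The device names are placeholders taken from the iostat output (the real PVs may be partitions), and the RUN variable and function names are made up here so that the script only prints the commands unless told otherwise. Check the flags against the man page on your own box first.

```shell
#!/bin/sh
# Sketch only: flags are as I read them in pvmove(8); verify before use.
# SRC and DST are hypothetical PV names standing in for the two arrays.
SRC=/dev/sdd
DST=/dev/sde
# By default the commands are only printed. Run with RUN= (set but empty)
# to execute them for real, as root.
RUN=${RUN-echo}

# Start the move in the background, checking progress every 60 seconds.
start_move() { $RUN pvmove -b -i 60 "$SRC" "$DST"; }

# After an interruption: pvmove without PV arguments restarts any move
# that was in progress, picking up from the last checkpoint.
resume_move() { $RUN pvmove; }

start_move
resume_move
```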

Again, this is all supposition as I don't know the code.

> 
> Thanks!
> jlc
> <snip sig stuff>

-- 
Bill