Hi,
My CentOS 5.2 server (5.1 suffered the same problem as well) has a logical volume on a RAID 10 array (4 SATA hard disks on a Highpoint RR2310 controller). /etc/fstab has an entry for this array as below:

/dev/raid_vg0/raid_lv0 /mnt/raid ext3 defaults 0 0
Normally it works OK. But the filesystem on this volume goes into read-only mode once in a while. The RAID software reports no problem with the hard disks. After a reboot, the system comes back in normal rw mode.
What could be the reason? I would appreciate any help/hint.
Thank you, Mufit
Mufit Eribol wrote on Thu, 31 Jul 2008 10:08:48 +0300:
But the filesystem on this volume goes into read-only mode once in a while.
there will be log entries about this. Do a *forced* fsck.
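E.g. something like this, with the filesystem unmounted (device name taken from your fstab):

umount /mnt/raid
e2fsck -f /dev/raid_vg0/raid_lv0

-f forces the check even if the filesystem looks clean.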
Kai
Kai Schaetzl wrote:
Mufit Eribol wrote on Thu, 31 Jul 2008 10:08:48 +0300:
But the filesystem on this volume goes into read-only mode once in a while.
there will be log entries about this. Do a *forced* fsck.
Kai
Kai, thank you very much for the hint.
I am not sure which other log file has entries about this problem but here is the relevant section of dmesg:
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
device-mapper: multipath: version 1.0.5 loaded
EXT3 FS on md2, internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
Adding 6144852k swap on /dev/sda3.  Priority:1 extents:1 across:6144852k
Adding 6144852k swap on /dev/sdb3.  Priority:1 extents:1 across:6144852k
What is the best way to run fsck on a production system?
Thank you Mufit
Mufit Eribol wrote on Thu, 31 Jul 2008 13:34:00 +0300:
I am not sure which other log file has entries about this problem
messages will have the warning from when it goes in read-only mode, that's the important one!
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Marking fs in need of filesystem check.
As you see, it *did* find a problem ;-)
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
and didn't correct it
What is the best way to run fsck on a production system?
You have to unmount the filesystem in question. I think that's even recommended for a "do not repair" run.
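e2fsck has -n for exactly that kind of do-not-repair run, e.g. something like:

umount /mnt/raid
e2fsck -fn /dev/raid_vg0/raid_lv0

-n opens the filesystem read-only and answers "no" to all repair prompts.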
Kai
Kai Schaetzl wrote:
You have to unmount the filesystem in question. I think that's even recommended for a "do not repair" run.
Kai
"shutdown -rF now" didn't fix the problem either. There are problems with several inodes. Reboot fixes the problem for a couple of hours.
I am thinking about reformatting this volume, but /var is on that volume as well. Perhaps, I should move /var to somewhere else, then format that volume. I don't know if there is a better/easier way to fix the problem.
I would appreciate any help.
TIA, Mufit
Mufit Eribol wrote:
Kai Schaetzl wrote:
You have to unmount the filesystem in question. I think that's even recommended for a "do not repair" run.
Kai
"shutdown -rF now" didn't fix the problem either.
How should it? You have to *CHECK* the filesystem.
touch /forcefsck
and reboot. This will cause all filesystems to be checked with fsck after the reboot.
I would appreciate any help.
I think Kai asked you several times to check the filesystem which goes readonly.
Ralph
Ralph Angenendt wrote:
touch /forcefsck
and reboot. This will cause all filesystems to be checked with fsck after the reboot.
I did it several times. Unfortunately, it couldn't fix the problem. I still get the following errors and the system goes "read only" after a couple of minutes.
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
It seems formatting /mnt/raid is the way to go. However, I have to move /mnt/raid/var to /var first. / is on another hard disk and there is space available. Lots of programs use /var actively. How can I move /mnt/raid/var to /var?
TIA, Mufit
Mufit Eribol wrote:
Ralph Angenendt wrote:
touch /forcefsck
and reboot. This will cause all filesystems to be checked with fsck after the reboot.
I did it several times. Unfortunately, it couldn't fix the problem.
Does it say whether the fsck succeeds or fails?
I still get the following errors and the system goes "read only" after a couple of minutes.
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
EXT3-fs warning (device dm-0): ext3_clear_journal_err: Marking fs in need of filesystem check.
EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
It seems formatting /mnt/raid is the way to go. However, I have to move /mnt/raid/var to /var first. / is on another hard disk and there is space available. Lots of programs use /var actively. How can I move /mnt/raid/var to /var?
Boot the rescue disk. Mount the partitions someplace. Dump /old_var to /new_var. Of course, if the /old_var fs is somewhat trash, /new_var won't be much better.
I'd be wary of hardware problems with raid controller, cables, or disks. That "IO failure" in your logs isn't what you want to see during fs operations.
Toby Bluhm wrote: . . .
Boot the rescue disk. Mount the partitions someplace. Dump /old_var to /new_var.
Also verify that fstab or symlinks are not going to keep using old_var.
Also, make sure you have enough space for the new_var location.
Toby Bluhm wrote:
I did it several times. Unfortunately, it couldn't fix the problem.
Does it say whether the fsck succeeds or fails?
How can I get this info? All I have are the dmesg and messages logs after boot. Is there a log somewhere? If not, I think I have to watch the monitor on the server during boot.
Boot the rescue disk. Mount the partitions someplace. Dump /old_var to /new_var. Of course, if the /old_var fs is somewhat trash, /new_var won't be much better.
The problem is that a RAID card kernel module loads during boot. If I boot the rescue disk, /mnt/raid will not be mounted.
I'd be wary of hardware problems with raid controller, cables, or disks. That "IO failure" in your logs isn't what you want to see during fs operations.
You are absolutely right. I am just trying to do my best to recover whatever I can.
Thank you. Mufit
Mufit Eribol wrote on Fri, 01 Aug 2008 19:52:09 +0300:
The problem is that a RAID card kernel module loads during boot. If I boot the rescue disk, /mnt/raid will not be mounted.
Ah, right, I read your "device dm-0" as md0 and assumed a software RAID.
Kai
Mufit Eribol wrote on Fri, 01 Aug 2008 18:45:38 +0300:
I did it several times.
You rebooted several times; you did not force a check, I think. You have to boot with the rescue CD and then do a thorough fsck on the filesystem. Maybe on each one of the disks separately, I don't know. The boot-up check might not be sufficient. There is probably some bad block on the disk that needs to be flagged away. Until that happens, the system will keep trying to read/write there sooner or later.
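Roughly, from the rescue environment, something like this (the VG and device names are from your fstab; adapt as needed):

# if the HighPoint module isn't in the rescue initrd, load it first,
# e.g. insmod rr2310_00.ko from wherever you keep the driver
lvm vgscan
lvm vgchange -ay raid_vg0
e2fsck -fy /dev/raid_vg0/raid_lv0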
Kai
Kai Schaetzl wrote:
You rebooted several times; you did not force a check, I think. You have to boot with the rescue CD and then do a thorough fsck on the filesystem. Maybe on each one of the disks separately, I don't know. The boot-up check might not be sufficient. There is probably some bad block on the disk that needs to be flagged away. Until that happens, the system will keep trying to read/write there sooner or later.
Kai
I used both the "touch /forcefsck" and "shutdown -rF now" methods. Don't they force a check? Actually, now it seems to me they don't. There is no sign of fsck starting in messages.
Here is the complete log relevant to disk operations from /var/log/messages
Aug 1 18:29:45 server kernel: rr2310_00:[0 0] Start channel soft reset.
Aug 1 18:29:45 server kernel: rr2310_00:[0 1] Start channel soft reset.
Aug 1 18:29:45 server kernel: rr2310_00:[0 2] Start channel soft reset.
Aug 1 18:29:45 server kernel: rr2310_00:[0 3] Start channel soft reset.
Aug 1 18:29:45 server kernel: rr2310_00:channel [0,0] started successfully
Aug 1 18:29:45 server kernel: rr2310_00:channel [0,1] started successfully
Aug 1 18:29:45 server kernel: rr2310_00:channel [0,2] started successfully
Aug 1 18:29:45 server kernel: rr2310_00:channel [0,3] started successfully
Aug 1 18:29:45 server kernel: scsi8 : rr2310_00
Aug 1 18:29:45 server kernel: Vendor: HPT Model: DISK_8_0 Rev: 4.00
Aug 1 18:29:45 server kernel: Type: Direct-Access ANSI SCSI revision: 00
Aug 1 18:29:45 server kernel: SCSI device sdc: 1562378240 512-byte hdwr sectors (799938 MB)
Aug 1 18:29:45 server kernel: sdc: Write Protect is off
Aug 1 18:29:45 server kernel: SCSI device sdc: drive cache: write through
Aug 1 18:29:45 server kernel: SCSI device sdc: 1562378240 512-byte hdwr sectors (799938 MB)
Aug 1 18:29:45 server kernel: sdc: Write Protect is off
Aug 1 18:29:45 server kernel: SCSI device sdc: drive cache: write through
Aug 1 18:29:45 server kernel: sdc: unknown partition table
Aug 1 18:29:45 server kernel: sd 8:0:0:0: Attached scsi disk sdc
Aug 1 18:29:45 server kernel: sd 8:0:0:0: Attached scsi generic sg2 type 0
Aug 1 18:29:45 server kernel: floppy0: no floppy controllers found
Aug 1 18:29:45 server kernel: lp: driver loaded but no devices found
Aug 1 18:29:45 server kernel: ACPI: Power Button (FF) [PWRF]
Aug 1 18:29:45 server kernel: ACPI: Power Button (CM) [PWRB]
Aug 1 18:29:45 server kernel: ibm_acpi: ec object not found
Aug 1 18:29:45 server kernel: md: Autodetecting RAID arrays.
Aug 1 18:29:45 server kernel: md: autorun ...
Aug 1 18:29:45 server kernel: md: ... autorun DONE.
Aug 1 18:29:45 server kernel: device-mapper: multipath: version 1.0.5 loaded
Aug 1 18:29:45 server kernel: EXT3 FS on md2, internal journal
Aug 1 18:29:45 server kernel: kjournald starting.  Commit interval 5 seconds
Aug 1 18:29:45 server kernel: EXT3 FS on md1, internal journal
Aug 1 18:29:45 server kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 1 18:29:45 server kernel: kjournald starting.  Commit interval 5 seconds
Aug 1 18:29:45 server kernel: EXT3 FS on md0, internal journal
Aug 1 18:29:45 server kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 1 18:29:45 server kernel: kjournald starting.  Commit interval 5 seconds
Aug 1 18:29:45 server kernel: EXT3-fs warning (device dm-0): ext3_clear_journal_err: Filesystem error recorded from previous mount: IO failure
Aug 1 18:29:45 server kernel: EXT3-fs warning (device dm-0): ext3_clear_journal_err: Marking fs in need of filesystem check.
Aug 1 18:29:45 server kernel: EXT3-fs warning: mounting fs with errors, running e2fsck is recommended
Aug 1 18:29:45 server kernel: EXT3 FS on dm-0, internal journal
Aug 1 18:29:45 server kernel: EXT3-fs: recovery complete.
Aug 1 18:29:45 server kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 1 18:29:45 server kernel: Adding 6144852k swap on /dev/sda3. Priority:1 extents:1 across:6144852k
Aug 1 18:29:45 server kernel: Adding 6144852k swap on /dev/sdb3. Priority:1 extents:1 across:6144852k
sdc is the problematic device (LVM2 on RAID10). I think dm-0 points to the same device.
If I use a rescue disk, the RAID driver kernel module will not be loaded. Can I load it manually? It is getting complicated for me.
Thank you, Mufit
Your first message says you have the problems on the lv mounted at /mnt/raid.
/dev/raid_vg0/raid_lv0 /mnt/raid ext3 defaults 0 0
then later
I am thinking about reformatting this volume, but /var is on that volume as well.
If you mean that /var is a separate lv in your raid_vg0 volume group, then just umount /mnt/raid and run your fsck on /dev/raid_vg0/raid_lv0.
If you have services that live in or depend on /mnt/raid being mounted, stop all those services first. Or init 1 to single user console.
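A rough sequence, something like this (which services you stop depends on what you run):

init 1
umount /mnt/raid
e2fsck -f /dev/raid_vg0/raid_lv0
mount /mnt/raid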
On Thu, 2008-07-31 at 10:08 +0300, Mufit Eribol wrote:
Hi,
My CentOS 5.2 server (5.1 suffered the same problem as well) has a logical volume on a RAID 10 array (4 SATA hard disks on a Highpoint RR2310 controller). /etc/fstab has an entry for this array as below:

/dev/raid_vg0/raid_lv0 /mnt/raid ext3 defaults 0 0
Normally it works OK. But the filesystem on this volume goes into read-only mode once in a while. The RAID software reports no problem with the hard disks. After a reboot, the system comes back in normal rw mode.
If it happens again, you may be able to avoid the reboot with
mount -o remount,rw /mnt/raid
As to your "how to check production ...", easy. The trade-off (down time, reboot, ...) makes it easy to decide to knock users down, umount the FS, run the check, remount, tell users they can go again.
<snip>
Mufit
<snip sig stuff>
HTH
William L. Maltby wrote:
On Thu, 2008-07-31 at 10:08 +0300, Mufit Eribol wrote:
Hi,
My CentOS 5.2 server (5.1 suffered the same problem as well) has a logical volume on a RAID 10 array (4 SATA hard disks on a Highpoint RR2310 controller). /etc/fstab has an entry for this array as below:

/dev/raid_vg0/raid_lv0 /mnt/raid ext3 defaults 0 0
Normally it works OK. But the filesystem on this volume goes into read-only mode once in a while. The RAID software reports no problem with the hard disks. After a reboot, the system comes back in normal rw mode.
If it happens again, you may be able to avoid the reboot with
mount -o remount,rw /mnt/raid
As to your "how to check production ...", easy. The trade-off (down time, reboot, ...) makes it easy to decide to knock users down, umount the FS, run the check, remount, tell users they can go again.
William, thank you for the hint. Nevertheless, the command doesn't remount rw. It says:
mount: block device /dev/raid_vg0/raid_lv0 is write-protected, mounting read-only
I had to reboot again.
Mufit
I think I found the culprit, albeit I still don't know how to fix it.
1. During boot the screen prints the following errors:

no fstab.sys, mounting internal defaults ...
No devices found
Setting up Logical Volume Management: /var/lock: mkdir failed: No such file or directory
I have an LV on RAID mounted as /mnt/raid. Then /mnt/raid/var is symlinked to /var. I found on the internet that some Linux systems look for /var/lock or /var/run on the / partition only. Obviously LVM cannot create its files in /var/lock; perhaps /mnt/raid is not yet mounted when the /var/lock mkdir is attempted.
2. The second important finding is that /forcefsck forces a check only on the software RAID arrays, not the hardware one. It checks md0 (/tmp), md1 (/boot), and md2 (/), but skips /dev/raid_vg0/raid_lv0 (/mnt/raid) altogether. I don't know how to force a check on it during reboot.
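One guess: the last field of its fstab entry is 0, and that fsck-pass field is what the boot-time check honors, so presumably changing it to something like

/dev/raid_vg0/raid_lv0 /mnt/raid ext3 defaults 0 2

would make the boot-time fsck include it, though I haven't tried that yet.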
3. I changed to init level 1, then tried to umount /mnt/raid, but all I got was a "device is busy" message. "umount -l /mnt/raid" was able to unmount it. Then I tried to run "fsck /mnt/raid" and received:

fsck.ext2: Is a directory while trying to open /mnt/raid
The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>

I tried with one of the superblocks on /mnt/raid. This time I got:

fsck.ext3: Device or resource busy while trying to open /dev/raid_vg0/raid_lv0
Filesystem mounted or opened exclusively by another program?
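Maybe I should check what is still holding the device, e.g. with something like:

fuser -vm /dev/raid_vg0/raid_lv0

though I suspect the lazy umount is the culprit: "umount -l" only detaches the mount point and leaves the filesystem mounted until the last user lets go, which would explain the "busy" error.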
Sorry for the long post.
This is the point I arrived. I am stumped.
Thank your for all the support.
Mufit
Mufit Eribol wrote:
I think I found the culprit, albeit I still don't know how to fix it.
- During boot the screen prints the following errors
"no fstab.sys, mounting internal defaults ... No devices found Setting up Logical Volume Management: /var/lock: mkdir failed: No such file or directory"
I have an LV on RAID mounted as /mnt/raid. Then /mnt/raid/var is symlinked to /var.
I was afraid you were going to say that.
Go back to single user mode.
mkdir /new_var
cd /mnt/raid/var
tar cf - . | ( cd /new_var ; tar xvf - )
Make sure both dirs look the same.
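A quick sanity check could be something like:

diff -r /mnt/raid/var /new_var
du -s /mnt/raid/var /new_var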
Change the link to /new_var. Or remove the old link & mv /new_var /var.
reboot.
On Fri, 2008-08-01 at 16:13 -0400, Toby Bluhm wrote:
Mufit Eribol wrote:
<snip>
I see in your other post that you need to do some studying. *After* umount of /mnt/raid, there is *no* device on /mnt/raid anymore. In my original reply, I presumed (shame on me) that you would correctly try to fsck the *device*.
Fsck works on devices or partitions that have been formatted as an ext2 or ext3 (ext2 with journaling) filesystem.
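In other words, with the device names from your earlier posts:

# wrong - this is a mount point directory:
fsck /mnt/raid
# right - the block device, after a real (non-lazy) umount:
umount /mnt/raid
fsck.ext3 -f /dev/raid_vg0/raid_lv0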
I am also concerned about the error messages related to your /etc/fstab. There are several things that are normally in there, including locally defined file systems.
I suggest you find/make a "pristine" installation somewhere and compare the fstab to what you have now.
On Fri, Aug 1, 2008 at 1:43 PM, William L. Maltby CentOS4Bill@triad.rr.com wrote:
On Fri, 2008-08-01 at 16:13 -0400, Toby Bluhm wrote:
Mufit Eribol wrote:
<snip>
.....
..... that you would correctly try to
fsck the *device*.
First back up the data... It is possible to run fsck with a media test flag. Bad blocks are assigned to dummy files; inadvertently reading one of these files can take a drive offline.
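For instance, e2fsck can drive the badblocks scan itself; something like this on the unmounted device does a read-only surface test:

umount /mnt/raid
e2fsck -fc /dev/raid_vg0/raid_lv0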
One reason a device will go offline is the presence of a media error, or of a situation that smartd assumes to be a pending data risk..... The root-cause error should be understood. Smartd tends to be cautious but does identify pending problems.
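To see what the drives themselves report, smartctl is worth a try. Disks behind a HighPoint controller may need the hpt device type, if your smartmontools build supports it; the channel numbers below are just an example:

smartctl -a /dev/sdc
smartctl -a -d hpt,1/1 /dev/sdc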
One puzzle can be the loss of log file data. It is sometimes possible to see events on a live system that later vanish after a reboot because buffers are live in memory but not on the disk. Sending logs to another 'log system' can be helpful and is a good idea on production systems for exactly this reason.
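On CentOS 5 (sysklogd) that's a couple of lines: on this server, add to /etc/syslog.conf something like

*.info  @loghost.example.com

(the host name is a placeholder), and run syslogd with -r on the receiving machine (SYSLOGD_OPTIONS in /etc/sysconfig/syslog) so it accepts remote messages.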
NiftyClusters Mitch wrote:
On Fri, Aug 1, 2008 at 1:43 PM, William L. Maltby CentOS4Bill@triad.rr.com wrote:
On Fri, 2008-08-01 at 16:13 -0400, Toby Bluhm wrote:
Mufit Eribol wrote:
<snip>
.....
..... that you would correctly try to
fsck the *device*.
First back up the data... It is possible to run fsck with a media test flag. Bad blocks are assigned to dummy files; inadvertently reading one of these files can take a drive offline.
One reason a device will go offline is the presence of a media error, or of a situation that smartd assumes to be a pending data risk..... The root-cause error should be understood. Smartd tends to be cautious but does identify pending problems.
One puzzle can be the loss of log file data. It is sometimes possible to see events on a live system that later vanish after a reboot because buffers are live in memory but not on the disk. Sending logs to another 'log system' can be helpful and is a good idea on production systems for exactly this reason
I copied /mnt/raid/var to /new_var using tar as explained in Toby's message and changed the /var link to /new_var. After reboot, it was possible to umount /mnt/raid and fsck it. All the errors were corrected. Everything works perfectly now.
I appreciate all who shared his experience, knowledge and advised me on this thread.
Thank you. Mufit
Toby Bluhm wrote:
Mufit Eribol wrote:
I have an LV on RAID mounted as /mnt/raid. Then /mnt/raid/var is symlinked to /var.
I was afraid you were going to say that.
Go back to single user mode.
mkdir /new_var
cd /mnt/raid/var
tar cf - . | ( cd /new_var ; tar xvf - )
Make sure both dirs look the same.
Change the link to /new_var. Or remove the old link & mv /new_var /var.
reboot.
Toby, Thank you for this nice tip. It worked perfectly. The server is back in the game again.
Just for my learning experience, I would appreciate it if you could clarify one point though. Why were you afraid when you heard /mnt/raid/var was symlinked to /var? Is something wrong with it?
Here is my fstab:

/dev/md2                /          ext3    defaults         1 1    <--- Software RAID1
/dev/md1                /boot      ext3    defaults         1 2    <--- Software RAID1
/dev/md0                /tmp       ext3    defaults         1 2    <--- Software RAID1
tmpfs                   /dev/shm   tmpfs   defaults         0 0
devpts                  /dev/pts   devpts  gid=5,mode=620   0 0
sysfs                   /sys       sysfs   defaults         0 0
proc                    /proc      proc    defaults         0 0
LABEL=SWAP-sda3         swap       swap    defaults,pri=1   0 0
LABEL=SWAP-sdb3         swap       swap    defaults,pri=1   0 0
/dev/raid_vg0/raid_lv0  /mnt/raid  ext3    defaults         0 0    <--- Hardware RAID10
Before, home and var were under the /mnt/raid directory and symlinked to /home and /var. Now both directories have been copied to / (md2, software RAID1) as new_home and new_var, and the /home and /var symlinks point to these new directories. /mnt/raid (hardware RAID10), which is the main storage of my server, is not being used at the moment.
I am planning to have 2 logical volumes (for home and var separately) instead of 1. They would be mounted as /home and /var from /dev/raid_vg0/raid_lv0 and /dev/raid_vg0/raid_lv1, respectively. Is it a good approach? Please advise.
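Something along these lines is what I have in mind (the sizes are made up):

lvcreate -L 300G -n raid_lv0 raid_vg0    # for /home
lvcreate -l 100%FREE -n raid_lv1 raid_vg0    # for /var
mkfs.ext3 /dev/raid_vg0/raid_lv0
mkfs.ext3 /dev/raid_vg0/raid_lv1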
Thank you again. Mufit
Mufit Eribol wrote:
Toby Bluhm wrote:
Mufit Eribol wrote:
I have an LV on RAID mounted as /mnt/raid. Then /mnt/raid/var is symlinked to /var.
I was afraid you were going to say that.
Go back to single user mode.
mkdir /new_var
cd /mnt/raid/var
tar cf - . | ( cd /new_var ; tar xvf - )
Make sure both dirs look the same.
Change the link to /new_var. Or remove the old link & mv /new_var /var.
reboot.
Toby, Thank you for this nice tip. It worked perfectly. The server is back in the game again.
Just for my learning experience, I would appreciate it if you could clarify one point though. Why were you afraid when you heard /mnt/raid/var was symlinked to /var?
Because it can complicate a recovery, as you just experienced.
Why did you feel a need to have /var set up as you did? Did you expect it to fill up quickly, or did you need speed? You also have /tmp separate - do you expect more than usual activity there?
Perhaps a better question would be - What is the purpose of this machine? If it's just a fileserver on a home LAN, you don't *need* to make it complicated, although learning is fun :-).
Running a very active internet facing box with email, mysql, apache, etc. would probably call for a more complicated setup - which would actually make recovery & security easier/better.
Here is my fstab:

/dev/md2                /          ext3    defaults         1 1    <--- Software RAID1
/dev/md1                /boot      ext3    defaults         1 2    <--- Software RAID1
/dev/md0                /tmp       ext3    defaults         1 2    <--- Software RAID1
tmpfs                   /dev/shm   tmpfs   defaults         0 0
devpts                  /dev/pts   devpts  gid=5,mode=620   0 0
sysfs                   /sys       sysfs   defaults         0 0
proc                    /proc      proc    defaults         0 0
LABEL=SWAP-sda3         swap       swap    defaults,pri=1   0 0
LABEL=SWAP-sdb3         swap       swap    defaults,pri=1   0 0
/dev/raid_vg0/raid_lv0  /mnt/raid  ext3    defaults         0 0    <--- Hardware RAID10
Before, home and var were under the /mnt/raid directory and symlinked to /home and /var. Now both directories have been copied to / (md2, software RAID1) as new_home and new_var, and the /home and /var symlinks point to these new directories. /mnt/raid (hardware RAID10), which is the main storage of my server, is not being used at the moment.
Instead of using links, may as well just mount it where it belongs.
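For example, with the two LVs you describe below, fstab entries along these lines instead of symlinks:

/dev/raid_vg0/raid_lv0  /home  ext3  defaults  1 2
/dev/raid_vg0/raid_lv1  /var   ext3  defaults  1 2

The non-zero last field also means the boot-time fsck will actually check them, unlike your old "0 0" entry.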
I am planning to have 2 logical volumes (for home and var separately) instead of 1. They would be mounted as /home and /var from /dev/raid_vg0/raid_lv0 and /dev/raid_vg0/raid_lv1, respectively. Is it a good approach? Please advise.
I'm somewhat simple-minded - I like to keep the system that way :-). I split the partitions into 3
/ swap /home
either on a single disk or mirrored ( swap mirrored too ) - no lvm. For data storage I use lvm on raid on a separate mount point. Not saying you should do the same - it's just what I do.
Toby Bluhm wrote:
Mufit Eribol wrote:
Toby Bluhm wrote:
Mufit Eribol wrote:
I have an LV on RAID mounted as /mnt/raid. Then /mnt/raid/var is symlinked to /var.
I was afraid you were going to say that.
Go back to single user mode.
mkdir /new_var
cd /mnt/raid/var
tar cf - . | ( cd /new_var ; tar xvf - )
Make sure both dirs look the same.
Change the link to /new_var. Or remove the old link & mv /new_var /var.
reboot.
Toby, Thank you for this nice tip. It worked perfectly. The server is back in the game again.
Just for my learning experience, I would appreciate it if you could clarify one point though. Why were you afraid when you heard /mnt/raid/var was symlinked to /var?
Because it can complicate a recovery, as you just experienced.
Why did you feel a need to have /var set up as you did? Did you expect it to fill up quickly, or did you need speed? You also have /tmp separate - do you expect more than usual activity there?
Perhaps a better question would be - What is the purpose of this machine? If it's just a fileserver on a home LAN, you don't *need* to make it complicated, although learning is fun :-).
Running a very active internet facing box with email, mysql, apache, etc. would probably call for a more complicated setup - which would actually make recovery & security easier/better.
This box is loaded with cyrus-imapd, postfix, amavisd, clamd, spamassassin, mysql, postgresql, apache, CRM, DMS, named, hylafax, etc. for a small company. I wanted to keep the operating system on 2 SATA disks (RAID1) and the data (var and home) on a high-capacity RAID10 (4 SATA disks). It also works as a file server. I just wanted more capacity for the home and var directories, hence they are on a separate RAID controller. It is more difficult if the OS is also on the RAID controller, as the driver has to be loaded before the OS is up and running. When I install a new kernel, I can easily compile the RAID driver with my setup. So having the OS on soft RAID and the data files (home and var) on the RAID controller seemed the better idea when I set up the system.
Here is my fstab:

/dev/md2                /          ext3    defaults         1 1    <--- Software RAID1
/dev/md1                /boot      ext3    defaults         1 2    <--- Software RAID1
/dev/md0                /tmp       ext3    defaults         1 2    <--- Software RAID1
tmpfs                   /dev/shm   tmpfs   defaults         0 0
devpts                  /dev/pts   devpts  gid=5,mode=620   0 0
sysfs                   /sys       sysfs   defaults         0 0
proc                    /proc      proc    defaults         0 0
LABEL=SWAP-sda3         swap       swap    defaults,pri=1   0 0
LABEL=SWAP-sdb3         swap       swap    defaults,pri=1   0 0
/dev/raid_vg0/raid_lv0  /mnt/raid  ext3    defaults         0 0    <--- Hardware RAID10
Before, home and var were under the /mnt/raid directory and symlinked to /home and /var. Now both directories have been copied to / (md2, software RAID1) as new_home and new_var, and the /home and /var symlinks point to these new directories. /mnt/raid (hardware RAID10), which is the main storage of my server, is not being used at the moment.
Instead of using links, may as well just mount it where it belongs.
I will follow your advice. I will mount /var and /home from the RAID controller separately (2 separate LVs). But some distros, Ubuntu being one, want /var/run and /var/lock on the same partition as /. I don't know if CentOS 5.2 has such a requirement. If it does, I will mkdir /var/run and /var/lock on the / partition by unmounting /var first.
I am planning to have 2 logical volumes (for home and var separately) instead of 1. They would be mounted as /home and /var from /dev/raid_vg0/raid_lv0 and /dev/raid_vg0/raid_lv1, respectively. Is it a good approach? Please advise.
I'm somewhat simple-minded - I like to keep the system that way :-). I split the partitions into 3
/ swap /home
either on a single disk or mirrored ( swap mirrored too ) - no lvm. For data storage I use lvm on raid on a separate mount point. Not saying you should do the same - it's just what I do.
Yes, it is simple. Perhaps I am a victim of all those articles on the internet advocating more partitions :-)
Thank you. Mufit