-----Original Message----- From: Les Mikesell [mailto:lesmikesell@gmail.com] Sent: Tuesday, October 29, 2013 5:25 PM To: CentOS mailing list Subject: Re: [CentOS] How should I reinstall CentOS?
On Tue, Oct 29, 2013 at 3:47 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
<SNIP>
I'm not willing to put in another week of effort out of a probably vain hope of discovery.
You might try running 'rpm -Va' to see if there are any surprises in the list of differences between the current state and what was installed.
-- Les Mikesell lesmikesell@gmail.com
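When reading the output of 'rpm -Va', a small filter can help pick out permission and ownership surprises. This is only a sketch: the function name perm_surprises is made up for illustration, and it assumes the usual rpm verify format of an 8 or 9 character flag string (S, M, 5, D, L, U, G, T, P) followed by the path.

```shell
# Read `rpm -Va` output on stdin and keep only the lines where the
# mode (M), user (U) or group (G) flag is set, plus "missing" files.
# perm_surprises is a hypothetical helper name, not an rpm feature.
perm_surprises() {
    awk '
        # Files that rpm could not find at all.
        $1 == "missing" { print; next }
        # A flag string is 8 or 9 characters drawn from ".?SM5DLUGTP";
        # this skips other messages such as dependency warnings.
        length($1) >= 8 && length($1) <= 9 &&
        $1 ~ /^[.?SM5DLUGTP]+$/ && $1 ~ /[MUG]/ { print }
    '
}
```

Typical use would be 'rpm -Va | perm_surprises'. Lines that differ only in size, checksum or mtime (common for edited config files) are dropped, which makes a changed mode or owner easier to spot.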
Note on the rpm -Va... I had an issue recently where a package (openoffice) would not work correctly and:
    yum reinstall openoffice*   #did not help
    rpm -Va                     #did not find ANY issues (which surprised me when I did figure out what was wrong)
however doing a (I did a more restricted, to openoffice files, version of)
    find /usr/ -not -perm -o+r -exec ls -lhd {} +
    find /usr/ -type d -not -perm -g+x -exec ls -lhd {} +
    find /usr/ -type d -not -perm -o+x -exec ls -lhd {} +
found files that were not even set to write for ROOT (and in general had NO permission for anyone else)! The idea is basically that almost all files in /usr/ should be READABLE by any user and almost all directories should be READABLE & EXECUTABLE by all so that they can list and read the files in them.
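The permission check described above can be wrapped so that it runs against any tree, for example a single package directory instead of all of /usr/. A minimal sketch, assuming a find that accepts symbolic modes with -perm (as the commands above do); find_unreadable is a made-up name:

```shell
# List entries under "$1" (default /usr) that other users cannot read,
# followed by directories that other users cannot enter.
# find_unreadable is a hypothetical helper name.
find_unreadable() {
    # Files or directories lacking the "other" read bit.
    find "${1:-/usr}" ! -perm -o+r -print
    # Directories lacking the "other" search (execute) bit.
    find "${1:-/usr}" -type d ! -perm -o+x -print
}
```

A directory with mode 700 is printed twice, once by each pass. An empty result does not prove the tree is healthy; it only rules out this one class of problem.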
I don't know if the commands at http://www.cyberciti.biz/tips/reset-rhel-centos-fedora-package-file-permission.html would have fixed the issue, because I brute forced the perms to root writeable and then reinstalled the packages again.
BTW, I still feel a little confused on what the OP's original problem was and why they are headed in the direction of a 'reinstall the system'. Seems a bit overkill for most problems.
Even when this disclaimer is not here: I am not a contracting officer. I do not have authority to make or modify the terms of any contract.
On Wed, 30 Oct 2013, Denniston, Todd A CIV NAVSURFWARCENDIV Crane wrote:
BTW, I still feel a little confused on what the OP's original problem was and why they are headed in the direction of a 'reinstall the system'. Seems a bit overkill for most problems.
gdm hangs. All attempts at diagnosis or repair have failed. I've done a yum reinstall *
Most recently I did an explicit uninstall of gdm and its dependents. After installing them again I issued the following command:
[root@localhost] hennebry# telinit 5
[root@localhost] hennebry# Calling the system activity data collector (sadc):
Starting portreserve: [OK]
Enabling p4-clockmod driver (passive cooling only): [OK]
Starting irqbalance: [OK]
Retrigger failed udev events: [OK]
Enabling Bluetooth devices
user had insufficient privilege
After I got back from another virtual terminal, the subsequent lines had appeared. I do not have any bluetooth devices.
I really hate having to reinstall the system. It's giving up, but I'm beaten. What is worse, I do not even know whether the reinstall will work.
On Wed, Oct 30, 2013 at 12:28 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
gdm hangs.
[...]
user had insufficient privilege
That likely means that the pid file for the process you are about to start exists in /var/run/ but it is unreadable. You should be running as root at that point, so that's odd, but maybe you have file system corruption or some other cruft there. I don't think that should cause a hang, though. If you switch to a virtual console can you tell what process is hung and see what strace says it is waiting for?
On Wed, 30 Oct 2013, Les Mikesell wrote:
That likely means that the pid file for the process you are about to start exists in /var/run/ but it is unreadable. You should be running as root at that point, so that's odd, but maybe you have file system corruption or some other cruft there. I don't think that should cause a hang, though. If you switch to a virtual console can you tell what process is hung and see what strace says it is waiting for?
I know what strace does, but where should I use it?
Michael Hennebry wrote:
I know what strace does, but where should I use it?
strace -p <PID of gdm>
mark
On Wed, 30 Oct 2013, m.roth@5-cent.us wrote:
Michael Hennebry wrote:
I know what strace does, but where should I use it?
strace -p <PID of gdm>
I've made three posts since then. Two of them mentioned using strace on gdm. Are you not getting my posts?
On Wed, Oct 30, 2013 at 2:40 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
I know what strace does, but where should I use it?
Either ssh in from somewhere else or log in on a virtual terminal (e.g. alt+F2) so you still have access if the main console hangs when you 'telinit 5'. Use ps in the other session to see if you can find the hung process and then 'strace -p pid' will show if it is waiting for some system call to complete.
Does 'startx' work at the console from runlevel 3?
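The ps-then-strace step described above can be sketched as follows. This assumes a Linux /proc filesystem; proc_state is a made-up helper name, and the PID in the usage comments is only a placeholder.

```shell
# Print the kernel's view of a process state for PID "$1",
# e.g. "S (sleeping)" or "D (disk sleep)". Linux /proc only.
# proc_state is a hypothetical helper name.
proc_state() {
    sed -n 's/^State:[[:space:]]*//p' "/proc/$1/status"
}

# Usage, from a second virtual terminal or an ssh session:
#   ps -ef | grep gdm      # find the PID of the suspect process
#   proc_state <pid>       # D usually means blocked inside the kernel
#   strace -p <pid>        # as root: shows the system call it is sitting in
```

A process that sits in state S inside poll() or select() is often just idle and waiting for events, so the strace output needs to be read together with what the process is supposed to be doing at that moment.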
On Wed, 30 Oct 2013, Les Mikesell wrote:
Either ssh in from somewhere else or log in on a virtual terminal (e.g. alt+F2) so you still have access if the main console hangs when you 'telinit 5'. Use ps in the other session to see if you can find the hung process and then 'strace -p pid' will show if it is waiting for some system call to complete.
root      2616     1  0 14:46 ?        00:00:00 /usr/sbin/gdm-binary -nodaemon
root      2636  2616  0 14:46 ?        00:00:00 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
root      2638  2636  0 14:46 tty7     00:00:00 /usr/bin/Xorg :0 -br -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-q5Pjv4/database -nolisten tcp
gdm       2654  2636  0 14:46 ?        00:00:00 [dbus-launch] <defunct>
gdm       2657     1  0 14:46 ?        00:00:00 /usr/bin/dbus-launch --exit-with-session
gdm       2658     1  0 14:46 ?        00:00:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
root      2728  2666  0 14:52 tty2     00:00:00 strace -o /tmp/gdm.strace -p2616
root      2999  2736  0 15:05 tty3     00:00:00 grep -e org -e gdm
2616 was in gdm.pid. --nodaemon? Here is the result of strace on it:
restart_syscall(<... resuming interrupted call ...>) = 1
read(3, "l\4\1\1\36\0\0\0\17\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 380
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "l\4\1\1\36\0\0\0\21\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 190
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
...
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "l\4\1\1\36\0\0\0\31\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 190
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "l\4\1\1\35\0\0\0\32\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 189
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, 0) = 0 (Timeout)
poll([{fd=4, events=POLLIN}, {fd=3, events=POLLIN}], 2, -1
BTW after issuing messages, telinit lets me use the console.
Does 'startx' work at the console from runlevel 3?
I'll try it.
On Wed, Oct 30, 2013 at 3:22 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
2616 was in gdm.pid. --nodaemon? Here is the result of strace on it:
restart_syscall(<... resuming interrupted call ...>) = 1
read(3, "l\4\1\1\36\0\0\0\17\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 380
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
That seems odd. If you do: ls -l /proc/2616/fd/3 you should see the file it is trying to read (maybe loading a shared library, but the read should not be short like that).
Does 'startx' work at the console from runlevel 3?
I'll try it.
It might make the machine usable to do that instead of a gdm login.
On Wed, 30 Oct 2013, Les Mikesell wrote:
On Wed, Oct 30, 2013 at 3:22 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
2616 was in gdm.pid. --nodaemon? Here is the result of strace on it:
restart_syscall(<... resuming interrupted call ...>) = 1
read(3, "l\4\1\1\36\0\0\0\17\0\0\0\211\0\0\0\1\1o\0\25\0\0\0/org/fre"..., 2048) = 380
read(3, 0x87d3eb8, 2048) = -1 EAGAIN (Resource temporarily unavailable)
That seems odd. If you do: ls -l /proc/2616/fd/3 you should see the file it is trying to read (maybe loading a shared library, but the read should not be short like that).
This time it was pid 2859 and file descriptor 4, which pointed to pipe[20775].
From lsof:
gdm-binar 2859 root 4r FIFO 0,8 0t0 20775 pipe
gdm-binar 2859 root 5w FIFO 0,8 0t0 20775 pipe
Both ends of the pipe appear to be in the same process.
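Where lsof is not installed, the same descriptor-to-target mapping can be read straight from /proc. A minimal sketch, assuming Linux; list_fds is a made-up helper name:

```shell
# Print "fd -> target" for every open descriptor of PID "$1".
# Pipes show up as pipe:[inode], sockets as socket:[inode].
# list_fds is a hypothetical helper name.
list_fds() {
    for fd in /proc/"$1"/fd/*; do
        # Skip the unexpanded glob if the process has gone away.
        [ -L "$fd" ] || continue
        printf '%s -> %s\n' "${fd##*/}" "$(readlink "$fd")"
    done
}
```

Two descriptors with the same pipe inode in one process are not necessarily a fault: holding both ends of a pipe is a common way for an event loop to wake itself up, so this observation alone may not explain the hang.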
While gdm was hanging, I did a startx -- 4 from virtual terminal 4. It seemed to work, but crapped out while I was composing an e-mail.
From Xorg.4.log:
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 1317.695] Initializing built-in extension MIT-SCREEN-SAVER
[ 1318.593] (EE) PreInit returned 8 for "HDA ATI HDMI HDMI/DP,pcm=3"
[ 1318.593] (EE) config/hal: NewInputDeviceRequest failed (8)
Don't know what that means. 'Tis something I've looked for before without learning anything.
I reinstalled CentOS. Just used the default repositories. It just dies. The screen suddenly goes dark and I cannot do anything. Other virtual terminals are unreachable. The green light on my monitor turns yellow, indicating no signal. The only evidence of life is that I can usually reset it with the reset button. The power button is usually unnecessary.
I cannot try to remote in from another computer because I do not have another computer.
After a couple of the crashes, I've seen pages of orphan nodes in the / partition. Eventually I did yet another install using another / partition. Fewer orphan nodes.
I've looked in /var/log/{dmesg,Xorg.0.log,messages}, but did not see anything that looked like a killer. That might be because I do not know what I am looking for.
Any suggestions on where else I should look or what I should be looking for?
On Mon, Nov 4, 2013 at 9:55 PM, Michael Hennebry hennebry@web.cs.ndsu.nodak.edu wrote:
I reinstalled CentOS. Just used the default repositories. It just dies. The screen suddenly goes dark and I cannot do anything. Other virtual terminals are unreachable. The green light on my monitor turns yellow, indicating no signal. The only evidence of life is that I can usually reset it with the reset button. The power button is usually unnecessary.
I cannot try to remote in from another computer because I do not have another computer.
After a couple of the crashes, I've seen pages of orphan nodes in the / partition. Eventually I did yet another install using another / partition. Fewer orphan nodes.
I've looked in /var/log/{dmesg,Xorg.0.log,messages}, but did not see anything that looked like a killer. That might be because I do not know what I am looking for.
Any suggestions on where else I should look or what I should be looking for?
I take it you eliminated hardware issues already, right? The orphan node thing kinda scares me. Second, how about if you configure grub not to boot in graphical mode (http://www.linuxforums.org/forum/red-hat-fedora-linux/35143-boot-fedora-grub...)? You know, no X or anything like that; you can always start it manually later. In other words, let's get the machine running first.
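One way to boot without X, sketched under the assumption that this is a SysV-style system where the default runlevel is set by an "id:N:initdefault:" line in /etc/inittab (the thread does not state the release, so treat that as a guess). set_default_runlevel is a made-up helper name.

```shell
# Rewrite the default runlevel in an inittab-style file.
#   $1 = runlevel (3 = multi-user text mode, 5 = graphical login)
#   $2 = file to edit (defaults to /etc/inittab)
# A copy of the original is kept next to it with a .bak suffix.
# set_default_runlevel is a hypothetical helper name.
set_default_runlevel() {
    sed -i.bak "s/^id:[0-9]:initdefault:/id:$1:initdefault:/" "${2:-/etc/inittab}"
}

# Usage, as root:
#   set_default_runlevel 3     # next boot stops at a text login
#   set_default_runlevel 5     # restore the graphical login later
```

For a one-off test that leaves the file alone, appending the runlevel number (for example 3) to the kernel line from the grub menu at boot should have the same effect for that boot only.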
--
Michael hennebry@web.cs.ndsu.NoDak.edu
"On Monday, I'm gonna have to tell my kindergarten class, whom I teach not to run with scissors, that my fiance ran me through with a broadsword." -- Lily
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
On Wed, 30 Oct 2013, Les Mikesell wrote:
I know what strace does, but where should I use it?
Either ssh in from somewhere else or log in on a virtual terminal (e.g. alt+F2) so you still have access if the main console hangs when you 'telinit 5'. Use ps in the other session to see if you can find the hung process and then 'strace -p pid' will show if it is waiting for some system call to complete.
Does 'startx' work at the console from runlevel 3?
Yes. As root it complains, but does it. If I su to myself, 'twon't run. If I login as myself, it seems to run correctly.
Got to go right now.
Les Mikesell wrote:
Either ssh in from somewhere else or log in on a virtual terminal (e.g. alt+F2) so you still have access if the main console hangs when you 'telinit 5'. Use ps in the other session to see if you can find the hung process and then 'strace -p pid' will show if it is waiting for some system call to complete.
Does 'startx' work at the console from runlevel 3?
Interesting question: can you run xinit with twm? How about kde?
mark