Hi all,
Is anyone using oprofile?
I'm getting segfaults from opreport at the moment, and I'm not sure if it is opreport, or just me.
In case it is something just plain daft I am doing, here is how it goes:
    opcontrol --reset
    opcontrol --setup --no-vmlinux
    opcontrol --start
... now I run my program, /tmp/myprog ...
    opcontrol --dump
    opcontrol --shutdown

Then I run

    opreport -l /tmp/myprog

and get:

    warning: [vdso] (tgid:4780 range:0x8d7000-0x8d8000) could not be found.
    warning: [vdso] (tgid:4784 range:0x860000-0x861000) could not be found.
    CPU: Core Solo / Duo, speed 1067 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    Segmentation fault
The problem seems to come when I ask for symbols (-l). I can't seem to get much information out of other running programs either. I'm running the opcontrol commands as root, and I've tried opreport as root or myself.
I'm sure I have done this successfully in the past (but this is the first time I have tried it in a while). I have what should be a fully patched CentOS 5.5 here. I see this: https://bugzilla.redhat.com/show_bug.cgi?id=529028 , but it looks too old to be relevant (I notice I already have the updated binutils that the final link in the bug report points to). I've tried this on two different machines, but they are similar configurations, I guess, in that they are both 32-bit Intel machines.
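For anyone wanting to repeat the session described above, it can be wrapped in a small script. This is only a minimal sketch: the helper names run and profile_session and the DRY_RUN switch are made up for illustration, and the opcontrol/opreport invocations are just the ones listed in this message. By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: run() and profile_session() are hypothetical helper names.
# DRY_RUN defaults to 1, so commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# profile_session PROGRAM [ARGS...]: profile one run of PROGRAM with oprofile.
profile_session() {
    run opcontrol --reset                # discard samples from earlier sessions
    run opcontrol --setup --no-vmlinux   # no vmlinux image, so no kernel symbols
    run opcontrol --start                # start sampling
    run "$@"                             # the workload being profiled
    run opcontrol --dump                 # flush collected samples
    run opcontrol --shutdown             # stop sampling and the daemon
    run opreport -l "$1"                 # per-symbol report for the program
}
```

Calling "profile_session /tmp/myprog" prints the seven commands in order; with DRY_RUN=0 it runs them for real.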
Hywel.
On Thursday 12 August 2010, Hywel Richards wrote:
> Hi all,
> Is anyone using oprofile?
> I'm getting segfaults from opreport at the moment, and I'm not sure if it is opreport, or just me.
I've tried the steps you outline below and it works for me (updated C5.5 as of 10m ago). My only guess is that your binary is b0rked. What happens if you do "opreport -l" instead of "opreport -l /tmp/myprog"? Was myprog compiled with "-g"?
And just to be sure, could you provide "uname -a" and "rpm -q oprofile".
/Peter
> In case it is something just plain daft I am doing, here is how it goes:
>     opcontrol --reset
>     opcontrol --setup --no-vmlinux
>     opcontrol --start
> ... now I run my program, /tmp/myprog ...
>     opcontrol --dump
>     opcontrol --shutdown
> then I run, opreport -l /tmp/myprog and get:
>     warning: [vdso] (tgid:4780 range:0x8d7000-0x8d8000) could not be found.
>     warning: [vdso] (tgid:4784 range:0x860000-0x861000) could not be found.
>     CPU: Core Solo / Duo, speed 1067 MHz (estimated)
>     Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
>     Segmentation fault
...
Peter Kjellstrom wrote:
> On Thursday 12 August 2010, Hywel Richards wrote:
>> Hi all,
>> Is anyone using oprofile?
>> I'm getting segfaults from opreport at the moment, and I'm not sure if it is opreport, or just me.
> I've tried the steps you outline below and it works for me (updated C5.5 as of 10m ago). My only guess is that your binary is b0rked. What happens if you do "opreport -l" instead of "opreport -l /tmp/myprog"? Was myprog compiled with "-g"?
> And just to be sure, could you provide "uname -a" and "rpm -q oprofile".
> /Peter
"opreport -l" gives:
    warning: /no-vmlinux could not be found.
    warning: [vdso] (tgid:11369 range:0xdc9000-0xdca000) could not be found.
    warning: [vdso] (tgid:2453 range:0x154000-0x155000) could not be found.
    warning: [vdso] (tgid:24792 range:0x8e6000-0x8e7000) could not be found.
    warning: [vdso] (tgid:24797 range:0xd4a000-0xd4b000) could not be found.
    warning: [vdso] (tgid:3211 range:0x660000-0x661000) could not be found.
    warning: [vdso] (tgid:3426 range:0xed7000-0xed8000) could not be found.
    warning: [vdso] (tgid:3429 range:0x798000-0x799000) could not be found.
    warning: [vdso] (tgid:3549 range:0x3a8000-0x3a9000) could not be found.
    warning: [vdso] (tgid:3551 range:0x86b000-0x86c000) could not be found.
    warning: [vdso] (tgid:3575 range:0x90a000-0x90b000) could not be found.
    warning: [vdso] (tgid:3595 range:0xd9d000-0xd9e000) could not be found.
    warning: [vdso] (tgid:3641 range:0xfed000-0xfee000) could not be found.
    warning: [vdso] (tgid:3672 range:0xd1e000-0xd1f000) could not be found.
    warning: [vdso] (tgid:3673 range:0x133000-0x134000) could not be found.
    warning: [vdso] (tgid:3684 range:0x5c2000-0x5c3000) could not be found.
    warning: [vdso] (tgid:3693 range:0x189000-0x18a000) could not be found.
    warning: [vdso] (tgid:3698 range:0x903000-0x904000) could not be found.
    warning: [vdso] (tgid:3702 range:0x139000-0x13a000) could not be found.
    warning: [vdso] (tgid:3709 range:0xeb1000-0xeb2000) could not be found.
    warning: [vdso] (tgid:3756 range:0xf03000-0xf04000) could not be found.
    warning: [vdso] (tgid:3788 range:0xb7f49000-0xb7f4a000) could not be found.
    warning: [vdso] (tgid:3800 range:0x7e4000-0x7e5000) could not be found.
    warning: [vdso] (tgid:3836 range:0x662000-0x663000) could not be found.
    warning: [vdso] (tgid:3899 range:0xd2e000-0xd2f000) could not be found.
    warning: [vdso] (tgid:3902 range:0x33c000-0x33d000) could not be found.
    warning: [vdso] (tgid:5178 range:0xb7f50000-0xb7f51000) could not be found.
    warning: [vdso] (tgid:6643 range:0xfaa000-0xfab000) could not be found.
    CPU: Core Solo / Duo, speed 800 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    bfd_get_section_contents:get_debug:: Bad value
myprog was not compiled with -g, but that shouldn't be a requirement, right? Anyway, I recompiled my program with -g and I still get a segfault from "opreport -l". If I skip the "-l", it at least doesn't segfault, but then the output isn't very informative:

    opreport /tmp/myprog2
    CPU: Core Solo / Duo, speed 1867 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    CPU_CLK_UNHALT...|
      samples|      %|
    ------------------
        25950 100.000 myprog2
Here's what I get when I do it on, say, gcc (no segfault this time, but perhaps it gives some indication of what is going wrong?):
    opreport /usr/bin/g++ -l
    warning: [vdso] (tgid:25547 range:0xc49000-0xc4a000) could not be found.
    warning: [vdso] (tgid:25550 range:0x749000-0x74a000) could not be found.
    warning: [vdso] (tgid:25565 range:0xefc000-0xefd000) could not be found.
    CPU: Core Solo / Duo, speed 1867 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    bfd_get_section_contents:get_debug:: Bad value
"uname -a" gives:
Linux myhost 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:09:06 EDT 2010 i686 i686 i386 GNU/Linux
"rpm -q oprofile" gives:
oprofile-0.9.4-15.el5
I also did an "rpm -V oprofile" - no problems.
For the moment I'm working around this by compiling my program into a shared library, and using LD_PROFILE and sprof, which is working well.
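In case it helps anyone else, the LD_PROFILE/sprof workaround looks roughly like the sketch below. The names are assumptions for illustration only: libmyprog.so, the output directory, and the helpers run and profile_shared_lib are not from my actual setup. It also assumes the dynamic linker and sprof can find the library (for example via LD_LIBRARY_PATH). By default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: run() and profile_shared_lib() are hypothetical helper names.
# DRY_RUN defaults to 1, so commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# profile_shared_lib PROGRAM SONAME OUTDIR
# Profile the shared library SONAME while PROGRAM runs, then print a flat profile.
profile_shared_lib() {
    prog=$1
    soname=$2
    outdir=$3
    run mkdir -p "$outdir"
    # The dynamic linker writes $outdir/$soname.profile while PROGRAM runs.
    run env LD_PROFILE="$soname" LD_PROFILE_OUTPUT="$outdir" "$prog"
    # -p asks sprof for a flat profile of the functions in the library.
    run sprof -p "$soname" "$outdir/$soname.profile"
}
```

For example, "profile_shared_lib ./myprog libmyprog.so /tmp/prof" prints the three commands; set DRY_RUN=0 to run them.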
Hywel.
On Friday 13 August 2010, Hywel Richards wrote:
> Peter Kjellstrom wrote:
>> On Thursday 12 August 2010, Hywel Richards wrote:
>>> Hi all,
>>> Is anyone using oprofile?
>>> I'm getting segfaults from opreport at the moment, and I'm not sure if it is opreport, or just me.
>> I've tried the steps you outline below and it works for me (updated C5.5 as of 10m ago). My only guess is that your binary is b0rked. What happens if you do "opreport -l" instead of "opreport -l /tmp/myprog"? Was myprog compiled with "-g"?
>> And just to be sure, could you provide "uname -a" and "rpm -q oprofile".
>> /Peter
> "opreport -l" gives:
>     warning: /no-vmlinux could not be found.
>     warning: [vdso] (tgid:11369 range:0xdc9000-0xdca000) could not be found.
>     warning: [vdso] (tgid:2453 range:0x154000-0x155000) could not be found.
>     ...
>     warning: [vdso] (tgid:6643 range:0xfaa000-0xfab000) could not be found.
>     CPU: Core Solo / Duo, speed 800 MHz (estimated)
>     Counted CPU_CLK_UNHALTED events (Unhalted clock cycles) with a unit mask of 0x00 (Unhalted core cycles) count 100000
>     bfd_get_section_contents:get_debug:: Bad value
I guess that means that there is a problem with the data collected. All the warnings about vdso ranges that can't be found are strange (I don't get them here). Are the tgids in that list special in any way?
Is this on a single machine or on several?
Do you know if this strange behaviour persists over reboots?
Anything strange in ~/.oprofile? If you don't have customizations remove it and let oprofile re-create it.
> myprog was not compiled with -g, but that shouldn't be a requirement, right?
No, you're right.
...
> "uname -a" gives:
>     Linux myhost 2.6.18-194.11.1.el5 #1 SMP Tue Aug 10 19:09:06 EDT 2010 i686 i686 i386 GNU/Linux
I notice that you're on 32-bit while I'm on x86_64. That may or may not be relevant. I don't have any 32-bit machines around to test on though.
> "rpm -q oprofile" gives:
>     oprofile-0.9.4-15.el5
> I also did an "rpm -V oprofile" - no problems.
No harm in being paranoid :-)
/Peter
Peter Kjellstrom wrote:
> I guess that means that there is a problem with the data collected. All the warnings about vdso ranges that can't be found are strange (I don't get them here). Are the tgids in that list special in any way?
I'm not sure what would make tgids special, as I'm not exactly sure what they are in the first place (some sort of special thread IDs?).
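For what it's worth, a tgid is a thread group ID, which is the same number that ps reports as a process's PID. One way to see which processes the warnings refer to is to pull the tgids out of the opreport output and look them up. A rough sketch, where the function names tgids_from_warnings and describe_tgids are made up for illustration:

```shell
#!/bin/sh
# Sketch only: both function names are hypothetical.

# Print the unique tgids mentioned in opreport warnings read from stdin,
# one per line, in numeric order.
tgids_from_warnings() {
    grep -o 'tgid:[0-9]*' | cut -d: -f2 | sort -un
}

# For each tgid read from stdin, show the PID and command name if the
# process is still running, otherwise note that it has gone away.
describe_tgids() {
    while read -r tgid; do
        ps -p "$tgid" -o pid= -o comm= || echo "$tgid (no longer running)"
    done
}
```

Something like "opreport -l 2>&1 | tgids_from_warnings | describe_tgids" would then list them; short-lived processes will already have exited by the time the report is run.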
> Is this on a single machine or on several?
Seems to be happening on two separate machines - both 32-bit, though.
> Do you know if this strange behaviour persists over reboots?
Yes, a reboot doesn't seem to make any difference.
> Anything strange in ~/.oprofile? If you don't have customizations remove it and let oprofile re-create it.
I've tried that now - again, doesn't seem to make any difference.
I also uninstalled oprofile, did a "rm -r /var/lib/oprofile", and a reinstall, but I still get the same behaviour.
> I notice that you're on 32-bit while I'm on x86_64. That may or may not be relevant. I don't have any 32-bit machines around to test on though.
Maybe this is the problem then.
What we need is someone else to report that oprofile works on a 32-bit machine to confirm that I'm totally jinxed :-)
Hywel.
>> I notice that you're on 32-bit while I'm on x86_64. That may or may not be relevant. I don't have any 32-bit machines around to test on though.
> Maybe this is the problem then.
> What we need is someone else to report that oprofile works on a 32-bit machine to confirm that I'm totally jinxed :-)
Check https://bugzilla.redhat.com/show_bug.cgi?id=467651
It looks like in CentOS oprofile is built against an older version of binutils. When I installed the same versions of binutils and oprofile from RHEL, everything started to work.
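One way to look for this kind of mismatch is to compare when and where the installed packages were built. The sketch below is only an illustration: show_build_info and run are made-up helper names, the default package list is an assumption, and a build date alone only hints at which binutils a package was compiled against. By default the command is only printed.

```shell
#!/bin/sh
# Sketch only: run() and show_build_info() are hypothetical helper names.
# DRY_RUN defaults to 1, so the command is printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# show_build_info [PACKAGE...]
# Print version, architecture, build time and build host of each package.
show_build_info() {
    # Assumed default package list when none is given.
    [ "$#" -gt 0 ] || set -- oprofile binutils binutils-devel
    run rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH} built %{BUILDTIME:date} on %{BUILDHOST}\n' "$@"
}
```

If oprofile turns out to have been built well before the installed binutils, that is at least consistent with the explanation in the bug report.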
Mindaugas
Mindaugas Riauba wrote:
>>> I notice that you're on 32-bit while I'm on x86_64. That may or may not be relevant. I don't have any 32-bit machines around to test on though.
>> Maybe this is the problem then.
>> What we need is someone else to report that oprofile works on a 32-bit machine to confirm that I'm totally jinxed :-)
> Check https://bugzilla.redhat.com/show_bug.cgi?id=467651
> It looks like in CentOS oprofile is built against an older version of binutils. When I installed the same versions of binutils and oprofile from RHEL, everything started to work.
Ha! - so it is just a problem with the way that it was compiled for CentOS, then (oprofile getting recompiled before binutils-devel somehow?).
So, I got the centos5 SRPM for oprofile, rebuilt it and installed the RPM, and hey presto - a working oprofile again (on ia32).
It looks like the current ia32 CentOS oprofile RPM in circulation, then, needs replacing with a recompiled one.
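For the record, the rebuild went roughly along the lines of the sketch below. The exact file names are assumptions: the SRPM name is inferred from the "rpm -q oprofile" output earlier in the thread, and the location of the rebuilt RPM depends on the local rpmbuild setup. The helper names run and rebuild_and_install are made up, and by default the commands are only printed.

```shell
#!/bin/sh
# Sketch only: run() and rebuild_and_install() are hypothetical helper names.
# DRY_RUN defaults to 1, so commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# rebuild_and_install SRPM BUILT_RPM
# Rebuild SRPM against the locally installed binutils, then install the result.
rebuild_and_install() {
    srpm=$1
    built_rpm=$2
    # Rebuild the binary package from the source RPM.
    run rpmbuild --rebuild "$srpm"
    # --replacepkgs lets rpm install over the same version already installed.
    run rpm -Uvh --replacepkgs "$built_rpm"
}
```

An example call would be "rebuild_and_install oprofile-0.9.4-15.el5.src.rpm /usr/src/redhat/RPMS/i386/oprofile-0.9.4-15.el5.i386.rpm", where the second path is an assumed output location; check the end of the rpmbuild output for the real one.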
Hywel.
On Monday 16 August 2010, Hywel Richards wrote:
> Mindaugas Riauba wrote:
>> ...
>> Check https://bugzilla.redhat.com/show_bug.cgi?id=467651
>> It looks like in CentOS oprofile is built against an older version of binutils. When I installed the same versions of binutils and oprofile from RHEL, everything started to work.
> Ha! - so it is just a problem with the way that it was compiled for CentOS, then (oprofile getting recompiled before binutils-devel somehow?).
> So, I got the centos5 SRPM for oprofile, rebuilt it and installed the RPM, and hey presto - a working oprofile again (on ia32).
> It looks like the current ia32 CentOS oprofile RPM in circulation, then, needs replacing with a recompiled one.
> Hywel.
For the archives: http://bugs.centos.org/view.php?id=4482
/Peter