I got itchy fingers over the weekend and decided to fix what wasn't broken and upgraded one of the older servers from Centos 5.2 to Centos 5.3. Following the recommended process of updating glibc and such before the rest, it appeared to work perfectly and rebooted without problem.
However, MRTG 2.15.2 started complaining about unexpected values. I installed/updated both MRTG (2.16.2) and net-snmp to the latest available in hope of fixing it. Subsequently, MRTG stopped working altogether.
I've spent the whole weekend and whole Monday morning trying to fix it and thus far have only finally managed to get garbage values showing up in MRTG again as opposed to nothing. And this required learning about SNMP and adding many additional lines to the original MRTG configuration file, none of which I had to do previously.
Did anybody else have similar experiences with MRTG failing after the update, and if so, what was the simple fix? It does not make any sense that I have to jump through so many hoops to get just the default functionality back. Thus I believe there must be one small thing I'm overlooking.
Thanks for any advice.
In article 667c2e1e0907122340vc63cf71t506fd0f1f8832f4d@mail.gmail.com, Noob Centos Admin centos.admin@gmail.com wrote:
I got itchy fingers over the weekend and decided to fix what wasn't broken and upgraded one of the older servers from Centos 5.2 to Centos 5.3. Following the recommended process of updating glibc and such before the rest, it appeared to work perfectly and rebooted without problem.
However, MRTG 2.15.2 started complaining about unexpected values. I installed/updated both MRTG (2.16.2) and net-snmp to the latest available in hope of fixing it. Subsequently, MRTG stopped working altogether.
I've spent the whole weekend and whole Monday morning trying to fix it and thus far have only finally managed to get garbage values showing up in MRTG again as opposed to nothing. And this required learning about SNMP and adding many additional lines to the original MRTG configuration file, none of which I had to do previously.
Did anybody else have similar experiences with MRTG failing after the update, and if so, what was the simple fix? It does not make any sense that I have to jump through so many hoops to get just the default functionality back. Thus I believe there must be one small thing I'm overlooking.
Perhaps the OIDs changed for the interfaces you are monitoring.
Have you tried re-running cfgmaker to regenerate mrtg.cfg? It should pick up the correct OIDs again.
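A re-run might look roughly like the sketch below. The community string, host name, WorkDir and output path are all assumptions (they match a typical packaged install, but check your own), and the helper name is made up for illustration. It only prints the command so it can be reviewed before anything gets overwritten.

```shell
#!/bin/sh
# Build (but do not run) a cfgmaker command line so it can be reviewed first.
# Assumed defaults: community "public", host "localhost", and the paths below.
build_cfgmaker_cmd() {
    community=${1:-public}
    host=${2:-localhost}
    workdir=${3:-/var/www/mrtg}
    output=${4:-/etc/mrtg/mrtg.cfg}
    printf "cfgmaker --global 'WorkDir: %s' --output %s %s@%s\n" \
        "$workdir" "$output" "$community" "$host"
}

# Print the command with the assumed defaults. Once it looks right, back up
# the old mrtg.cfg and run it with: eval "$(build_cfgmaker_cmd)"
build_cfgmaker_cmd
```

Keeping a copy of the old mrtg.cfg first makes it easy to diff the old and new target lines afterwards.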
Cheers Tony
Hi,
Perhaps the OIDs changed for the interfaces you are monitoring.
Have you tried re-running cfgmaker to regenerate mrtg.cfg? It should pick up the correct OIDs again.
Yes I did; however, the default MRTG configuration appears to contain almost nothing. After consulting with others, this seems to be the norm: MRTG should pick up the standard OIDs for the basics, i.e. load and network traffic, if nothing's specified.
So far, I have had to manually insert target lines after figuring out the OIDs in order to get garbage data into the log files. I say garbage because, while the debug log shows some numbers corresponding to the output from top, MRTG is producing graphs that bear no resemblance to it.
Reproducing the entire default MRTG configuration would therefore require a very long config file, as well as coming up with formulas to twist the data into something that would produce sensible graphs... which obviously doesn't seem like the right way to do it.
Noob Centos Admin wrote:
Hi,
Perhaps the OIDs changed for the interfaces you are monitoring.
Have you tried re-running cfgmaker to regenerate mrtg.cfg? It should pick up the correct OIDs again.
Yes I did; however, the default MRTG configuration appears to contain almost nothing. After consulting with others, this seems to be the norm: MRTG should pick up the standard OIDs for the basics, i.e. load and network traffic, if nothing's specified.
So far, I have had to manually insert target lines after figuring out the OIDs in order to get garbage data into the log files. I say garbage because, while the debug log shows some numbers corresponding to the output from top, MRTG is producing graphs that bear no resemblance to it.
Reproducing the entire default MRTG configuration would therefore require a very long config file, as well as coming up with formulas to twist the data into something that would produce sensible graphs... which obviously doesn't seem like the right way to do it.
Did the update overwrite your snmpd.conf file? The 'view' on the default one may not permit access to the things mrtg needs to see. Try changing it to .1 to expose everything.
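A quick way to see what the current file actually grants is to pull out just the access-control directives. This is only a sketch: the default path /etc/snmp/snmpd.conf is an assumption based on the usual net-snmp layout, and the function name is made up for illustration.

```shell
#!/bin/sh
# Print the access-control lines from an snmpd.conf-style file.
# The default path is an assumption; pass another file as the first argument.
show_snmp_acl() {
    awk '$1 == "view" || $1 == "access" || $1 == "rocommunity" || $1 == "com2sec"' \
        "${1:-/etc/snmp/snmpd.conf}"
}

# To confirm what the community can really see, walk the whole tree
# (community "public" and host "localhost" are assumptions):
#   snmpwalk -v 2c -c public localhost .1 | head
```

If the walk returns only a handful of system entries, the view is probably still restricted; if it returns interface and host data, the view is not what is blocking MRTG.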
Hi,
Did the update overwrite your snmpd.conf file? The 'view' on the default one may not permit access to the things mrtg needs to see. Try changing it to .1 to expose everything.
It might have done so. To be honest I have no idea since I've never touched the SNMP configuration before this and simply used the default. Currently there's nothing inside the snmpd.conf except a rocommunity which is the public user.
I've added lines from an online source that claims to be the default snmpd configuration, and it looks like it should be allowing view all to the public user. In any case, even prior to adding these lines, I could get the relevant values off SNMP using the command line with the public community user, so I don't think I was blocking anything in SNMP.
----------------------- snmpd.conf ------------------
#existing line
rocommunity public localhost

#added by me
com2sec public default public
group public v1 public
group public v2c public
group public usm public
view all included .1
access public "" any noauth exact all none none
------------------------ end ----------------------------
As expected, MRTG behaviour remains unchanged. In fact, looking at the mrtg log, with the default blank mrtg.cfg it does not even appear to be trying to poll SNMP. I say this because when I add the target lines myself, MRTG at least screams at me if SNMP does not return values or cannot find the variable name.
On Tue, 2009-07-14 at 12:07 +0800, Noob Centos Admin wrote:
Hi,
Did the update overwrite your snmpd.conf file? The 'view' on the default one may not permit access to the things mrtg needs to see. Try changing it to .1 to expose everything.
It might have done so. To be honest I have no idea since I've never touched the SNMP configuration before this and simply used the default. Currently there's nothing inside the snmpd.conf except a rocommunity which is the public user.
I've added lines from an online source that claims to be the default snmpd configuration, and it looks like it should be allowing view all to the public user. In any case, even prior to adding these lines, I could get the relevant values off SNMP using the command line with the public community user, so I don't think I was blocking anything in SNMP.
Just a couple of random suggestions...
One of the things I always do after patching a box is do an 'updatedb', followed by 'locate rpmsave' and 'locate rpmnew'. Then I resolve the differences.
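If the locate database is stale or not installed, the same check can be sketched with find. The function names here are made up for illustration, and the default of /etc is an assumption (most leftover config files land there, but not all).

```shell
#!/bin/sh
# List leftover .rpmsave / .rpmnew files under a directory (default: /etc).
list_rpm_leftovers() {
    find "${1:-/etc}" \( -name '*.rpmsave' -o -name '*.rpmnew' \) -type f 2>/dev/null | sort
}

# Show a unified diff between each leftover and the live file beside it.
diff_rpm_leftovers() {
    list_rpm_leftovers "$1" | while IFS= read -r leftover; do
        live=${leftover%.rpmsave}
        live=${live%.rpmnew}
        echo "== $live vs $leftover"
        diff -u "$live" "$leftover" || true   # diff exits 1 when files differ
    done
}

# Example: diff_rpm_leftovers /etc | less
```

An .rpmnew file means the package's new default was set aside and your old file is still live; an .rpmsave file means your old file was moved aside and the package default is now live.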
The other suggestion comes from a recent experience I had when updating a box running cacti. Did the upgrade, then cacti broke completely. Turns out that I didn't have the default fonts that cacti expected. I ended up having to install dejavu-lgc-fonts from rpmforge to resolve it. Why do I bring this up? Because cacti depends upon rrdtool, just like mrtg...
-I
Noob Centos Admin wrote:
I got itchy fingers over the weekend and decided to fix what wasn't broken and upgraded one of the older servers from Centos 5.2 to Centos 5.3. Following the recommended process of updating glibc and such before the rest, it appeared to work perfectly and rebooted without problem. [...]
Did anybody else have similar experiences with MRTG failing after the update, and if so, what was the simple fix? It does not make any sense that I have to jump through so many hoops to get just the default functionality back. Thus I believe there must be one small thing I'm overlooking.
Check the snmpd.options file (it can be at either /etc/snmpd/snmpd.options or /etc/sysconfig/snmpd.options depending on your system history). When I upgraded to 5.3 I found that it broke the options I was using to suppress logging of the SNMP polling. A set of options that works is
OPTIONS="-Ln -Lf /dev/null -p /var/run/snmpd.pid"
Thanks guys for all the suggestions. None of it changed the situation but I'm beginning to think that it might have to do with SNMP not accepting word names in MRTG, or more specifically some kind of language encoding issue.
This is because of the following reasons
1. It's been pointed out that MRTG needs to be started with env LANG=C because it won't work properly if LANG is UTF8
2. On some options I try in MRTG, the log shows some error about wide characters returned from SNMP, and I see a Chinese character, which obviously shouldn't be a return value.
3. Addressing SNMP variables by name does not work in MRTG, but works from command line. e.g. something like ssRawCpuLoad is fine in command line, but does not work in MRTG config file, only the dot-numeric equivalent would return some kind of data in MRTG.
4. The problem started AFTER I rebooted the system after the update, so the reboot might have possibly allowed some settings to take effect with regards to the server's encoding. Maybe Centos 5.3 went from an EN_US language default to UTF8 default?
If this is indeed the case, how would I possibly change the interface/shell language settings back to the English one, since I don't typically need to input non-English characters nor view them in the shell?
I've added a LANG='en_US' and export LANG line in /etc/profile but it doesn't seem to be doing anything. Do I need a reboot for it to work like I am guessing based on #4 above?
Thanks!
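For anyone wanting to test the locale theory, a minimal sketch follows. The function names and the mrtg.cfg path are assumptions. As far as I know, CentOS 5 keeps the system-wide LANG in /etc/sysconfig/i18n rather than /etc/profile, and a change there only affects new logins and newly started processes, so that file is worth checking too.

```shell
#!/bin/sh
# Succeed if a locale string names a UTF-8 locale (e.g. en_US.UTF-8).
is_utf8_locale() {
    case "$1" in
        *[Uu][Tt][Ff]-8*|*[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run MRTG once with the locale forced to C, which is what reason 1 suggests.
# The config path is an assumption; pass your own as the first argument.
run_mrtg_c_locale() {
    env LANG=C LC_ALL=C mrtg "${1:-/etc/mrtg/mrtg.cfg}"
}

# Example check of the current shell:
#   is_utf8_locale "$LANG" && echo "LANG is UTF-8: $LANG"
```

If MRTG is started from cron, the cron entry's environment matters rather than the login shell's, so the LANG=C prefix has to appear on the cron command line itself.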
Noob Centos Admin wrote:
Thanks guys for all the suggestions. None of it changed the situation but I'm beginning to think that it might have to do with SNMP not accepting word names in MRTG, or more specifically some kind of language encoding issue.
This is because of the following reasons
- It's been pointed out that MRTG needs to be started with env LANG=C because it won't work properly if LANG is UTF8
- On some options I try in MRTG, the log shows some error about wide characters returned from SNMP, and I see a Chinese character, which obviously shouldn't be a return value.
- Addressing SNMP variables by name does not work in MRTG, but works from command line. e.g. something like ssRawCpuLoad is fine in command line, but does not work in MRTG config file, only the dot-numeric equivalent would return some kind of data in MRTG.
- The problem started AFTER I rebooted the system after the update, so the reboot might have possibly allowed some settings to take effect with regards to the server's encoding. Maybe Centos 5.3 went from an EN_US language default to UTF8 default?
If this is indeed the case, how would I possibly change the interface/shell language settings back to the English one, since I don't typically need to input non-English characters nor view them in the shell?
I've added a LANG='en_US' and export LANG line in /etc/profile but it doesn't seem to be doing anything. Do I need a reboot for it to work like I am guessing based on #4 above?
I don't see any similar problem on machines upgraded to Centos5.3 that are monitored with (and running) OpenNMS, so I'd guess that since you didn't change your snmpd.conf settings it is MRTG-specific.
And btw: OpenNMS might be overkill for your purpose, but you might want to take a look: http://www.opennms.org.
Hi,
I don't see any similar problem on machines upgraded to Centos5.3 that are monitored with (and running) OpenNMS, so I'd guess that since you didn't change your snmpd.conf settings it is MRTG-specific.
I think it's my server, quite possibly I screwed up something during the initial setup two years ago or along the way updating it from 5.0 and so forth until it's not behaving in any recognizable manner anymore.
And btw: OpenNMS might be overkill for your purpose, but you might want to take a look: http://www.opennms.org.
It looks good and I decided to give it a try in the hope that it can be up and running faster than I can get MRTG to work again. Unfortunately, as mentioned above, my server does not behave like a CentOS server anymore. Following the steps at OpenNMS, I got to the install -dis stage, where it promptly died because it could not find jrrd.
downloaded jrrd but it refuses to ./configure because it cannot find rrd_create
yum install rrdtool but there was no rrd_create
searched online and the only result that was similar... was somebody having the same problem on a Solaris server <-- hence making me wonder if I was logging into the wrong server. Using the instructions there however, I at least learnt how to tell configure where rrdtool was... but it still cannot find rrd_create for the ./configure process
Having spent almost 5 days on this, I'm officially giving up on monitoring the server with these tools. Writing a PHP script seems a lot faster, I've already gotten a basic script running to pull load figures from exec'ing uptime and emailing warnings if the load figures stay above a certain level.
Now I just have to expand the script to exec snmpget for the other metrics I need to keep track of. It's really frustrating that I have to resort to writing my own code when these things worked fine for other people.
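The idea is simple enough to sketch in a few lines of shell. To be clear, this is not the PHP script mentioned above, just an illustration of the same approach. The function names, the threshold, and the mail address in the comment are all made up, and it reads /proc/loadavg instead of parsing uptime.

```shell
#!/bin/sh
# Sketch of a load alarm; threshold, file path and address are assumptions.

# Succeed if the 5-minute load average (field 2) is above the given limit.
load_exceeds() {
    awk -v limit="$1" '{ exit !($2 > limit) }' "${2:-/proc/loadavg}"
}

# Print a warning line when the limit is exceeded, nothing otherwise.
check_load() {
    if load_exceeds "$1" "${2:-/proc/loadavg}"; then
        echo "WARNING: 5-minute load above $1"
    fi
}

# Example cron usage (hypothetical address; needs a working mail command):
#   msg=$(check_load 4); [ -n "$msg" ] && echo "$msg" | mail -s "load warning" admin@example.com
```

Using the 5-minute figure rather than the 1-minute one is a cheap stand-in for "stays above a certain level", since short spikes are averaged out.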
Noob Centos Admin wrote:
downloaded jrrd but it refuses to ./configure because it cannot find rrd_create
yum install rrdtool but there was no rrd_create
well, I note there's a few versions of rrdtool in the various repositories. the stock CentOS 5 version (from upstream) is 1.2.30, while rpmforge has 1.3.7, also a separate rrdutils package (I have no idea what's in it)
# yum list rrd*
...
Installed Packages
rrdtool.i386          1.2.30-1.el5.rf    installed
Available Packages
rrdtool.i386          1.3.7-1.el5.rf     rpmforge
rrdtool-devel.i386    1.3.7-1.el5.rf     rpmforge
rrdtool-doc.i386      1.2.27-3.el5       epel
rrdtool-ruby.i386     1.2.27-3.el5       epel
rrdutils.noarch       5.2.1-1.el5.rf     rpmforge
and, yes, the 1.2.30 version does not seem to have rrd_create...
# rpm -ql rrdtool
/usr/bin/rrdcgi
/usr/bin/rrdtool
/usr/bin/rrdupdate
/usr/lib/librrd.so.2
/usr/lib/librrd.so.2.0.15
/usr/lib/librrd_th.so.2
/usr/lib/librrd_th.so.2.0.13
/usr/share/doc/rrdtool-1.2.30
...(bunch of share/doc files deleted)...
/usr/share/man/man1/bin_dec_hex.1.gz
/usr/share/man/man1/cdeftutorial.1.gz
...(bunch of share/man pages deleted too)...
/usr/share/rrdtool
/usr/share/rrdtool/examples
...(some examples deleted)...
HOWEVER, I note that the rrdtool command has...
# rrdtool --help
RRDtool 1.2.30 Copyright 1997-2008 by Tobias Oetiker tobi@oetiker.ch
Compiled Feb 20 2009 18:18:07
Usage: rrdtool [options] command command_options
Valid commands: create, update, updatev, graph, dump, restore, last, lastupdate, first, info, fetch, tune, resize, xport ...
a CREATE subcommand!
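For what it's worth, rrd_create is the name of a C function in the rrd library rather than a command or a file, so it would not show up in a file listing. A configure check for it usually needs the library's development files (headers and the unversioned .so link), which normally come from a -devel package such as the rrdtool-devel shown in the yum listing. A rough way to see whether the installed library exports the symbol is sketched below; the function name is made up, and it assumes nm from binutils is installed.

```shell
#!/bin/sh
# Succeed if a shared library exports the given symbol (uses nm from binutils).
lib_exports() {
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v sym="$2" '$NF == sym { found = 1 } END { exit !found }'
}

# Example (library path taken from the rpm -ql listing):
#   lib_exports /usr/lib/librrd.so.2 rrd_create && echo "rrd_create is exported"
```

If the symbol is there but ./configure still fails, the config.log written by configure should show the exact compile or link command that failed.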
John R Pierce wrote:
Noob Centos Admin wrote:
downloaded jrrd but it refuses to ./configure because it cannot find rrd_create
yum install rrdtool but there was no rrd_create
well, I note there's a few versions of rrdtool in the various repositories. the stock CentOS 5 version (from upstream) is 1.2.30, while rpmforge has 1.3.7, also a separate rrdutils package (I have no idea what's in it)
<snip>
Installed Packages rrdtool.i386 1.2.30-1.el5.rf installed
doesn't change much, but that 1.2.30 version is from rpmforge as well (thanks repotags!), and there's no rrdtool in stock C5
Hi,
well, I note there's a few versions of rrdtool in the various repositories. the stock CentOS 5 version (from upstream) is 1.2.30, while rpmforge has 1.3.7, also a separate rrdutils package (I have no idea what's in it)
*sigh* The stuff of nightmares, I did have 1.3.7 installed after checking. But searching on this direction finally yielded an important piece of information. Somebody posted back in 2008 on a site to IGNORE the jrrd problem because OpenNMS supposedly comes with some kind of java rrd already installed (which begs the question of why then is the jrrd step mentioned in the install guide).
So I went ahead with the install process which then complained that my postgresql was the wrong version, i.e. 8.4 instead of max of 8.3, but at least this time it kindly offered a -Q option to ignore the version restrictions at my own risk.
I did. Then it was on to another problem, with OpenNMS dying on startup due to port clash with DHCP. Fortunately again, this was noted as something that happens quite often on Linux systems and a quick fix was to simply comment out the dhcp configuration.
After that, it was just the usual matter of opening a port in iptables for the opennms/tomcat and FINALLY something was working.
I'm crossing my fingers that ignoring the jrrd, ignoring the versions and ignoring the dhcp monitor isn't going to bite me one of these days. For now, "ignore"nce is bliss :D
Noob Centos Admin wrote:
well, I note there's a few versions of rrdtool in the various repositories. the stock CentOS 5 version (from upstream) is 1.2.30, while rpmforge has 1.3.7, also a separate rrdutils package (I have no idea what's in it)
*sigh* The stuff of nightmares, I did have 1.3.7 installed after checking. But searching on this direction finally yielded an important piece of information. Somebody posted back in 2008 on a site to IGNORE the jrrd problem because OpenNMS supposedly comes with some kind of java rrd already installed (which begs the question of why then is the jrrd step mentioned in the install guide).
Initially OpenNMS used rrd to keep its data history and it still can as an option, but the default now is a built-in pure-Java version. The data file format is not compatible, though. Rrdtool uses a binary format that varies by processor type, whereas the Java jrobin version is portable to anything running Java. I don't remember seeing this problem when installing from the opennms yum repository, though.
So I went ahead with the install process which then complained that my postgresql was the wrong version, i.e. 8.4 instead of max of 8.3, but at least this time it kindly offered a -Q option to ignore the version restrictions at my own risk.
Are you getting any benefit from mixing all of these non-stock versions on your system? How many different repositories that contain conflicting versions of packages do you use? Normally epel doesn't overwrite stock packages and opennms doesn't - the rest are somewhat dangerous. I normally leave other 3rd party repos disabled and use 'yum --enablerepo=reponame install somepackage' when I specifically want something from them (repeat with update periodically) and review the list of packages before continuing.
I did. Then it was on to another problem, with OpenNMS dying on startup due to port clash with DHCP. Fortunately again, this was noted as something that happens quite often on Linux systems and a quick fix was to simply comment out the dhcp configuration.
That is normal - typically you'd run opennms on a machine dedicated to monitoring, with perhaps thousands of targets so it wouldn't be running a lot of other services.
After that, it was just the usual matter of opening a port in iptables for the opennms/tomcat and FINALLY something was working.
I'm crossing my fingers that ignoring the jrrd, ignoring the versions and ignoring the dhcp monitor isn't going to bite me one of these days. For now, "ignore"nce is bliss :D
Removing it won't bother opennms. It has an assortment of application probes that it uses in addition to snmp and is intended to work automatically with large numbers of targets - when it discovers a node (or you add it), it probes the application ports to see what is running, then periodically tests again and notifies you when something that was previously running stops working. However, it is very configurable and you can add/remove whatever you want.
Hi,
java. I don't remember seeing this problem when installing from the opennms yum repository, though.
I didn't expect it either, honestly. In most cases, updates/installs do go relatively painlessly if I don't mess up following instructions/guides. In this case, I guess I just tripped up over the nonessential jrrd.
Are you getting any benefit from mixing all of these non-stock versions on your system? How many different repositories that contain conflicting versions of packages do you use? Normally epel doesn't overwrite stock packages and opennms
I've no idea honestly, my primary role isn't server admin and I'm just winging it as I go along to support what I'm supposed to be doing with the server.
The PG 8.4 was because we're developing something for our client who's on that server, so I'm standardizing on 8.4 and likely will stick with it for quite a while, rather than going with the 8.3 since there appears to be quite a few changes in 8.4, especially on warm standby features.
Apart from what's needed, I usually try to avoid installing things on the public web servers we have.
That is normal - typically you'd run opennms on a machine dedicated to monitoring, with perhaps thousands of targets so it wouldn't be running a lot of other services.
Well, unfortunately, there's only that pair of machines in that particular location. I really needed the monitoring tool up on it because I've been noticing a higher than normal load since the weekend. My quick hack of a PHP/cat /proc/loadavg script was also alerting me consistently. After a couple of hours on opennms, it became obvious that something was hitting the server. Turns out that the client had not set up the appropriate measures on their forum software, and bots were having a field day hitting it to break the image recognition until they finally got through and started spamming.
Removing it won't bother opennms. It has an assortment of application probes that it uses in addition to snmp and is intended to work automatically with large numbers of targets - when it discovers a node (or you add it), it probes the application ports to see what is running, then periodically tests again and notifies you when something that was previously running stops working. However, it is very configurable and you can add/remove whatever you want.
Yup, it's pretty cool and that web interface really helps. While I am perfectly at home using a text editor, I really don't want to have to wade through and edit tons of text just to do something a few clicks should handle.
Thanks again for pointing me to opennms :)
Noob Centos Admin wrote:
That is normal - typically you'd run opennms on a machine dedicated to monitoring, with perhaps thousands of targets so it wouldn't be running a lot of other services.
Well, unfortunately, there's only that pair of machines in that particular location. I really needed the monitoring tool up on it because I've been noticing a higher than normal load since the weekend.
A possible work-around is to use a VPN like openvpn to give you what look like normal routes to remote locations even with private addressing.
Hi,
A possible work-around is to use a VPN like openvpn to give you what look like normal routes to remote locations even with private addressing.
Given the amount of trouble I've had just getting monitoring to work, I don't think I'm even going to try fiddling with openVPN.
Besides which, after I went to sleep happily last night, I woke up this morning to find OpenNMS has mysteriously stopped working, just like MRTG previously. The service is running and opennms -v status indicates everything is a-OK, but the web interface is just not responding. No log entries, not a single clue. Nothing changed, except my mood, or maybe the datacenter decided port 8980 is a hacking attempt and closed it off. :(
I'm so tired of this whole monitoring crap that I'm not even going to bother to fix it. My crude load warning script still runs fine. So until it starts complaining consistently about the load, I think I'm just going to be an irresponsible admin on top of being a noob one and just do work that I'm getting paid for. *sigh*