As happens every day, we had a power blip overnight. At least one of the servers connected to a SmartUPS via cable announced "power exhausted, initiating shutdown" (which I've disabled).
The thing is, I know the servers on that UPS draw a ridiculous amount of power, but I don't see that on the others... and this was three seconds, not minutes, after it announced there was a power outage.
Has anyone else seen this behavior?
mark
On Fri, Feb 8, 2013 at 8:23 AM, m.roth@5-cent.us wrote:
Has anyone else seen this behavior?
You mean UPS's behaving badly? Yes, they break like everything else, especially the batteries.
Les Mikesell wrote:
You mean UPS's behaving badly? Yes, they break like everything else, especially the batteries.
Nah, I think it's something with apcupsd.
mark
On Fri, Feb 8, 2013 at 9:54 AM, m.roth@5-cent.us wrote:
Nah, I think it's something with apcupsd.
Doesn't it log the message as received from the UPS? If it has a network interface you should be able to get messages via syslog, email, SNMP, etc., and there is probably a web interface with status showing expected battery capacity at the current load.
Les Mikesell wrote:
Doesn't it log the message as received from the UPS? If it has a network interface you should be able to get messages via syslog, email, SNMP, etc., and there is probably a web interface with status showing expected battery capacity at the current load.
The entire contents of that incident. I see nothing in messages.
2013-02-07 17:38:19 -0500  Power failure.
2013-02-07 17:38:21 -0500  Battery power exhausted.
2013-02-07 17:38:21 -0500  Initiating system shutdown!
2013-02-07 17:38:21 -0500  User logins prohibited
2013-02-07 17:38:23 -0500  Power is back. UPS running on mains.
2013-02-07 17:38:23 -0500  Allowing logins
Two seconds?
mark
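For anyone who wants to check the timing without eyeballing the log, here is a minimal Python sketch that works out the gap between entries in the events log quoted above. It assumes every line has the shape "DATE TIME UTC-OFFSET message", as in those six entries. The function names are made up for illustration and are not part of apcupsd.

```python
from datetime import datetime

# The six entries from the incident, copied from the log excerpt above.
EVENTS = """\
2013-02-07 17:38:19 -0500  Power failure.
2013-02-07 17:38:21 -0500  Battery power exhausted.
2013-02-07 17:38:21 -0500  Initiating system shutdown!
2013-02-07 17:38:21 -0500  User logins prohibited
2013-02-07 17:38:23 -0500  Power is back. UPS running on mains.
2013-02-07 17:38:23 -0500  Allowing logins
"""


def parse_events(text):
    """Return (timestamp, message) pairs for each well-formed log line."""
    events = []
    for line in text.splitlines():
        parts = line.split(None, 3)  # date, time, UTC offset, message
        if len(parts) < 4:
            continue  # skip blank or truncated lines
        stamp = datetime.strptime(" ".join(parts[:3]), "%Y-%m-%d %H:%M:%S %z")
        events.append((stamp, parts[3]))
    return events


def seconds_between(events, first, second):
    """Seconds from the first message starting with `first`
    to the first message starting with `second`."""
    start = next(t for t, msg in events if msg.startswith(first))
    end = next(t for t, msg in events if msg.startswith(second))
    return (end - start).total_seconds()
```

With the entries above, seconds_between(parse_events(EVENTS), "Power failure", "Battery power exhausted") comes out at 2.0, which matches the "Two seconds?" reaction.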
service apcupsd status
(or cat /var/log/apcupsd.events)
Craig
Craig White wrote:
service apcupsd status
(or cat /var/log/apcupsd.events)
Already posted the latter; the former, hey, neat results, excerpted here:
MODEL    : Smart-UPS 3000 RM
STATUS   : SHUTTING DOWN
LINEV    : 118.0 Volts
LOADPCT  : 55.9 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT : 10.0 Minutes
<...>
TONBATT  : 0 seconds
CUMONBATT: 27 seconds
Now, the "SHUTTING DOWN" annoys me: it *appears* that because I created the script that it calls, which returns, per the documentation, a -99, it sets a flag *somewhere* that's never, ever cleared. Just for grins, I restarted apcupsd; it's all fine, online, and the battery light is not telling me the battery needs to be changed, but the status still reads "SHUTTING DOWN".
mark
On 2/8/2013 1:23 PM, m.roth@5-cent.us wrote:
Already posted the latter; the former, hey, neat results, excerpted here:
MODEL    : Smart-UPS 3000 RM
STATUS   : SHUTTING DOWN
LINEV    : 118.0 Volts
LOADPCT  : 55.9 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT : 10.0 Minutes
<...>
TONBATT  : 0 seconds
CUMONBATT: 27 seconds
You can specify in the conf file that shutdown occurs when it hits X minutes of runtime left - "MINUTES" should be the parameter. You're at 10.0 minutes left, and if you have it set to 10.0 or greater, it's probably going to want to shut down immediately at any AC power loss.
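To make that check concrete, here is a rough Python sketch that compares the posted status values against the shutdown thresholds. It is only my reading of the documented rule (while on battery, shut down once runtime left is at or below MINUTES, charge is at or below BATTERYLEVEL, or time on battery reaches a non-zero TIMEOUT), not apcupsd's actual code. The default values in the function signature are assumptions about the stock config, and the function names are hypothetical.

```python
def parse_status(text):
    """Turn 'KEY : value' status lines into a dict of stripped strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def leading_number(value):
    """Extract the leading number from values such as '10.0 Minutes'."""
    return float(value.split()[0])


def shutdown_reasons(status, minutes=3, batterylevel=5, timeout=0):
    """List which thresholds would be met if the UPS were on battery now.

    The defaults are assumed stock values, not read from any real config.
    """
    reasons = []
    if leading_number(status["TIMELEFT"]) <= minutes:
        reasons.append("MINUTES")
    if leading_number(status["BCHARGE"]) <= batterylevel:
        reasons.append("BATTERYLEVEL")
    # A TIMEOUT of 0 is treated as "timer disabled".
    if timeout > 0 and leading_number(status["TONBATT"]) >= timeout:
        reasons.append("TIMEOUT")
    return reasons


# Values excerpted from the status output posted earlier in the thread.
STATUS = """\
STATUS   : SHUTTING DOWN
LOADPCT  : 55.9 Percent Load Capacity
BCHARGE  : 100.0 Percent
TIMELEFT : 10.0 Minutes
TONBATT  : 0 seconds
"""
```

With these numbers, shutdown_reasons(parse_status(STATUS)) returns an empty list, while passing minutes=10 returns ["MINUTES"], which is the situation described above.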
Toby Bluhm wrote:
You can specify in the conf file that shutdown occurs when it hits X minutes of runtime left - "MINUTES" should be the parameter. You're at 10.0 minutes left, and if you have it set to 10.0 or greater, it's probably going to want to shut down immediately at any AC power loss.
I'm afraid you've missed the whole beginning of this thread - I suggest you read it. I know what you were saying; it's the response of apcupsd to a power blip this morning that's the issue: as much as these servers draw, there's no way that the UPS is out of power in 3 seconds.
mark
On 2/8/2013 2:26 PM, m.roth@5-cent.us wrote:
I'm afraid you've missed the whole beginning of this thread - I suggest you read it. I know what you were saying; it's the response of apcupsd to a power blip this morning that's the issue: as much as these servers draw, there's no way that the UPS is out of power in 3 seconds.
I'm not saying it's out of battery power. I'm saying you may be telling it to shut down when it has, by its own calculations, 10 minutes of battery run time left. I believe the default is 3 or 5 in apcupsd.conf.
Toby Bluhm wrote:
I'm not saying it's out of battery power. I'm saying you may be telling it to shut down when it has, by its own calculations, 10 minutes of battery run time left. I believe the default is 3 or 5 in apcupsd.conf.
Oh, sorry, you weren't clear. Nope, apcupsd.conf is at its default of MINUTES 3.
mark
On Feb 8, 2013, at 12:26 PM, m.roth@5-cent.us wrote:
I'm afraid you've missed the whole beginning of this thread - I suggest you read it. I know what you were saying; it's the response of apcupsd to a power blip this morning that's the issue: as much as these servers draw, there's no way that the UPS is out of power in 3 seconds.
You can configure how many minutes of runtime to reserve for powering down, because shutdown is definitely not instantaneous and, in fact, powering down is likely to cause an increase in power consumption.
That said, 55% load capacity is very high and obviously fits into the calculation that APCUPSD is making when it instructs connected server(s) to shut down.
Craig
Craig White wrote:
<snip>
That said, 55% load capacity is very high and obviously fits into the calculation that APCUPSD is making when it instructs connected server(s) to shut down.
Heh. You think 55% is bad? With the 48-core servers, and especially with the 64-core ones, when someone's running a job on the cluster, under no circumstances can I have more than three servers plugged into a 3000... and even then, I've seen it around 90%. I still don't believe that it would run dry in 2-3 seconds.
And I'm still perturbed about it still telling me it's shutting down. Haven't figured out how to tell that to stop.
mark
From: "m.roth@5-cent.us" m.roth@5-cent.us
The entire contents of that incident. I see nothing in messages.
2013-02-07 17:38:19 -0500  Power failure.
2013-02-07 17:38:21 -0500  Battery power exhausted.
2013-02-07 17:38:21 -0500  Initiating system shutdown!
2013-02-07 17:38:21 -0500  User logins prohibited
2013-02-07 17:38:23 -0500  Power is back. UPS running on mains.
2013-02-07 17:38:23 -0500  Allowing logins
I guess you already tried the UPS self test or the apctest utility...? Did you do a battery recalibration? When all else fails... power off/on! ^_^
JD
Realizing that this thread is a bit old ...
I've used apcupsd for years on a variety of enterprise- and consumer-class UPSes, on a variety of UNIXes. When I have seen a system that reports long battery lifetime in its stats but shuts down immediately, these are the likely culprits in order of likelihood:
- one of the conf values was tuned down for initial testing and never turned up again. Another poster mentioned MINUTES but the one I usually use for testing and have forgotten to change is the TIMEOUT value; for testing I would often set it to 60 but in production it should (on my systems) be zero. There are a few other such parameters in the config file; review them all.
- one or more batteries in your chain is EOL. Shut down apcupsd and initiate the battery test mode. Batteries will typically last not more than 5 years (yes, there is variation both ways)
- your battery runtime is not calibrated and so the stats you're seeing are misleading. This can happen from not being calibrated to begin with, from not calibrating after replacing batteries, or having batteries deteriorate over time. See the apcupsd docs or mailing list for calibration details. Note that if your load is significantly less than your battery capacity (like 15% or similar), calibration will not in general work.
- your UPS is EOL (fairly rare, but I occasionally retire a UPS because it no longer behaves in a predictable manner despite new batteries)
If that doesn't solve it, I'd suggest taking it to the apcupsd mailing list.
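As a rough aid for the first item in the list above, here is a Python sketch that scans the text of an apcupsd.conf for the shutdown thresholds and reports any that differ from what I assume the stock values to be. The defaults table is an assumption (the thread only confirms MINUTES 3 as the default and TIMEOUT 0 as the production value), the function names are hypothetical, and the parser handles only simple "DIRECTIVE value" lines.

```python
# Assumed stock values; check them against the sample apcupsd.conf
# shipped with your package before relying on them.
ASSUMED_DEFAULTS = {"BATTERYLEVEL": "5", "MINUTES": "3", "TIMEOUT": "0"}


def parse_conf(text):
    """Return {DIRECTIVE: value} for 'DIRECTIVE value' lines, ignoring comments."""
    conf = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def changed_thresholds(conf, defaults=ASSUMED_DEFAULTS):
    """Return the shutdown thresholds whose values differ from the assumed defaults."""
    return {
        key: conf[key]
        for key in defaults
        if key in conf and conf[key] != defaults[key]
    }
```

Feeding it the contents of the config file and printing the result of changed_thresholds() gives a quick list of values to double-check. A non-empty result is not necessarily wrong, only worth a second look.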
One thing that *is* more on-topic for CentOS and apcupsd: make sure that system updates haven't clobbered the patched /etc/rc.d/init.d/halt script needed for apcupsd to actually shut down your system. I have a cron job on such systems that looks for the string 'apcupsd' in that file; if the string isn't there, an email alert goes out to me so that I know that I have to re-patch it.
Devin
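The cron job itself wasn't posted, so the following is only a guess at its shape: a minimal Python sketch that looks for the string 'apcupsd' in the halt script and returns a warning when it is missing. The function names are hypothetical, and how the alert gets mailed is left to the caller.

```python
HALT_SCRIPT = "/etc/rc.d/init.d/halt"  # path named in the post above


def mentions_apcupsd(script_text):
    """True if the script text still contains the string 'apcupsd'."""
    return "apcupsd" in script_text


def check_halt_script(path=HALT_SCRIPT):
    """Return a warning string if the patch looks missing, else None."""
    try:
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
    except OSError as exc:
        return f"cannot read {path}: {exc}"
    if not mentions_apcupsd(text):
        return f"{path} no longer mentions apcupsd; re-patch it"
    return None
```

A small wrapper run from cron could print whatever check_halt_script() returns when it is not None. Cron normally mails a job's output to its owner, provided local mail delivery is set up.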
On 16.02.2013 at 18:33, Devin Reade gdr@gno.org wrote:
... One thing that *is* more on-topic for CentOS and apcupsd: make sure that system updates haven't clobbered the patched /etc/rc.d/init.d/halt script needed for apcupsd to actually shut down your system. I have a cron job on such systems that looks for the string 'apcupsd' in that file; if the string isn't there, an email alert goes out to me so that I know that I have to re-patch it.
What about /sbin/halt.local? This file could be used for such things. (Check /etc/rc.d/init.d/halt to see what is done with /sbin/halt.local.)
-- LF