Some of our largest systems run Windows because it supports engineering applications that we use regularly. These applications have unattended runs that often take between ten and fifteen hours to complete. We have taken the recommendation of the application supplier and equipped these Windows machines with UPS protection for 30 minutes at full load.
The UPSs are Ethernet-connected. A support application on the Windows engineering machine communicates with the UPS to detect and respond to any facility power failure. The long-running engineering application is then suspended at a restart point and the system is shut down. Once the system has reliable power and has been rebooted, we manually resume the job from that restart point.
If we wanted to protect our CentOS systems from facility power failure in a similar way, is there operating system or other standard support that we might employ? Most of the Linux-based applications are not as critical as the engineering applications on the Windows machines. There is a significant amount of processor idle time on several of the CentOS systems during non-work hours when the systems are unattended. Several CentOS systems are supported currently with UPSs, but they run out and the system loses power if it is unattended.
On 8/8/2017 4:50 PM, Chris Olson wrote:
If we wanted to protect our CentOS systems from facility power failure in a similar way, is there operating system or other standard support that we might employ? Most of the Linux-based applications are not as critical as the engineering applications on the Windows machines. There is a significant amount of processor idle time on several of the CentOS systems during non-work hours when the systems are unattended. Several CentOS systems are supported currently with UPSs, but they run out and the system loses power if it is unattended.
NUT (Network UPS Tools) is the way to go on Linux for UPS power management. You can configure NUT as master/slave, so one Linux system talks to the UPS over USB, serial, or Ethernet, and it in turn tells the other Linux systems to shut down cleanly when it's time. Or, if each box has its own UPS, you can just run NUT in standalone mode on each box.
NUT is in EPEL...
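To make the shutdown decision a bit more concrete, here is a minimal Python sketch of the kind of check involved. It is not how NUT itself is implemented; in a real setup NUT's own upsmon daemon handles the shutdown. The sketch assumes the UPS is already defined in NUT and readable with the `upsc` client, and the UPS name `myups@localhost`, the 30% threshold, and all function names are made up for illustration.

```python
import subprocess


def parse_upsc(text):
    """Parse `upsc` output (one "key: value" pair per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values, min_charge=30.0):
    """Return True when the UPS is on battery and low on charge.

    ups.status holds space-separated flags: OL = on line power,
    OB = on battery, LB = low battery. min_charge is a hypothetical
    site-specific threshold in percent.
    """
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:
        return False  # utility power is present, nothing to do
    if "LB" in flags:
        return True  # the UPS itself reports a low battery
    try:
        charge = float(values.get("battery.charge", ""))
    except ValueError:
        return False  # charge not reported; wait for the LB flag instead
    return charge <= min_charge


def read_ups(name="myups@localhost"):
    """Query a UPS through the upsc client (the name is an assumption)."""
    result = subprocess.run(
        ["upsc", name], capture_output=True, text=True, check=True
    )
    return parse_upsc(result.stdout)
```

Something like `should_shut_down(read_ups())` could be polled from a cron job or a small loop, and a True result could first tell a long-running job to stop at a restart point before the machine is shut down.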
On Tue, August 8, 2017 6:50 pm, Chris Olson wrote:
If we wanted to protect our CentOS systems from facility power failure in a similar way, is there operating system or other standard support that we might employ? Most of the Linux-based applications are not as critical as the engineering applications on the Windows machines. There is a significant amount of processor idle time on several of the CentOS systems during non-work hours when the systems are unattended. Several CentOS systems are supported currently with UPSs, but they run out and the system loses power if it is unattended.
I have used a lot of APC Smart-UPS units. They have a serial or USB connection through which a daemon running on your machine (apcupsd) can detect that the UPS is on battery and initiate a clean system shutdown when the battery falls below a charge level that you define in the configuration. Apcupsd is free, open source software; I have never used APC's own software that does the same. When power returns after the UPS has fully drained its battery, the system can be configured to boot on power restore. If you have more than one machine behind the same UPS, the apcupsd daemons on the other machines can run in "slave" mode and get their information from the master apcupsd.
Depending on your UPS make/model, there may be a similar daemon that can do the same.
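As an illustration only, the threshold logic described above can be sketched in Python against the text that `apcaccess status` prints. This is not apcupsd's own code; the daemon makes this decision itself based on its configuration. The function names and the default thresholds (20 percent, 5 minutes) are assumptions chosen for the example.

```python
import subprocess


def parse_apcaccess(text):
    """Parse `apcaccess status` output ("KEY : value" lines) into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def leading_number(value):
    """Return the leading number of a field such as '42.0 Percent', or None."""
    try:
        return float(value.split()[0])
    except (IndexError, ValueError):
        return None


def shutdown_due(fields, battery_level=20.0, minutes=5.0):
    """True if on battery and charge or runtime is at or below a threshold.

    battery_level (percent) and minutes are hypothetical thresholds,
    playing the role of the charge level set in the configuration.
    """
    if "ONBATT" not in fields.get("STATUS", ""):
        return False  # still on utility power
    charge = leading_number(fields.get("BCHARGE", ""))
    timeleft = leading_number(fields.get("TIMELEFT", ""))
    low_charge = charge is not None and charge <= battery_level
    low_runtime = timeleft is not None and timeleft <= minutes
    return low_charge or low_runtime


def read_status():
    """Run apcaccess and return the parsed status fields."""
    result = subprocess.run(
        ["apcaccess", "status"], capture_output=True, text=True, check=True
    )
    return parse_apcaccess(result.stdout)
```

A wrapper script could call `shutdown_due(read_status())` periodically and use a True result to checkpoint a long-running job before the daemon takes the machine down.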
Valeri
CentOS mailing list CentOS@centos.org https://lists.centos.org/mailman/listinfo/centos
++++++++++++++++++++++++++++++++++++++++
Valeri Galtsev
Sr System Administrator
Department of Astronomy and Astrophysics
Kavli Institute for Cosmological Physics
University of Chicago
Phone: 773-702-4247
++++++++++++++++++++++++++++++++++++++++
On 08/08/17 19:50, Chris Olson wrote:
If we wanted to protect our CentOS systems from facility power failure in a similar way, is there operating system or other standard support that we might employ? Most of the Linux-based applications are not as critical as the engineering applications on the Windows machines. There is a significant amount of processor idle time on several of the CentOS systems during non-work hours when the systems are unattended. Several CentOS systems are supported currently with UPSs, but they run out and the system loses power if it is unattended.
You didn't say what brand/model of UPS you are using, so I can't be specific. Check with the manufacturer of your UPS to see if they have an application that can communicate power status to your computer. Many UPS devices are capable of signaling power loss, and the UPS can give you enough warning to initiate a graceful shutdown.
Take APC brand UPS devices, for example. Many of them can connect to the computer through Ethernet, USB, or a serial cable, so they can send the bad news that the power is going down soon. Check with your UPS manufacturer first.
On 8/14/2017 4:01 PM, Mark LaPierre wrote:
You didn't say what brand/model of UPS you are using, so I can't be specific. Check with the manufacturer of your UPS to see if they have an application that can communicate power status to your computer. Many UPS devices are capable of signaling power loss, and the UPS can give you enough warning to initiate a graceful shutdown.
NUT supports virtually *ALL* UPSes without messing with the manufacturers' proprietary software, and it's in the EPEL repository, where it is kept up to date.