Hi,
Last week I had a disaster which took me a few unnerving days to repair. My main Internet-facing server is a bare-metal installation with CentOS 7. It hosts four dozen web sites (or web applications) based on WordPress, Dolibarr, OwnCloud, GEPI, and quite a number of mail accounts for ten different domains. On Sunday afternoon this machine had a hardware failure and proved to be unrecoverable.
The good news is, I always have backups of everything. In this case, I have a dedicated backup server (in a different datacenter in a different country). I’m using Rsnapshot for incremental backups, so I had all the data: websites, mail accounts, database dumps, configurations, etc.
Now here’s the problem: it took me three and a half days of intense work to restore everything and get it all running again. Three and a half days of downtime is quite a stretch.
As far as I understand, my mistake was to use a bare-metal installation and not a virtualized solution where I could simply restore a snapshot of a VM. Correct me if I’m wrong.
Now I’m doing a lot of thinking and searching. Proxmox and Ceph look quite promising. From what I can tell, the idea is not to use one big server but a cluster of many small servers, and to aggregate them as you would hard disks in a RAID 10 array, for example, only for the whole system rather than just the disks. One or several CentOS 7 VMs would then be installed on top of this setup.
Any advice from the pros before I dive head first into the documentation?
Cheers from the sunny South of France,
Niki