The family tree is back online! On slightly different hardware, and with a backup system which is (almost) working!
A few weeks ago the family tree went down when its storage media failed, suddenly and catastrophically. According to the recovery services we contacted, the data is not recoverable. That was terrible, but nearly as bad was discovering that the backup system had stopped working more than two years earlier. For two years it had quietly been overwriting all the backup files with error messages saying it could not reach the server!
We did have a scattering of manual backups: copies of the GED file we use in desktop software, nearly-current archives of the documents and images folder, and so on. The GED file was from February, so we have lost 6 months of work. But a scattering of copies is not the same as a backup - we need to manually locate each of the thousands of documents and images, upload it, and reconnect it to each of the records which use it. For example, a census event for a family of 8 may need to be connected to 24+ records: each individual will have a census event, plus 2-3 other events for which it also serves as evidence, such as their residence, their occupation, or their birth date.
It is a large pile of work to accomplish: about 250 days of full-time work, according to some back-of-the-envelope calculations.
But even as that slowly happens, we are also setting up the new backup system - a four-pronged, multi-redundancy model.
- The first prong is a simple copy-and-restore model: a daily 'snapshot' of the directories and files which make up the website, plus a copy of the database. For the copy we are using rdiff-backup, which only transfers what has changed - a new or edited file, a directory whose permissions have been altered, and so on. Copying the database is a bit more complex, as there are issues of timing and of permissions on the remote storage which still need to be worked out.
- The second prong is a regularly scheduled disk image. This is similar to the copy-and-restore, except it cannot 'go back in time' to previous versions unless you image very often; we plan to keep at most 3 copies, taken once per week. Because of some issues with a server taking an image of itself while it is running (which I am still working on), this is currently a manual process. You may run into a "server is down for archiving" page - it takes about 20 minutes, because I am still learning how to do this quickly.
- The third prong is another manual one: exporting a complete GED + media archive to a third storage unit. There may be an API for doing this, which I will be researching soon; if so, this will be a lot easier in the future.
- The fourth prong will need to wait a few weeks while I acquire hardware. It involves a mini cluster of servers, where one serves the media files, one the database, and two or more the web front-end. This may give a slight performance improvement, but probably not. It does, however, make it easy to have the static-files server live-update a backup, and likewise for the database server.
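For the curious, the first prong could be wired up as two cron jobs. This is only a sketch - `backup-host`, the paths, and the database name are placeholders, not our actual setup:

```shell
# Hypothetical crontab fragment for the daily snapshot (prong one).

# 01:30 - dump the database into the web tree first, so the remote
# timing/permission issues stay out of the picture: the dump becomes
# just another file for rdiff-backup to pick up.
30 1 * * * mysqldump --single-transaction familytree > /var/www/familytree/db/familytree.sql

# 02:00 - incremental copy of the whole site. rdiff-backup stores reverse
# increments, so any previous day's state can still be restored.
0 2 * * *  rdiff-backup /var/www/familytree backup-host::/backups/familytree
```

Restoring an older state would then be something like `rdiff-backup -r 3D backup-host::/backups/familytree /var/www/familytree` to go back three days.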
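The second prong's "server imaging itself while running" problem is usually solved by imaging a frozen snapshot rather than the live filesystem. A sketch of what the weekly step might become, assuming an LVM-backed server - device names, sizes, and hosts are all placeholders:

```shell
#!/bin/sh
# Hypothetical weekly disk-image step (prong two). Imaging a snapshot
# avoids copying a live, changing filesystem - the issue that currently
# forces the "down for archiving" page.
set -eu

# 1. Freeze a point-in-time view of the root volume.
lvcreate --size 5G --snapshot --name rootsnap /dev/vg0/root

# 2. Image the snapshot, compressed, straight to the backup host.
dd if=/dev/vg0/rootsnap bs=4M \
  | gzip \
  | ssh backup-host 'cat > /backups/images/familytree-$(date +%F).img.gz'

# 3. Drop the snapshot, then keep only the three most recent images.
lvremove -f /dev/vg0/rootsnap
ssh backup-host 'ls -1t /backups/images/*.img.gz | tail -n +4 | xargs -r rm'
```

The retention step implements the "at most 3 copies" policy: `tail -n +4` lists everything except the three newest images and deletes it.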
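The "live-updating backup" idea in the fourth prong can also be sketched as configuration. Assuming a MySQL-style database and a hypothetical `mirror-host` (none of these names are our real setup):

```shell
# Hypothetical configuration sketch for the planned cluster (prong four).
#
# Database server, my.cnf: enable binary logging so a replica can follow
# every change in near-real time.
#   [mysqld]
#   server-id = 1
#   log_bin   = /var/log/mysql/mysql-bin.log
#
# Replica server: set server-id = 2, point it at the primary, and start
# replication; from then on it is a continuously updated copy.

# Static-files server, crontab: mirror the media directory every 10 minutes.
*/10 * * * * rsync -a --delete /srv/media/ mirror-host:/srv/media/
```

The appeal over the daily snapshot is the window of loss: minutes of work at risk instead of up to a day.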
The important message is: the website is up, and likely to stay up.