By Sally Burn, PhD
Last Halloween I wrote a Scizzle piece on lab nightmares; the first terror I dealt with was “Losing your data or samples”. Well, dear reader, I have to report that this nightmare became a reality for me a few weeks ago: I lost all my data. Four years and 400 GB, gone. And it happened with a single click of the mouse button.
Game of Thrones was also involved to an extent. But try as I might to throw blame at Joffrey and co., the main responsibility lies with my human error. Here’s how it happened: I have an external drive onto which I back up all data from my lab PC (via daily automatic backup) and microscopes (manually), into a folder rather unimaginatively called “Data”. There is also a redundant lab meetings folder sat just next to Data. In a rush to finish up what I was working on and free up my laptop for Game of Thrones, I deleted what I believed to be the redundant folder, clicked “Yes” when warned it was too big for the recycle bin, briefly wondered why the deletion was taking so long, then finally settled in for some purple wedding action. Next morning, 24 hours before I’m due to give lab meeting, I go to retrieve some images from my drive. Only Data is no longer there. Some mild cold sweats kick in but I know that there’s a straightforward explanation, right? I must have dragged the folder into another folder. Only I can’t spot it anywhere… and that’s when I notice that my drive has 700 GB free instead of the usual 300. Cue draining of all color, mild sicking up in mouth, and incoherent babbling to lab mates.
How could this possibly happen, especially to me – a known anal retentive? It’s at this juncture I should point out that everything seems to be okay now and the situation was not as dire as it could have been – thanks in no small part to my anxious nature. Three weeks prior to Datageddon I’d taken a flight. Obviously this meant there was a strong chance of me dying in an aviation incident, plus being out of the lab somehow also increased the likelihood of there being a fire or maybe even just the building falling down. So I did one of my not-quite-routine backups to my home drive. The loss was therefore only three weeks’ worth of data. There was no new raw data generated in those three weeks but I had spent an inordinate amount of time converting the data into images, movies, and reconstructions – it was these that were lost.
It was beyond awful. So in an attempt to save my fellow scientists from a similar fate, here is a rundown of what I have learned and what you can do to protect your data:
You lost your data… now what?
My data loss was followed by the most mind-numbing two weeks of my life. I downloaded file retrieval software and retrieved 550 GB of deleted files. The retrieval took two days, recovering 300,000 files… which were all placed in the same folder, all details of their original location lost. Now I don’t know if you’ve ever tried to open a folder containing 300,000 files, ranging from 1 KB to 35 GB in size, but let me tell you: it takes a LONG time and the average PC cannot handle ordering the files by date. I transferred operations to the fastest microscope PC and so began a week in a darkened ‘scope room, waiting hours for the folder to open and then slowly, laboriously attempting to transfer large handfuls of files (many duplicates or partial copies) into more manageable subfolders, such that I could look at and order the files by date. I got there in the end, retrieving the relatively few files I needed, but ultimately it took me longer than it would have done to just reprocess the data from scratch.
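With hindsight, the "large handfuls into subfolders" step is the kind of job a short script can do without ever opening the folder in a file browser. What follows is only a rough sketch of that idea in Python, not what I actually did: the function name and folder paths are made up for illustration, and it assumes the recovery software preserved each file's last-modified date, which is not guaranteed.

```python
import os
import shutil
from datetime import datetime
from pathlib import Path


def sort_into_month_folders(src, dest):
    """Move every file in src into dest/YYYY-MM, using its last-modified date.

    Hypothetical helper: assumes the recovered files kept their original
    timestamps and that dest is NOT a subfolder of src.
    """
    src, dest = Path(src), Path(dest)

    # List the folder once, up front. Reading 300,000 names is quick;
    # it is drawing and sorting them in a file browser that is slow.
    with os.scandir(src) as entries:
        files = [e for e in entries if e.is_file(follow_symlinks=False)]

    moved = 0
    for entry in files:
        stamp = datetime.fromtimestamp(entry.stat().st_mtime)
        folder = dest / stamp.strftime("%Y-%m")
        folder.mkdir(parents=True, exist_ok=True)

        # Never overwrite: add _1, _2, ... if the name is already taken.
        target = folder / entry.name
        n = 1
        while target.exists():
            name = Path(entry.name)
            target = folder / f"{name.stem}_{n}{name.suffix}"
            n += 1

        shutil.move(entry.path, str(target))
        moved += 1
    return moved


# Example call (hypothetical paths):
# sort_into_month_folders(r"E:\recovered", r"E:\recovered_sorted")
```

Each resulting folder holds one month of files, which a normal PC should be able to open and sort. If the timestamps did not survive recovery, everything will land in one or two folders and you are back to sorting by file type and size.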
As tedious as it was, file recovery software is your friend in this situation. If your files were too big for the recycle bin and you did an outright delete, this is your only option. I used Recuva, a free and easy-to-use program. You will need a second drive to write the retrieved files to. Try not to write anything to the drive you deleted from before you start the recovery – the files are probably still in there somewhere, but this may not be the case if you write fresh data to the drive. The process was slow on my geriatric PC and manual sorting through the files was even slower; I cannot even comprehend how long it would have taken me to sift through the retrieved data had I needed it all back. Which is why I cannot emphasize enough: prevention is better than cure – BACK UP!
There are a number of methods you can use to back up your data. Here is a rundown of a few, in order of reliability, starting with the least dependable:
Occasional manual backup to a second drive:
My hitherto method of choice; this also seems to be a popular choice among my peers. This technique relies on you arbitrarily remembering to bring another drive into the lab to back up to. It’s better than nothing, but barely, like fighting an angry tiger with only a spoon for protection.
Automatic backup software:
OK, now we’re getting a little more reliable. Most external drives come with backup software installed. I have automatic backup from my lab PC to my external drive; unfortunately my PC data constitutes an insignificant subset of my overall data footprint. My take-home message from my experience is that you need to back up all the drives you use, including microscope computers. Which raises the question: who pays for that? It would make sense for the PI or department to arrange for multi-user drives to be backed up automatically, but this is not the case in the labs of a number of scientists I quizzed. It seems that “each person for themselves” is a sadly common tenet in academia.
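If the software bundled with your drive only covers one PC, a scheduled script can fill the gap for the other folders. The snippet below is a minimal Python sketch under stated assumptions, not a substitute for proper backup software: the function name and paths are hypothetical, it only copies files that are new or have changed, and it deliberately never deletes anything from the backup, so a mistaken deletion at the source is not mirrored.

```python
import shutil
from pathlib import Path


def back_up(source, backup_root):
    """Copy new or changed files from source into backup_root/<source name>.

    Hypothetical helper: assumes backup_root is not inside source.
    Nothing is ever deleted from the backup.
    """
    source, backup_root = Path(source), Path(backup_root)
    copied = 0
    for path in source.rglob("*"):
        if not path.is_file():
            continue
        target = backup_root / source.name / path.relative_to(source)
        if target.exists():
            s, t = path.stat(), target.stat()
            # Skip files that look unchanged: same size, and not newer than
            # the backup copy (2 s tolerance for coarse drive timestamps).
            if s.st_size == t.st_size and s.st_mtime - t.st_mtime <= 2:
                continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the timestamps
        copied += 1
    return copied


# Example: run nightly for every folder you use (hypothetical paths).
# for folder in [r"D:\Data", r"\\scope-pc\Images"]:
#     print(folder, back_up(folder, r"F:\Backup"))
```

Run from the operating system's task scheduler, something like this would at least cover the microscope folders that a per-PC backup tool misses.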
Cloud storage:
At the suggestion of my PI, post-Datageddon, I paid $100 out of my own pocket for cloud storage. He recommended Carbonite, which offers unlimited storage. There are obviously other systems available, but thus far I have had no issues with Carbonite. The $99.99/year plan allows for backup of all the internal drives plus one external drive; you can also create a mirror image of your system in case your computer needs totally reinstalling. The initial upload of my approximately 500 GB of data took a week (possibly due to my subpar PC and internet connection) but since then it’s been ticking along nicely, backing up any changes in the background. If I delete a file and then realize I need it, I have a 30-day window in which to retrieve it before the deleted file is removed from their server. If you work with clinical data and need HIPAA-compliant data storage there is also a package for that, retailing at $269.99/year. Data can be accessed from anywhere in the world, which could be a great benefit when away at conferences.
On-site and off-site servers:
As I mentioned earlier, a common experience among those I talked to was that there was no central backup provided by their PI or department. Whether this is the norm in universities, the USA, or just in the labs of scientists I talked to is unclear. In my previous lab, in a research institute in the UK, all data drives and microscope computers were backed up to an on-site server every night; copies were maintained for a set time period and off-site backups were also regularly performed. The combined on-site and off-site server approach seems to be the gold standard as far as I can see, protecting even against loss due to building damage. However, even a single on-site server is a great idea. So perhaps float the idea next time your PI has grant money earmarked for purchasing equipment. Don’t think they’ll accept it as a reasonable expense? Try working out how much it will cost to repeat your experiments and replace lost data. As a ballpark figure, I calculated what it costs for me, a fourth-year postdoc, to run an overnight live imaging organ culture on a multiphoton confocal microscope. My calculation takes into account my wages for time setting up, running, and analyzing this experiment; it also includes the cost of breeding and maintaining transgenic mice for three months leading up to the experiment in order to get the tissues I need, plus lab consumables (culture media, plates, etc.). To run this experiment and hopefully generate a single movie for use in the supplementary material of a paper, I’ve calculated that my PI pays around $2,431. No, really. Maybe that server doesn’t look so pricey now…?
Whatever data protection route you choose, remember that good anti-virus software is also a necessity for protecting your data. Talk to your PI/department to see if there are any provided backup resources. And if you yourself are a PI, come up with a data protection plan and make sure your employees know about it. It may save you a lot of stress and money further down the line.