Greer’s Third Law: A computer program does what you tell it to do, not what you want it to do.
I’ve been around Linux long enough to know that the fabled Linux command:
rm -Rf *
is dangerous, and I have to say I’ve never lost data by using that one wrong. But there are other, more subtle ways to lose data if you’re not careful, and today I’m going to share one such tale.
Let me set the stage for you: I have a huge 6TB disk*) that I want to migrate into a NAS I just bought. The disk is formatted as NTFS and the NAS can’t read that, so it’ll be a roundabout process:
- put a bunch of old drives into the NAS (they aren’t big enough for me to set up the final volume straight away),
- copy everything from the 6TB disk to the bunch of disks,
- put the 6TB into the NAS and set it up as the NAS volume I want,
- copy everything from the bunch of disks into the NAS volume,
- maybe add the bunch of disks to the NAS volume. Regardless, I’m going to buy a second 6TB drive soon.
Sounds like a simple plan – but I have two challenges along the way. One is that there are a lot of redundant backups on the 6TB drive, so it would be nice to de-duplicate those first. The other is that copying >10 million small files over the network into the NAS is an incredibly slow process (my network is gigabit, so that’s not the bottleneck), so this is going to take days to complete.
So my revised plan is to start the copying process in the background because that’s going to take forever anyway. At the same time, I’m working on merging and de-duplicating my backups. My intention is to later use the mirror function of my trusty old Windows robocopy command that I know very well – this will remove anything on the target that isn’t in the source anymore.
This would have worked. But I was impatient, and eager to get on with it. I think you know where this is going.
Curious me had the idea that instead of pushing the files from Windows with robocopy, I could pull the files from Linux instead, using the rsync command. I tested that with a limited data set and it seemed this was much faster than robocopy. Here’s what I did:
- create a mount point in Linux for the Windows share:
mkdir /volume1/my-jbod/windows-e-drive
- mount the Windows share in Linux:
sudo mount.cifs //[source-ip]/E$ /volume1/my-jbod/windows-e-drive -o user=[windowsusername]
- use rsync to pull the data from Windows:
sudo rsync -achivutz --progress --delete --force /volume1/my-jbod/windows-e-drive/ /volume1/my-jbod
- I added the --delete switch that does the same as the mirror switch in robocopy.
Can you tell what the problem is?
At first I thought that the --delete switch was buggy, because it seemed that it would delete from the source rather than from the target. Surely that can’t be right? A quick test and nope, that works as I thought it should. So what’s going on?
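A quick test of that kind could look something like the sketch below. The function name delete_test and the file names are made up for illustration, and everything happens in a throwaway mktemp directory rather than on the real volumes. It assumes rsync is installed.

```shell
#!/bin/sh
# delete_test (hypothetical name): check which side `rsync --delete` prunes.
# Everything lives in a temporary sandbox that is removed at the end.
delete_test() {
    sandbox=$(mktemp -d) || return 1
    mkdir "$sandbox/src" "$sandbox/dst"

    echo keep  > "$sandbox/src/keep.txt"            # exists only in the source
    echo stale > "$sandbox/dst/only-in-target.txt"  # exists only in the target

    # Mirror src into dst; --delete should prune the stale file from dst.
    rsync -a --delete "$sandbox/src/" "$sandbox/dst/" || return 1

    # Report what is left on each side.
    echo "source: $(ls "$sandbox/src" | tr '\n' ' ')"
    echo "target: $(ls "$sandbox/dst" | tr '\n' ' ')"

    rm -rf "$sandbox"
}
```

Calling delete_test should show keep.txt on both sides and no trace of only-in-target.txt: the stale file is pruned from the target and the source is left alone, which is the behaviour I expected from --delete.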
I soon discovered the problem: It’s my use of mkdir! I had actually mounted the source inside the target, and that’s bad! When rsync --delete sees the directory /volume1/my-jbod/windows-e-drive, it determines that the source does not contain this (it can’t; it would have to recursively contain itself) and promptly removes the target that’s not in the source – which is mapped to my source! Oops.
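One way to avoid this trap is to check, before mirroring, that the source does not live inside the target. Here is a minimal sketch of such a guard in plain POSIX shell. The helper names is_inside and guarded_mirror are hypothetical, and guarded_mirror only prints a dry-run rsync command instead of running it, so the sketch itself can’t delete anything.

```shell
#!/bin/sh
# is_inside CHILD PARENT (hypothetical helper)
# Returns 0 if directory CHILD is PARENT itself or lies below it,
# 1 if it does not, and 2 if either directory cannot be resolved.
is_inside() {
    child=$(cd "$1" 2>/dev/null && pwd -P) || return 2
    parent=$(cd "$2" 2>/dev/null && pwd -P) || return 2
    [ "$parent" = "/" ] && return 0      # everything is inside the root
    case "$child/" in
        "$parent/"*) return 0 ;;         # same directory or a descendant
        *)           return 1 ;;
    esac
}

# guarded_mirror SRC DST (hypothetical helper)
# Refuses when SRC sits inside DST; otherwise prints a dry-run rsync
# command for review rather than executing it.
guarded_mirror() {
    is_inside "$1" "$2"
    case $? in
        0) echo "refusing: $1 is inside $2, --delete would wipe the source" >&2
           return 1 ;;
        1) echo "rsync -a --delete --dry-run $1/ $2/" ;;
        *) echo "cannot resolve $1 or $2" >&2
           return 2 ;;
    esac
}
```

With my layout, guarded_mirror /volume1/my-jbod/windows-e-drive /volume1/my-jbod would refuse. Had I mounted the share somewhere outside the target (say, an assumed /mnt/windows-e-drive), the check would pass and print a command with --dry-run, which lists what rsync would delete without touching anything.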
rsync is fast, so it managed to delete some stuff before I managed to shut it down. The good news is that it might not be lost, because I had started that background copying job in the beginning. It just means that I have to do the merging and de-duplicating of my backups when I’m done with the copying.
And if I really lost data? It was just old data and I’m probably not going to miss it anyway. ¯\_(ツ)_/¯
*) Note to future self: by 2018 standards, 6TB was considered pretty damn large.