
remove duplicate images using fdupes and expect in linux

December 13th, 2013

I had several thousand pictures, and many of them had several exact copies. At first I removed the duplicates by hand.

Later it occurred to me that Linux has md5sum, which prints the same checksum for files with exactly the same contents. I started writing a program around it, but that took me a while.
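The md5sum idea can be sketched in a few lines of shell. This is only an illustration, not the program I was writing: the function name list_dupes is made up, and it assumes GNU find, sort and uniq (which is what Linux and Cygwin ship). It only lists duplicates and deletes nothing.

```shell
#!/usr/bin/env bash
# list_dupes DIR: print groups of files under DIR whose MD5 checksums match.
# Groups are separated by a blank line. Hypothetical helper; assumes GNU tools.
list_dupes() {
    local dir=${1:-.}
    # md5sum prints "<32 hex chars>  <file name>"; after sorting, identical
    # checksums sit next to each other, so uniq can compare just the first
    # 32 characters (-w32) and print every line that has a twin.
    find "$dir" -type f -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}

# Example (path taken from this post):
# list_dupes /cygdrive/d/pictures
```

Matching checksums are a strong hint rather than a proof, so a careful tool would still compare the file contents before deleting anything.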

I searched Google and found that Linux already has a tool that does the job very well: fdupes. It finds duplicates by comparing file sizes and MD5 signatures, followed by a byte-by-byte comparison. If you give it the -d parameter, it prompts you for which files to preserve in each set of duplicates (one, several, or all of them) and deletes the rest. You can read more about fdupes here http://linux.die.net/man/1/fdupes
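Before deleting anything, it may be worth listing what fdupes considers duplicates. A minimal sketch, assuming fdupes is on your PATH; the wrapper name preview_dupes is made up, and -r (recurse into subdirectories) is my assumption about how the pictures are laid out.

```shell
#!/usr/bin/env bash
# preview_dupes DIR: list duplicate sets under DIR without deleting anything.
# Hypothetical wrapper around fdupes.
#   -r  recurse into subdirectories (an assumption; drop it to scan DIR only)
#   -S  show the size of the duplicate files
preview_dupes() {
    local dir=${1:?usage: preview_dupes DIR}
    fdupes -r -S "$dir"
}

# Example (path taken from this post):
# preview_dupes /cygdrive/d/pictures
```

Without -d, fdupes only prints the sets, so this is safe to run as often as you like.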

All the pictures were on a Windows machine, so I installed Cygwin along with fdupes and expect. Then I wrote a small script that keeps only one copy of each duplicate picture for me. Without expect you have to type your choice at every prompt by hand, since I did not find an fdupes option for keeping just one copy automatically. Here's my program:

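$ cat fdupes.expect
#!/usr/bin/expect
# Answer every fdupes "preserve files" prompt with 1, i.e. keep the first file of each set.
set timeout 1000000    ;# seconds; long enough that expect never gives up while fdupes scans
spawn /home/andy/fdupes.sh
expect "preserve files" {
    send "1\r"
    exp_continue       ;# go back and wait for the next prompt
}

$ cat /home/andy/fdupes.sh
#!/bin/sh
# My pictures are all in d:\pictures on Windows, which Cygwin shows as /cygdrive/d/pictures
fdupes.exe -d /cygdrive/d/pictures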

After this, make both scripts executable (chmod +x) and run fdupes.expect. It will keep only one copy from each set of duplicates and remove the others for you.
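A note for readers: the fdupes man page also documents -N (--noprompt), which, used together with -d, preserves the first file in each set and deletes the others without prompting. If your fdupes build supports it, something like the sketch below may do the same job without expect. The wrapper name dedupe_keep_first is made up, and I have not tested this on the Cygwin build used above.

```shell
#!/usr/bin/env bash
# dedupe_keep_first DIR: delete duplicates in DIR, keeping the first file of each set.
# Hypothetical wrapper around fdupes. DESTRUCTIVE: files are removed without any prompt.
#   -d  delete duplicates
#   -N  with -d, keep the first file in each set and do not ask
dedupe_keep_first() {
    local dir=${1:?usage: dedupe_keep_first DIR}
    fdupes -d -N "$dir"
}

# Example (path taken from this post):
# dedupe_keep_first /cygdrive/d/pictures
```

Because nothing is confirmed interactively, try it on a copy of the directory first.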

PS: Here's the fdupes project page, with the source code and documentation: https://github.com/adrianlopezroche/fdupes

Good Luck!

