Monday, January 2, 2012

Since I'm bored and unemployed for the time being, I decided to attack my music collection, or more specifically, the duplicates in it.

I first must put in a plug for MusicBrainz Picard. It is a wonderful free application that identifies your tracks based on their acoustic fingerprints. Whether you are using Windows, Mac or Linux, you should use this. You will not be disappointed.

Here are a few of the methods I used to weed out the dups:

Remove Direct Copies (Slow)

This method removed fewer dups, but it gave me the satisfaction of knowing that I wasn't keeping identical data twice. I hashed every track with MD5, one leading character of the directory name at a time, looked for hashes that showed up more than once, and deleted the matching files whose names looked like "filename ([0-9]).mp3".

The script:

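#!/bin/bash
# find_hash_matches.sh
# Delete numbered copies ("name (1).mp3") that are byte-identical to another file.
old_IFS=$IFS
# Split on newlines only, so paths containing spaces stay in one piece.
IFS=$'\n'
# Work through the collection one leading character at a time.
for h in + 0 1 2 3 4 5 6 7 8 9 _ a b c d e f g h i j k l m n o p q r s t u v w x y z U V W X Y Z D E F G H I J K L M N O P Q A B C R S T
do
    echo "$h"
    # Hash every mp3 under the directories that start with this character.
    find /path/to/the/rock/"$h"* -name "*.mp3" -exec md5sum {} + > whole 2> /dev/null

    # A hash that shows up more than once marks a set of identical files.
    cut -c1-32 whole | sort | uniq -d > dups

    for i in $(cat dups)
    do
        # Copies still on disk for this hash; the last one is never deleted.
        remaining=$(grep -c "^$i" whole)
        for j in $(grep "^$i" whole)
        do
            # md5sum prints the 32-character hash, two separator characters, then the path.
            the_file=${j:34}
            if [[ "$the_file" == *"("[0-9]")"* && -f "$the_file" && $remaining -gt 1 ]]
            then
                rm -- "$the_file"
                remaining=$((remaining - 1))
                echo "$the_file"
                echo "$the_file" >> dup_log
            fi
        done
    done
    rm whole dups
done
IFS=$old_IFS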
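
Before letting a script like this delete anything, it can help to look at what it considers identical. The following is a minimal sketch and not part of my script: the function name list_hash_dups and its directory argument are placeholders I made up, and it assumes GNU md5sum and GNU uniq (for -w and --all-repeated). It only prints groups of files that share an MD5 hash and removes nothing.

```shell
#!/bin/bash
# list_hash_dups DIR: print groups of .mp3 files under DIR whose contents
# share an MD5 hash. Hypothetical helper for illustration; deletes nothing.
list_hash_dups() {
    local dir=$1
    # md5sum prints "<32 hex chars>  <path>". Sorting puts equal hashes next
    # to each other, and uniq -w32 compares only the hash column.
    find "$dir" -type f -name "*.mp3" -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}

# Example (prints one blank-line-separated group per duplicated hash):
#   list_hash_dups /path/to/the/rock
```

Unlike the script, this compares the whole tree in one pass, so it also catches identical files that live under different leading characters.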

Remove Low Quality and Truncated Tracks (Quick)

This method compares only tracks with the same base name. Among them it keeps the file that has both the highest bitrate and the longest duration. It depends on the program "mp3info"; check your distro's repos for availability.

#!/bin/bash
# Among "title.mp3", "title (1).mp3", ... keep the best copy and delete the rest.
# Requires mp3info.
old_IFS=$IFS
# Split on newlines only, so paths containing spaces stay in one piece.
IFS=$'\n'
rm -f more_dups_log

deleted=0
basedir=/path/to/the/rock

# List every file whose name carries a "(1)" .. "(9)" copy marker.
find "$basedir" -type f -name "*([1-9])*" > more_dups_log 2> /dev/null

# Stripping " (N).mp3" collapses each set of copies to one base path.
for i in $(sed 's/ ([0-9])\.mp3$//' more_dups_log | sort -u)
do
    files=()
    rates=()
    lengths=()
    best_br=0
    best_length=0

    # Measure the original and its numbered copies, and nothing else.
    # (Matching on "$title*" would also catch other songs sharing the prefix.)
    for j in "$i.mp3" "$i ("[0-9]").mp3"
    do
        [ -f "$j" ] || continue
        # %r = bitrate (median for VBR files because of -r m), %S = length in seconds
        info=$(mp3info -r m -p "%r %S" "$j" 2> /dev/null)
        br=${info% *}
        length=${info#* }
        # Skip files mp3info could not read.
        [[ $br =~ ^[0-9]+$ && $length =~ ^[0-9]+$ ]] || continue
        files+=("$j")
        rates+=("$br")
        lengths+=("$length")
        [ "$br" -gt "$best_br" ] && best_br=$br
        [ "$length" -gt "$best_length" ] && best_length=$length
    done

    # Keep the first file that has both the top bitrate and the top length.
    keep=""
    for n in "${!files[@]}"
    do
        if [ "${rates[$n]}" -eq "$best_br" ] && [ "${lengths[$n]}" -eq "$best_length" ]
        then
            keep=${files[$n]}
            break
        fi
    done

    # No file wins on both counts: leave this set alone for a manual look
    # rather than delete every copy.
    if [ -z "$keep" ]
    then
        echo "skipped: $i"
        continue
    fi

    for j in "${files[@]}"
    do
        if [ "$j" != "$keep" ]
        then
            rm -- "$j" && deleted=$((deleted + 1))
        fi
    done

    # Give the survivor the plain name if that name is free.
    if [ "$keep" != "$i.mp3" ] && [ ! -e "$i.mp3" ]
    then
        mv -- "$keep" "$i.mp3"
    fi
    echo "$i"
done
IFS=$old_IFS
# Opens up permissions on the whole tree; drop this line if you don't want that.
chmod 777 -R "$basedir"
echo "$deleted files have been deleted."
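
The "highest bitrate and longest duration" rule can be tried out on its own, without mp3info or any real files. This is a minimal sketch, not lifted from the script: pick_best is a hypothetical helper that reads lines of "bitrate seconds path" on standard input (the two numbers are what mp3info reports for %r and %S) and prints the path of the first file that is best on both counts, or nothing when no single file is.

```shell
#!/bin/bash
# pick_best: read "bitrate seconds path" lines on stdin and print the path of
# the first file that has both the highest bitrate and the longest duration.
# Prints nothing and returns 1 if no file is best on both counts.
# Hypothetical helper for illustration.
pick_best() {
    local br len path n best_br=0 best_len=0
    local -a brs=() lens=() paths=()
    # read puts the first two words in br and len, and the rest of the line
    # (spaces included) in path.
    while read -r br len path
    do
        brs+=("$br")
        lens+=("$len")
        paths+=("$path")
        if (( br > best_br )); then best_br=$br; fi
        if (( len > best_len )); then best_len=$len; fi
    done
    for n in "${!paths[@]}"
    do
        if (( brs[n] == best_br && lens[n] == best_len ))
        then
            printf '%s\n' "${paths[n]}"
            return 0
        fi
    done
    return 1
}

# Example:
#   printf '%s\n' '128 200 song.mp3' '192 200 song (1).mp3' '192 150 song (2).mp3' | pick_best
#   prints: song (1).mp3
```

The empty result matters: when one copy has the better bitrate and another runs longer, there is no obvious winner, and it is safer to look at those by hand than to delete them.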

You will still have some dups left, but this will delete A LOT of them. A simple `find . -name "*([1-9])*"` will find all that remain. Protip: if you don't really care about the tracks you find with that last command, amend it to `find . -name "*([1-9])*" -exec rm {} \;` to remove every match.
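
Since that last command deletes without asking, a more cautious variant is to look at the list first and delete in a second step. A minimal sketch: the wrapper name purge_numbered and its --delete flag are made up for this example, and only standard find options are used.

```shell
#!/bin/bash
# purge_numbered DIR [--delete]: list files under DIR whose names contain a
# "(1)" .. "(9)" copy marker; remove them only when --delete is given.
# Hypothetical wrapper for illustration.
purge_numbered() {
    local dir=$1 mode=${2:-}
    if [ "$mode" = "--delete" ]
    then
        # -type f keeps directories out of it; "+" hands rm many files per call.
        find "$dir" -type f -name "*([1-9])*" -exec rm -- {} +
    else
        find "$dir" -type f -name "*([1-9])*" -print
    fi
}

# purge_numbered /path/to/the/rock            # preview only
# purge_numbered /path/to/the/rock --delete   # actually remove
```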

If you have any other automated methods, please share.