I was just looking for a quick and easy way to sort the files that are
about to be compressed/archived using tar and bzip2/gzip/xz etc. As
another guy already figured out before I did, the order in which the
files are stored in an archive matters quite a bit. I don't have a
specific example handy, but he claims that the difference in size
between a sorted and an unsorted tar.xz is about 20% (source).
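Now instead of just sorting the files by path/name, which I think is not
the most effective way, I sort them by the file's suffix (e.g.
01_music.mp3 -> 'mp3' is used as the sort key). That should be even more
effective, because files of the same type then end up next to each other
in the archive, where the compressor can exploit their similarity.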
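To make the sort key a bit more concrete, here is a minimal sketch of how a suffix could be derived from a single path. `suffix_key` is a made-up helper name, not part of any tool. It only looks at the file name itself, so a path like /media/backup.old/notes gets an empty key, whereas splitting the whole path on dots would yield 'old/notes'.

```sh
#!/bin/sh
# Hypothetical helper (name is an assumption): print the sort key for one path.
suffix_key() {
    base=${1##*/}    # drop the directory part, so dots in directory names are ignored
    case $base in
        *.*) printf '%s\n' "${base##*.}" ;;   # text after the last dot
        *)   printf '\n' ;;                   # no dot in the name: empty key
    esac
}

suffix_key "01_music.mp3"              # prints: mp3
suffix_key "/media/backup.old/notes"   # prints an empty line
```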
So next time you compress some huge amount of data, just use the lines below to save some space:
$ find /media/backup_xxx | awk -F '.' ' { print $NF"---"$0 } ' | sort | awk -F'---' ' { print $2}' >filelist-sorted.txt
$ tar -cvf - --no-recursion -T filelist-sorted.txt | xz -9 >backup_xxx.tar.xz
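If you would rather measure the gain on your own data than trust the 20% figure, something along these lines might do. It is only a sketch: `compare_sizes` is a made-up helper name, it assumes tar and xz are installed, and like the one-liner above it gets confused by paths that contain '---' or newlines.

```sh
#!/bin/sh
# Sketch: build one archive in find order and one in suffix order, then print both sizes.
# compare_sizes is a hypothetical helper, not part of tar or xz.
compare_sizes() {
    dir=$1
    tmp=$(mktemp -d) || return 1

    # Unsorted: file order exactly as find reports it.
    find "$dir" >"$tmp/plain.txt"

    # Sorted by suffix, using the same awk/sort pipeline as above.
    awk -F '.' '{ print $NF "---" $0 }' "$tmp/plain.txt" | sort |
        awk -F '---' '{ print $2 }' >"$tmp/sorted.txt"

    for list in plain sorted; do
        # -f - sends the archive to stdout; tar's warnings are silenced.
        tar -cf - --no-recursion -T "$tmp/$list.txt" 2>/dev/null |
            xz -9 >"$tmp/$list.tar.xz"
        size=$(( $(wc -c <"$tmp/$list.tar.xz") ))
        printf '%s\t%s bytes\n' "$list" "$size"
    done

    rm -rf "$tmp"
}
```

Calling `compare_sizes /media/backup_xxx` prints one line for the 'plain' archive and one for the 'sorted' archive, so the two sizes can be compared directly. How big the difference is depends entirely on the data.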
Cheers
Raphi