In the age of multi-core CPUs, applications also need to be optimized to use them. Since the vast majority of Linux utilities only let you specify ONE input/output file, it's our job to script around them so that all the cores are kept busy.
When I got my first multi-core CPU about a year ago, the only way I knew to launch several processes at once was the ampersand thingy (like 'echo &'), which starts them in the background. The problem, of course, is that there's no easy way to limit the number of concurrent processes that way, so we need to be more creative.
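To make the problem concrete, here is a minimal sketch of the plain ampersand approach. `work` is a hypothetical stand-in for a real job such as convert. Every loop iteration starts a new background process right away, so nothing stops the loop from launching one job per file, however many files there are.

[code lang="bash"]
#!/bin/bash
# Hypothetical stand-in for a real per-file job such as `convert`.
work() {
  sleep 1
  echo "finished $1"
}

pids=()
for item in a b c d e f; do
  work "$item" &    # '&' returns at once; the job runs in the background
  pids+=("$!")      # $! holds the PID of the job just started
done
# Nothing limited the loop: all six jobs were started immediately.
echo "started ${#pids[@]} jobs"
wait                # block until every background job has exited
[/code]

With a few hundred photos, the same loop would start a few hundred convert processes at once, which is exactly what the methods below avoid.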
Method 1 - Bash script
So this was the first method I put together, based on some other snippets I found on the web. As you can see, it needs quite a few lines of code to get some simple multithreading, which makes it fine for scripts but not handy for daily use. Anyway, here it is:
[code lang="bash"]
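#!/bin/bash
# Each running job "holds" a slot: a lock file named 1..MAXTHREADS
THREADDIR=$(mktemp -d)
MAXTHREADS=4
PID=1
for input in *.JPG; do
  # Cycle through the slots until one is free (its lock file is gone)
  while [ -f "$THREADDIR/$PID" ]; do
    if [ "$PID" -eq "$MAXTHREADS" ]; then
      PID=0
    fi
    sleep 0.1
    PID=$((PID + 1))
  done
  touch "$THREADDIR/$PID"
  {
    output="${input%.JPG}.TIF"   # foo.JPG -> foo.TIF
    convert "$input" "$output"
    rm "$THREADDIR/$PID"         # release the slot
  } &
done
wait                             # let the last jobs finish
rmdir "$THREADDIR"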
[/code]
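On bash 4.3 or newer there is a shorter way to get the same cap without lock files: the `wait -n` builtin blocks until any one background job exits. A minimal sketch, assuming such a bash; `work` and `run_limited` are hypothetical names, and `work` would be replaced by the real convert call:

[code lang="bash"]
#!/bin/bash
# Needs bash >= 4.3 for `wait -n`.
MAXJOBS=4

# Hypothetical job; in practice: convert "$1" "${1%.JPG}.TIF"
work() {
  sleep 0.2
  echo "done $1"
}

# Run `work` on every argument, keeping at most $1 jobs alive at a time.
run_limited() {
  local max=$1 running=0 item
  shift
  for item in "$@"; do
    if (( running >= max )); then
      wait -n                  # block until any one background job exits
      running=$((running - 1))
    fi
    work "$item" &
    running=$((running + 1))
  done
  wait                         # wait for the jobs still running
}

run_limited "$MAXJOBS" a b c d e f g h
[/code]

Unlike the lock-file loop, there is no polling with sleep: the shell simply wakes up when a job ends. Once `work` calls convert, the call becomes something like run_limited 4 *.JPG.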
Method 2 - Xargs
This was my favorite method for a long time... It uses xargs, which ships with 'find' (both are part of GNU findutils).
[code lang="bash"]
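#!/bin/bash
mkdir -p converted
# -print0/-0: NUL-separated names, safe for spaces; -P4: up to 4 jobs at once;
# -I{}: one convert per file, with {} replaced by the file name
find ./ -name "*.JPG" -print0 | xargs -0 -P4 -I{} convert "{}" "converted/{}.tif"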
[/code]
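The -P4 above hard-codes four jobs. To match the core count of whatever machine the script runs on, you can ask for it first. A small sketch, assuming GNU coreutils' nproc is available (with getconf as a fallback), and using echo in place of convert so it runs anywhere:

[code lang="bash"]
#!/bin/bash
# One job per available core instead of a hard-coded -P4.
# nproc is from GNU coreutils; getconf _NPROCESSORS_ONLN is a common fallback.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
echo "using $cores parallel jobs"

# Demo with echo instead of convert: each NUL-separated name becomes one
# command line, and at most $cores of them run at the same time.
printf '%s\0' "a.JPG" "b c.JPG" "d.JPG" |
  xargs -0 -P "$cores" -I{} echo "would convert {}"
[/code]

Since the jobs run concurrently, the "would convert" lines may come out in any order.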
Method 3 - GNU Parallel
There's an alternative to xargs called GNU Parallel. Even though it has existed for years (thanks to Ole Tange for clearing that up), I only heard of it a few weeks ago. Seems like being adopted by GNU did the job ;-) It is much more powerful than xargs. One of its coolest features is running jobs on other machines over SSH, so you're not limited to your own box and can put your idle machines to work as well.
Here is a really basic example to show you the syntax (nearly the same as with xargs):
[code lang="bash"]
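#!/bin/bash
mkdir -p converted
# -0: NUL-separated input; -j +0: one job per CPU core;
# {} is the file name, {.} the file name without its extension
find ./ -name "*.JPG" -print0 | parallel -0 -j +0 convert {} converted/{.}.tif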
[/code]
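The replacement strings are what make parallel so compact: {} is the input line and {.} is the same line with its extension removed. According to man parallel, only a dot in the file name itself is treated as an extension, not a dot in a directory name. A rough plain-bash equivalent, where strip_ext is a hypothetical helper and not part of parallel:

[code lang="bash"]
#!/bin/bash
# Hypothetical helper mimicking GNU Parallel's {.} replacement string:
# drop everything from the last '.' only if that dot is in the file name
# itself (after the last '/'), so "sub.dir/bar" is left unchanged.
strip_ext() {
  local path=$1
  local base=${path##*/}          # part after the last '/'
  if [[ $base == *.* ]]; then
    printf '%s\n' "${path%.*}"    # remove the shortest '.*' suffix
  else
    printf '%s\n' "$path"
  fi
}

strip_ext ./IMG_0001.JPG      # prints ./IMG_0001
strip_ext sub.dir/foo.jpg     # prints sub.dir/foo
strip_ext sub.dir/bar         # prints sub.dir/bar
[/code]

So for an input of ./IMG_0001.JPG, converted/{.}.tif becomes converted/./IMG_0001.tif.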
Now we can also drop the 'find' part and let the shell's own glob expansion find the files:
[code lang="bash"]
#!/bin/bash
mkdir -p converted
# ::: passes the shell-expanded list of files as the input
parallel -j +0 convert {} converted/{.}.tif ::: *.JPG
[/code]
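One caveat with letting the shell expand *.JPG (here and in the for loop of Method 1): if no file matches, bash passes the pattern through literally, so the command would run once on a file called *.JPG. A small demonstration in an empty temporary directory; bash's nullglob option makes an unmatched pattern expand to nothing instead:

[code lang="bash"]
#!/bin/bash
# Work in an empty scratch directory so *.JPG cannot match anything.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

shopt -u nullglob              # bash default: unmatched patterns stay literal
set -- *.JPG
echo "without nullglob: $# argument(s): $*"   # prints: without nullglob: 1 argument(s): *.JPG

shopt -s nullglob              # unmatched patterns expand to nothing
set -- *.JPG
echo "with nullglob: $# argument(s)"          # prints: with nullglob: 0 argument(s)

if (( $# == 0 )); then
  echo "no *.JPG files found, nothing to do"
fi

cd - >/dev/null && rmdir "$demo_dir"
[/code]

With nullglob set, checking the argument count first lets a script skip starting any jobs when there is nothing to convert.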
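And finally, one that's spread across several machines (':' stands for the local machine). --transfer copies each input file to the remote machine, --return copies the named result back, and --cleanup removes the transferred files from the remote machine afterwards. Please note that ssh and rsync need to be installed on the other machines, and so does convert (ImageMagick), since the command runs there.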
[code lang="bash"]
#!/bin/bash
parallel --sshlogin :,somemachine.domain.com -j +0 --transfer --cleanup \
  --return {.}_2.jpg convert -scale 320 {} {.}_2.jpg ::: *.JPG
[/code]
Further links
parallel basic usage (YouTube)
man parallel - contains lots of examples and explanations
2010.08.06 - Updated the command lines based on Ole Tange's suggestions and added a link to a YouTube video explaining the basic usage of GNU Parallel.
Thanks for reading! Cheers,
Raphi