Bash - Multithreading

In the age of multi-core CPUs, applications need to be written to actually use all those cores. Since the vast majority of Linux utilities only accept ONE input/output file at a time, it's our job to script around them so that every core is kept busy.
When I got my first multi-core CPU about a year ago, the only way I knew to launch several processes at once was the ampersand-thingy (like 'echo &'), which starts a command in the background. The problem, of course, is that there's no easy way to limit the number of concurrent processes, so we need to be more creative.
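The simplest "creative" workaround that uses nothing but '&' is to start jobs in fixed-size batches and 'wait' for each batch before starting the next one. Here's a minimal sketch; BATCH, the work function and the input names are placeholders I made up, and sleep stands in for a real convert call:

[code lang="bash"]
#!/bin/bash
# Sketch: cap concurrency with plain '&' and 'wait' by running jobs in
# fixed-size batches. BATCH, work() and the inputs are placeholders.
BATCH=4
LOG=$(mktemp)

work() {
    sleep 0.2                  # stand-in for the real job, e.g. convert ...
    echo "$1" >> "$LOG"        # record that this input was processed
}

count=0
for input in a b c d e f g h i j; do
    work "$input" &            # start the job in the background
    count=$((count + 1))
    if [ $((count % BATCH)) -eq 0 ]; then
        wait                   # block until the whole batch has finished
    fi
done
wait                           # catch the final, partial batch
[/code]

The catch: every batch is only as fast as its slowest job, so cores sit idle while the stragglers finish. The methods below avoid that by starting a new job as soon as any slot frees up.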

Method 1 - Bash script

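So this was the first method I put together, based on some snippets I found on the web. Each running job holds a "slot" file (named 1 to MAXTHREADS) in a temporary directory: the loop cycles through the slots until it finds a free one, claims it, starts the next conversion in the background, and the job deletes its slot file when it's done. As you can see, it needs quite a few lines of code for some simple multithreading, so it's only useful inside scripts and not handy for daily use. Anyway, here it is: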

[code lang="bash"]
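#!/bin/bash
# Convert every *.JPG in the current directory to TIFF, running at most
# MAXTHREADS conversions at once. Each running job owns a "slot" file
# (named 1..MAXTHREADS) in a temporary directory.
shopt -s nullglob              # skip the loop entirely if there are no *.JPG files
THREADDIR=$(mktemp -d)
MAXTHREADS=4
PID=1

for input in *.JPG; do
    # Cycle through the slots until one is free (its file doesn't exist)
    while [ -f "$THREADDIR/$PID" ]; do
        if [ "$PID" -eq "$MAXTHREADS" ]; then
            PID=0
        fi
        sleep 0.1
        PID=$((PID + 1))
    done

    touch "$THREADDIR/$PID"    # claim the slot before starting the job
    {
        output="${input%.JPG}.TIF"
        convert "$input" "$output"
        rm "$THREADDIR/$PID"   # free the slot when the job is done
    } &
done

wait                           # let the last jobs finish
rmdir "$THREADDIR"             # the slot directory is empty now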
[/code]
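If you're on bash 4.3 or newer, the builtin 'wait -n' (wait for any single background job to exit) can replace the slot files and the polling sleep. This is a different technique from the script above, so treat it as a sketch; MAXJOBS, the work function and the input list are placeholders, and sleep stands in for the real convert call:

[code lang="bash"]
#!/bin/bash
# Sketch: keep at most MAXJOBS background jobs running and start a new one
# as soon as any job exits. Requires bash >= 4.3 for 'wait -n'.
# MAXJOBS, work() and the inputs are placeholders.
MAXJOBS=4
OUT=$(mktemp -d)

work() {
    sleep 0.2                  # stand-in for e.g. convert "$1" "${1%.JPG}.TIF"
    touch "$OUT/$1.done"       # mark this input as processed
}

running=0
for input in a b c d e f g h i j; do
    if [ "$running" -ge "$MAXJOBS" ]; then
        wait -n                # returns when any one background job exits
        running=$((running - 1))
    fi
    work "$input" &
    running=$((running + 1))
done
wait                           # wait for the remaining jobs
[/code]

Compared to the slot files, nothing polls: the shell simply blocks in 'wait -n' until a job finishes, and the final 'wait' makes sure the script doesn't exit before the last jobs are done.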

Method 2 - Xargs

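This was my favorite method for a long time. It uses xargs, which ships in the same package as 'find' (GNU findutils). find's -print0 and xargs' -0 separate the file names with NUL bytes, so names containing spaces survive; -P4 keeps up to four convert processes running at once; and -I{} runs one command per file with {} replaced by the file name.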

[code lang="bash"]
#!/bin/bash
mkdir -p converted   # the output directory must exist
find ./ -name "*.JPG" -print0 | xargs -0 -P4 -I{} convert "{}" "converted/{}.tif"
[/code]
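The xargs line names its results IMG.JPG.tif, because {} is the whole file name. If you'd rather get IMG.tif, one option is to hand each file to a tiny sh -c wrapper that strips the extension with basename. A minimal sketch, assuming GNU xargs and coreutils (nproc prints the number of available cores); cp stands in for convert so it runs without ImageMagick, and the sample file names are made up:

[code lang="bash"]
#!/bin/bash
# Sketch: xargs -P with a small sh -c wrapper that drops the .JPG suffix.
# 'cp' is a stand-in for 'convert'; the sample files are placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'IMG 1.JPG' IMG2.JPG IMG3.JPG   # sample inputs, one with a space
mkdir -p converted

# -maxdepth 1: stay in this directory, since basename would flatten subdirs
find . -maxdepth 1 -name '*.JPG' -print0 |
    xargs -0 -n 1 -P "$(nproc)" sh -c '
        base=$(basename "$1" .JPG)        # "./IMG2.JPG" -> "IMG2"
        cp "$1" "converted/$base.tif"     # real use: convert "$1" "converted/$base.tif"
    ' sh
ls converted
[/code]

The trailing 'sh' becomes $0 inside the wrapper, so the file name that xargs appends lands in $1.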

Method 3 - GNU Parallel

So there's an alternative to xargs called GNU Parallel. Even though it has existed for years (thanks to Ole Tange for clearing things up), I only heard of it a few weeks ago. Seems like the 'GNU' adoption did the job ;-) It is actually much more powerful than xargs. One of the coolest features is running jobs on other machines over SSH, so you're not limited to your own box and can put your idle machines to work as well.
This one is a really basic example to show you the syntax, which is nearly the same as with xargs: {} is replaced by the input file name, {.} by the file name without its extension, and -j +0 runs as many jobs in parallel as there are CPU cores:
[code lang="bash"]
#!/bin/bash
mkdir -p converted   # the output directory must exist
find ./ -name "*.JPG" -print0 | parallel -0 -j +0 convert {} converted/{.}.tif
[/code]
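Besides {} and {.}, GNU Parallel also has {/} (basename), {//} (directory) and {/.} (basename without extension). If you want to predict what a command line will expand to, the plain bash parameter expansions below produce the same strings for a typical path; the path is made up, and the sketch assumes it contains a slash and the file name has an extension:

[code lang="bash"]
#!/bin/bash
# Sketch: bash equivalents of GNU Parallel's replacement strings.
# Assumes the path contains a '/' and the file name has an extension.
f=./photos/IMG_0001.JPG

full=$f                 # {}   -> ./photos/IMG_0001.JPG
noext=${f%.*}           # {.}  -> ./photos/IMG_0001
base=${f##*/}           # {/}  -> IMG_0001.JPG
dir=${f%/*}             # {//} -> ./photos
basenoext=${base%.*}    # {/.} -> IMG_0001

printf '%s\n' "$full" "$noext" "$base" "$dir" "$basenoext"
[/code]

With GNU Parallel installed, 'parallel --dry-run convert {} converted/{.}.tif ::: *.JPG' prints the generated commands instead of running them, which is the easiest way to check the real thing.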

Now we can also drop the 'find' part and let the shell's glob expansion find the files; everything after ':::' is used as the list of inputs:
[code lang="bash"]
#!/bin/bash
mkdir -p converted   # the output directory must exist
parallel -j +0 convert {} converted/{.}.tif ::: *.JPG
[/code]

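And finally, one that's spread across several machines (the special name ':' stands for the local machine, so localhost takes part too). --transfer copies each input file to the machine that processes it, --return fetches the named result back, and --cleanup removes both from the remote side afterwards. Please note that ssh and rsync need to be installed on the other machines, and so does convert, since the command runs there.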
[code lang="bash"]
#!/bin/bash
parallel --sshlogin :,somemachine.domain.com -j +0 --transfer --cleanup --return {.}_2.jpg convert -scale 320 {} {.}_2.jpg ::: *.JPG
[/code]

Further links

GNU Parallel basic usage (YouTube)
man parallel - contains lots of examples and explanations

2010.08.06 - Updated the command lines following Ole Tange's suggestions and added a link to a YouTube video explaining the basic usage of GNU Parallel.

Thanks for reading! Cheers,
Raphi