Monday 13 October 2014

GNU parallel

Assalamualaikum and good morning to all readers. I want to share my research on parallel computing. One example of a tool for parallel computing is GNU Parallel.


GNU Parallel
For people who live life in the parallel lane.
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input and pipe it into commands in parallel. If you use xargs and tee today you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
For each line of input GNU parallel will execute command with the line as arguments. If no command is given, the line of input is executed. Several lines will be run in parallel. GNU parallel can often be used as a substitute for xargs or cat | bash.
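
To make the idea of "one job per input line" concrete, here is a minimal Python sketch of the same pattern. It is only an analogy, not how GNU parallel is implemented: the function name run_for_each_line and the jobs parameter are my own, and the sample command simply upper-cases its argument. The results are returned in input order, which is the behaviour GNU parallel provides with its -k (--keep-order) option.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_for_each_line(command, lines, jobs=4):
    """Run `command <line>` once per input line, at most `jobs` at a time.

    Hypothetical helper for illustration. Returns each job's standard
    output, in the same order as the input lines.
    """
    def run_one(line):
        # The input line is appended to the command as its last argument.
        done = subprocess.run(command + [line], capture_output=True,
                              text=True, check=True)
        return done.stdout

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Executor.map yields results in input order,
        # no matter which job happens to finish first.
        return list(pool.map(run_one, lines))


if __name__ == "__main__":
    # Sample command (an assumption): a tiny Python one-liner that
    # upper-cases the argument it is given.
    upper = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
    for output in run_for_each_line(upper, ["a.txt", "b.txt", "c.txt"]):
        print(output, end="")
```

Threads are enough here because each job is an external process; the Python side only waits for it to finish.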

Source: http://www.gnu.org/software/parallel/

 -----------------------------------------------------------------------------------------------------------------------
More examples of GNU Parallel:

All new computers have multiple cores. Many bioinformatics tools are serial in nature and will therefore not use the multiple cores. However, many bioinformatics tasks (especially within NGS) are extremely parallelizable:
  • Run the same program on many files
  • Run the same program on every sequence
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU:
[Figure: Simple scheduling]
GNU Parallel instead spawns a new process as soon as one finishes, keeping the CPUs active and thus saving time:

[Figure: GNU Parallel scheduling]
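
The difference between the two scheduling approaches can be seen in a small simulation. This is only a sketch under my own assumptions: the job durations are made up (8 long jobs followed by 24 short ones), and the function names static_makespan and dynamic_makespan are hypothetical. It computes how long the whole batch of 32 jobs takes on 4 CPUs under each approach.

```python
import heapq


def static_makespan(durations, cpus):
    """Simple scheduling: give each CPU an equal, consecutive batch of jobs.

    Assumes the number of jobs is divisible by the number of CPUs.
    """
    per_cpu = len(durations) // cpus
    return max(sum(durations[i * per_cpu:(i + 1) * per_cpu])
               for i in range(cpus))


def dynamic_makespan(durations, cpus):
    """Start the next job as soon as any CPU becomes free."""
    free_at = [0] * cpus          # time at which each CPU becomes free
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)        # earliest free CPU
        heapq.heappush(free_at, start + duration)
    return max(free_at)


if __name__ == "__main__":
    # Made-up workload: 8 jobs of 4 time units, then 24 jobs of 1 unit.
    jobs = [4] * 8 + [1] * 24
    print("simple scheduling:  ", static_makespan(jobs, 4))   # prints 32
    print("dynamic scheduling: ", dynamic_makespan(jobs, 4))  # prints 14
```

With this workload the simple approach leaves three CPUs idle while the first one works through all the long jobs, whereas starting a new job whenever a CPU is free keeps all four busy until the end.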


Source: https://www.biostars.org/p/63816/

