hadoop-common-user mailing list archives

From Luca Pireddu <pire...@crs4.it>
Subject Announcing Seal 0.1.0: BWA alignment on Hadoop
Date Mon, 09 May 2011 10:47:57 GMT
Hello everyone.  If you're working on short DNA read alignment, then you may 
be interested in this message.

We've just released Seal (http://biodoop-seal.sourceforge.net/), a
Hadoop-based distributed short read alignment and analysis toolkit.
Currently Seal includes tools for read alignment (based on BWA), duplicate
read removal, and sorting read mappings.  Seal scales, easily handling
terabytes of data.  If you're aligning read data sets of more than a couple
of hundred MB, and you have a cluster of computers (anything from a small
one of 4 or 5 nodes up to hundreds of nodes), then Seal might be for you.

On a 16-node Hadoop cluster, with 8 cores and 16 GB of RAM per node, we have 
measured a throughput of 13 Gbp/hour in map+rmdup mode and 19 Gbp/hour in 
map-only mode.  Scalability tests show that the throughput per node is 
maintained as the number of nodes increases up to 128.
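
As a rough illustration of what these figures imply, the arithmetic can be
sketched as below.  The only inputs taken from this message are the 16-node
cluster size and the two throughput figures; the function names are
hypothetical, and the projection simply assumes that per-node throughput
stays constant as nodes are added.  The projected values are estimates under
that assumption, not measured results.

```python
# Figures quoted in this message: whole-cluster throughput on 16 nodes.
NODES_MEASURED = 16
THROUGHPUT_GBP_PER_HOUR = {"map+rmdup": 13.0, "map-only": 19.0}


def per_node_throughput(cluster_gbp_per_hour, nodes=NODES_MEASURED):
    """Average throughput contributed by one node, in Gbp/hour."""
    return cluster_gbp_per_hour / nodes


def projected_throughput(cluster_gbp_per_hour, target_nodes, nodes=NODES_MEASURED):
    """Estimate cluster throughput at target_nodes.

    Assumption (not a measurement): per-node throughput is unchanged,
    i.e. scaling is perfectly linear.
    """
    return per_node_throughput(cluster_gbp_per_hour, nodes) * target_nodes


if __name__ == "__main__":
    for mode, gbp in THROUGHPUT_GBP_PER_HOUR.items():
        print(f"{mode}: {per_node_throughput(gbp):.4f} Gbp/hour per node, "
              f"~{projected_throughput(gbp, 128):.0f} Gbp/hour projected on 128 nodes")
```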

We have been working on Seal to support the needs of the CRS4 Sequencing 
laboratory, which operates 5 Illumina sequencing machines and thus generates 
lots of data to process.  The regular workflow was being overwhelmed 
despite the additional computers made available, and it was regularly 
overloading our Lustre shared storage volume.  Now all data processing at 
the lab starts with Seal, with very positive results in terms of speed and 
maintenance effort.

We're eager to get people to try our new tool.  Please visit the Seal web site 
(http://biodoop-seal.sourceforge.net/) and feel free to contact me or the 
other Seal authors if you have any questions or problems.

Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452
