hadoop-common-user mailing list archives

From Luca Pireddu <pire...@crs4.it>
Subject Re: Hadoop for Bioinformatics
Date Tue, 29 Mar 2011 13:39:30 GMT
On March 28, 2011 04:51:14 Franco Nazareno wrote:
> Good day everyone!

And a good day to you Franco!

> First, I want to congratulate the group on this wonderful project. It has
> opened up new ideas and solutions in computing and technology. I'm
> excited to learn more about it and to discover what is possible with Hadoop
> and its components.
> Well, I just want to ask this with regard to my study. I'm currently
> pursuing a PhD in Bioinformatics, and my question is: can you give me a
> (rough) idea of whether it's possible to use a Hadoop cluster to perform
> DNA sequence alignment? My basic idea goes something like a string search
> over huge data files stored in HDFS, with the application using MapReduce
> for the searching and computing. As the Hadoop paradigm implies, it doesn't
> serve interactive applications well, and I think this kind of searching is
> a "write-once, read-many" application.
> I hope you don't mind my question. It would be great to hear your comments
> or suggestions about this.
> Thanks and more power!
> Franco

The short answer is yes!  At CRS4 we are working on this very problem.  

We have implemented a Hadoop-based workflow to perform short read alignment to
support DNA sequencing activities in our lab.  Its alignment operation is
based on (and therefore equivalent to) BWA, the Burrows-Wheeler Aligner.  We
have written a paper about it which will appear in the coming months, and we
are working on an open source release, but alas, we haven't completed that
task yet.
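For readers wondering how alignment fits the MapReduce model, here is a minimal Python sketch. It is not CRS4's code. It shows the general idea only: each read is aligned independently in the map phase, and the shuffle groups hits by reference position. It assumes a toy in-memory reference and uses naive exact matching as a stand-in for a real aligner such as BWA. The function names (map_align, shuffle, reduce_position, run_job) are hypothetical, and the map, shuffle and reduce steps are simulated in a single process rather than run on Hadoop.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Toy reference sequence (assumption); a real pipeline would use an indexed genome.
REFERENCE = "ACGTACGTTAGCACGTAC"


def map_align(read_id: str, read: str) -> Iterator[Tuple[int, str]]:
    """Map phase: align one read independently of all other reads.

    Stand-in for a real aligner: naive exact matching on the forward strand
    only, emitting (reference position, read id) for every hit.
    """
    start = REFERENCE.find(read)
    while start != -1:
        yield start, read_id
        start = REFERENCE.find(read, start + 1)


def shuffle(pairs: Iterable[Tuple[int, str]]) -> Dict[int, List[str]]:
    """Group mapper output by key, as the framework does between map and reduce."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_position(position: int, read_ids: List[str]) -> Tuple[int, List[str]]:
    """Reduce phase: collect every read that starts at the same reference position."""
    return position, sorted(read_ids)


def run_job(reads: Dict[str, str]) -> List[Tuple[int, List[str]]]:
    """Simulate one job: map every read, shuffle by position, reduce each group."""
    mapped = (pair for rid, seq in reads.items() for pair in map_align(rid, seq))
    grouped = shuffle(mapped)
    return [reduce_position(pos, ids) for pos, ids in sorted(grouped.items())]


if __name__ == "__main__":
    for pos, ids in run_job({"r1": "ACGTAC", "r2": "TAGC", "r3": "GGGG"}):
        print(pos, ids)  # prints "0 ['r1']", then "8 ['r2']", then "12 ['r1']"
```

Because no read depends on any other, the map phase can spread across as many input splits as the cluster has. A real pipeline would swap the naive matcher for an indexed aligner and also handle the reverse-complement strand and mismatches.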

We have also implemented a Hadoop-based distributed BLAST alignment program,
in case you're working with long fragments.  It's currently being used by our
collaborators to align viral DNA segments.
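One practical detail when distributing BLAST-style work is worth noting. Hadoop's default TextInputFormat hands input to mappers one line at a time, but a FASTA record usually spans several lines. A common workaround is to flatten each record onto a single tab-separated line so that no input split can cut a sequence in half. The Python sketch below shows that workaround as an assumption. It does not describe how the CRS4 program does it. The helpers parse_fasta, fasta_to_lines and map_record are hypothetical, and map_record only emits sequence length where a real mapper would call the aligner.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a record's sequence may span many lines."""
    header: Optional[str] = None
    parts: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def fasta_to_lines(text: str) -> str:
    """Put each record on one tab-separated line so line-based splits keep it whole."""
    return "\n".join(f"{header}\t{seq}" for header, seq in parse_fasta(text))


def map_record(line: str) -> Tuple[str, int]:
    """Hypothetical mapper stub: recover one whole record from its line.

    A real job would pass `seq` to an aligner; here we only emit its length.
    """
    header, seq = line.split("\t", 1)
    return header, len(seq)


if __name__ == "__main__":
    fasta = ">seq1 viral segment\nACGT\nTTGA\n>seq2\nGGCC\n"
    for record_line in fasta_to_lines(fasta).splitlines():
        print(map_record(record_line))
```

Flattening records this way trades a one-off preprocessing pass for the freedom to use the stock line-oriented input format. The alternative is a custom InputFormat that understands FASTA boundaries.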

If you're interested, we can give you an advance release of either program
so you can try it out.

Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel:  +39 0709250452
