From: Luca Pireddu
Organization: CRS4
To: common-user@hadoop.apache.org
Subject: Re: Hadoop for Bioinformatics
Date: Tue, 29 Mar 2011 15:39:30 +0200

On March 28, 2011 04:51:14 Franco Nazareno wrote:
> Good day everyone!

And a good day to you, Franco!

> First, I want to congratulate the group for this wonderful project. It did
> open up new ideas and solutions in computing and technology-wise. I'm
> excited to learn more about it and discover possibilities using Hadoop and
> its components.
>
> Well I just want to ask this with regards to my study. Currently I'm
> studying my PhD course in Bioinformatics, and my question is that can you
> give me a (rough) idea if it's possible to use Hadoop cluster in achieving
> a DNA sequence alignment? My basic idea for this goes something like a
> string search out of a huge data files stored in HDFS, and the application
> uses MapReduce in searching and computing. As the Hadoop paradigm implies,
> it doesn't serve well in interactive applications, and I think this kind
> of searching is a "write-once, read-many" application.
>
> I hope you don't mind my question. And it'll be great hearing your comments
> or suggestions about this.
>
> Thanks and more power!
>
> Franco

The short answer is yes! At CRS4 we are working on this very problem. We
have implemented a Hadoop-based workflow that performs short read alignment
to support DNA sequencing activities in our lab. Its alignment operation is
based on (and therefore equivalent to) BWA. We have written a paper about
it which will appear in the coming months, and we are working on an open
source release, but alas we haven't completed that task yet.

We have also implemented a Hadoop-based distributed BLAST alignment program,
in case you're working with long fragments. It's currently being used by
our collaborators to align viral DNA segments.

If you're interested, we can let you have an advance release of either
program so you can try it out.

-- 
Luca Pireddu
CRS4 - Distributed Computing Group
Loc. Pixina Manna Edificio 1
Pula 09010 (CA), Italy
Tel: +39 0709250452