hadoop-common-user mailing list archives

From Kiss Tibor <kiss.ti...@gmail.com>
Subject Re: Hadoop for Bioinformatics
Date Tue, 29 Mar 2011 13:32:40 GMT
Hi Franco,

We are using Hadoop for next-gen sequence alignment.
Earlier we had a solution built on a classic programming model, but we are
currently moving our software services to an M/R model based on Hadoop.
We have ported most of our classic algorithms to Hadoop, and I can say that
everything is getting more manageable.
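
To make the M/R idea concrete, here is a minimal sketch in plain Python of how a
sequence search can be split into a map step and a reduce step. It is only an
illustration, not the pipeline described above: it does exact string matching
rather than real alignment, it runs in one process instead of on Hadoop, and the
names (split_with_overlap, map_chunk, reduce_hits, search) are hypothetical.

```python
from collections import defaultdict


def split_with_overlap(sequence, chunk_size, overlap):
    """Yield (offset, chunk) pairs; each chunk is extended by `overlap`
    characters so a match that crosses a chunk boundary is still seen."""
    for offset in range(0, len(sequence), chunk_size):
        yield offset, sequence[offset:offset + chunk_size + overlap]


def map_chunk(chunk_offset, chunk, patterns):
    """Map step: emit (pattern, absolute_position) for each exact match."""
    for pattern in patterns:
        start = chunk.find(pattern)
        while start != -1:
            yield pattern, chunk_offset + start
            start = chunk.find(pattern, start + 1)


def reduce_hits(pairs):
    """Reduce step: group positions by pattern. A set removes hits that
    were reported twice because of the overlapping chunk edges."""
    grouped = defaultdict(set)
    for pattern, position in pairs:
        grouped[pattern].add(position)
    return {pattern: sorted(positions) for pattern, positions in grouped.items()}


def search(sequence, patterns, chunk_size=8):
    """Run the map step over every chunk, then reduce the emitted pairs."""
    overlap = max(len(p) for p in patterns) - 1
    pairs = []
    for offset, chunk in split_with_overlap(sequence, chunk_size, overlap):
        pairs.extend(map_chunk(offset, chunk, patterns))
    return reduce_hits(pairs)
```

Calling search("ACGTACGTTTACGA", ["ACG", "TTT"], chunk_size=4) groups the hit
positions per pattern. On a real cluster the chunks would be input splits read
from HDFS, and the framework would do the grouping between map and reduce.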

We are running Hadoop in the cloud and/or in our own datacenter. Another
challenge, especially in the cloud, is how to transfer the data, because
in bioinformatics the amount of data is usually very large.
Currently I am working on an open-source version of Amazon multipart upload,
which will be available in the next release of
JClouds <http://code.google.com/p/jclouds/wiki/BlobStore>.
Here are the starting
ideas <http://www.slideshare.net/jclouds/big-data-in-real-life-a-study-on-s3-multipart-uploads>
and also a sample
client app <https://github.com/jclouds/jclouds-examples/tree/master/blobstore-largeblob>.
If you want to follow new results, you are invited to follow me on
Twitter <http://twitter.com/#%21/tiborkisstibor>.
I plan to release a paper with the results of the data transfer
operations based on this open-source approach.
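
As a rough illustration of the multipart idea (not the jclouds implementation
and not its API), the sketch below splits a payload into numbered parts and
uploads them in parallel. The upload_part callback is a hypothetical stand-in
for whatever call sends one part to the blob store. The default limits reflect
Amazon S3's documented constraints of at least 5 MiB per part (except the last
one) and at most 10,000 parts. A real client would stream parts from disk
rather than hold the whole payload in memory.

```python
from concurrent.futures import ThreadPoolExecutor

S3_MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last
S3_MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(total_size, part_size,
               min_part_size=S3_MIN_PART_SIZE, max_parts=S3_MAX_PARTS):
    """Return a list of (part_number, offset, length), numbered from 1."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < min_part_size:
        raise ValueError("part_size is below the minimum part size")
    parts = []
    offset = 0
    while offset < total_size:
        length = min(part_size, total_size - offset)
        parts.append((len(parts) + 1, offset, length))
        offset += length
    if len(parts) > max_parts:
        raise ValueError("too many parts; choose a larger part_size")
    return parts


def upload_in_parts(data, part_size, upload_part, workers=4,
                    min_part_size=S3_MIN_PART_SIZE):
    """Upload `data` (bytes) in parallel and return the sorted part numbers.

    `upload_part(part_number, payload)` is a hypothetical callback that
    sends one part and returns its part number once the part is stored.
    """
    parts = plan_parts(len(data), part_size, min_part_size=min_part_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(upload_part, number, data[offset:offset + length])
            for number, offset, length in parts
        ]
        done = [future.result() for future in futures]
    return sorted(done)
```

The part numbers matter because the parts can finish in any order; the store
reassembles the object by part number once all parts have arrived.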

Also, we will soon release a version of our cloud-based service stack
that is fully based on Hadoop.

Tibor

On Mon, Mar 28, 2011 at 4:51 AM, Franco Nazareno
<franco.nazareno@gmail.com> wrote:

> Good day everyone!
>
>
>
> First, I want to congratulate the group on this wonderful project. It has
> opened up new ideas and solutions in computing and technology. I'm
> excited to learn more about it and discover possibilities using Hadoop and
> its components.
>
>
>
> I would like to ask a question related to my studies. I'm currently
> pursuing a PhD in Bioinformatics, and my question is: can you
> give me a (rough) idea of whether it's possible to use a Hadoop cluster to
> perform DNA sequence alignment? My basic idea is something like a string
> search over huge data files stored in HDFS, with the application using
> MapReduce for the searching and computing. As the Hadoop paradigm implies, it
> doesn't serve interactive applications well, and I think this kind of
> searching is a "write-once, read-many" application.
>
>
>
> I hope you don't mind my question. It would be great to hear your comments
> or suggestions about this.
>
>
>
> Thanks and more power!
>
> Franco
>
>
