hbase-user mailing list archives

From "Taylor, Ronald C" <ronald.tay...@pnnl.gov>
Subject RE: Powered By Page
Date Mon, 02 Jul 2012 21:16:12 GMT
Hi Stack,

Re Lustre use: I'm not a hardware infrastructure type of guy, but I can tell you that we have
a very fast interconnect for access into the global filesystem:

"The Olympus Infiniband topology is a combination of 2:1 oversubscribed 36 port leaf switches
and direct links into a 648 port core Qlogic QDR Infiniband switch."

I am not really worried about the loss of data locality or the slower speed of access to the HBase
tables. That is, this is not (yet) a production environment for multiple users with real-time
access. Though I think it would work - it's been quite stable, for one thing, and I have not
noticed any speed problem in retrieving records. But I have not done any serious timings,
and currently we are not stressing HBase, in that the warehouse is being used by just a
few bioinformaticians, not the general community, so to speak. I'm happy to simply have the
data gathered in one place that provides scalability and for which I can easily write custom
analytics programs that I can build upon and that won't have to be moved to another database
framework down the line.

As the warehouse grows, I do plan on doing some testing, comparing HBase access using local
disk storage vs Lustre. But that's when I have more time, and the warehouse is large enough
for some real testing. We also have the option of putting *everything* into Lustre, both HBase
tables and all temp HDFS file storage used by our MapReduce programs - so, no local disk
use at all. I'm curious as to how well that would work. Possibly quite well, but no testing
yet. I want to try that. It should be a pretty simple switch (our olympus support people have
already constructed alternate starting points that load all the libs into Lustre instead of
each local disk), but we have other more immediate work to do first.
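For readers wondering what that "simple switch" looks like in practice: since Lustre presents a POSIX filesystem, HBase can be pointed at it with a file:// root directory rather than an hdfs:// one. A minimal sketch of the relevant hbase-site.xml property - the mount path /lustre/olympus/hbase is a hypothetical example, not our actual mount point:

```xml
<!-- hbase-site.xml: store HBase data on a POSIX-mounted filesystem
     (here, a Lustre mount) instead of HDFS. The path below is a
     hypothetical example mount point. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- file:// scheme tells HBase to use the local/shared filesystem
         directly, bypassing HDFS entirely -->
    <value>file:///lustre/olympus/hbase</value>
  </property>
</configuration>
```

With a shared mount like this, no custom FileSystem implementation is needed; HBase's stock local-filesystem support does the work, and Lustre handles making the same paths visible on every node.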

BTW - the Dept of Energy's new five-year systems biology knowledgebase project - the largest
single bioinformatics project at DOE, I believe - is using Hadoop for several things in its
multiple backends. See http://kbase.science.energy.gov/. I believe that Michael Schatz at
Cold Spring Harbor Lab is heading up the Hadoop work, with clusters at Lawrence Berkeley,
Argonne Nat Lab, and Oak Ridge. Not sure how HBase fits in - they are getting into some NoSQL
work, but I'm not sure what they'll be using. HBase, I hope, but I don't know.


Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: ronald.taylor@pnnl.gov

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Monday, July 02, 2012 1:37 PM
To: user@hbase.apache.org
Subject: Re: Powered By Page

On Mon, Jul 2, 2012 at 8:19 PM, Taylor, Ronald C <ronald.taylor@pnnl.gov> wrote:
> Pacific Northwest National Laboratory (www.pnl.gov) - Hadoop and HBase (Cloudera distribution)
are being used within PNNL's Computational Biology & Bioinformatics Group for a systems
biology data warehouse project that integrates high throughput proteomics and transcriptomics
data sets coming from instruments in the Environmental Molecular Sciences Laboratory, a US
Department of Energy national user facility located at PNNL. The data sets are being merged
and annotated with other public genomics information in the data warehouse environment, with
Hadoop analysis programs operating on the annotated data in the HBase tables. This work is
hosted by olympus, a large PNNL institutional computing cluster (http://www.pnl.gov/news/release.aspx?id=908)
, with the HBase tables being stored in olympus's Lustre file system.

That's a cool one.  I put it up (I put it in place of the powerset entry -- smile).

How's that Lustre hookup work Ronald?  You did your own FS implementation for it?

Good stuff,
