hbase-user mailing list archives

From Andrew Purtell <apurt...@yahoo.com>
Subject Re: success story
Date Fri, 03 Oct 2008 00:28:22 GMT
I should also mention, for the sake of clarity, that the raw 70TB capacity does not factor in
3x DFS replication, and we're putting a lot more than just HBase tables into DFS. Still,
we'd like our HBase tables to grow very, very large with Web content and other things. 

  - Andy
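
A minimal back-of-the-envelope sketch of what the raw figure above works out to once the
3x DFS replication is taken out. The 70TB and the 3x factor come from the message; the rest
is illustrative, and as noted above only part of the usable space is HBase data.

    // Back-of-the-envelope: usable DFS space behind a 70 TB raw figure
    // with 3x replication. Only part of this goes to HBase tables.
    public class CapacitySketch {
        public static void main(String[] args) {
            double rawTB = 70.0;     // raw disk capacity across the cluster
            int replication = 3;     // dfs.replication
            double usableTB = rawTB / replication;
            System.out.printf("~%.0f TB usable in DFS%n", usableTB);  // prints ~23 TB
        }
    }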


--- On Thu, 10/2/08, Andrew Purtell <apurtell@yahoo.com> wrote:

> From: Andrew Purtell <apurtell@yahoo.com>
> Subject: Re: success story
> To: hbase-user@hadoop.apache.org
> Date: Thursday, October 2, 2008, 5:23 PM
> Yes, typo, sorry. 512MB. 
> 
> Our node specification is approximately:
>   CPU: 2x 4-core Xeons @ 3GHz
>   RAM: 8GB
>   Disk: 1TB RAID-1 system volume, 4x 1TB RAID-0 data volumes (for DFS)
> 
> I'm experimenting with mapfile size limits. We started
> low to get lots of splits early. I've increased it to
> 512MB most recently to slow splitting. We're above the
> concurrent map capacity of the cluster already. I may try to
> push the split threshold up to 1GB, but of course I have
> concerns about that. The goal is to make effective use of
> the ~70TB capacity of the cluster without blowing up the
> region count to the point where there aren't enough
> region servers to effectively carry it. 
> 
>    - Andy
> 
> --- On Thu, 10/2/08, Jean-Daniel Cryans
> <jdcryans@apache.org> wrote:
> 
> > From: Jean-Daniel Cryans <jdcryans@apache.org>
> > Subject: Re: success story
> > To: hbase-user@hadoop.apache.org
> > Date: Thursday, October 2, 2008, 4:47 PM
> > Andrew,
> > 
> > This is great!
> > 
> > Is it a typo or do you really have some regions as big as
> > 250GB?
> > 
> > What kind of machines do you use?
> > 
> > Thx,
> > 
> > J-D
> > 
> > On Thu, Oct 2, 2008 at 7:11 PM, Andrew Purtell
> > <apurtell@apache.org> wrote:
> > 
> > > I just wanted to take this opportunity to report an HBase
> > > success story.
> > >
> > > We are running Hadoop 0.18.1 and HBase 0.18.0.
> > >
> > > Our application is a web crawling application with concurrent
> > > batch content analysis of various kinds. All of the workflow
> > > components are implemented as subclasses of TableMap and/or
> > > TableReduce. (So yes there will be some minor refactoring
> > > necessary for 0.19...)
> > >
> > > We are now at ~300 regions, most of them 512GB, hosted on a
> > > cluster of 25 nodes. We see a constant rate of 2500
> > > requests/sec or greater, peaking periodically near 100K/sec
> > > when some of the batch scan tasks run. Since going into
> > > semi-production over last weekend there has been no downtime
> > > or service faults.
> > >
> > > Feel free to add "Trend Micro Advanced Threats Research" to
> > > the powered by page.
> > >
> > >   - Andy
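
A note on the split-threshold tuning discussed in the quoted reply above: as far as I recall,
the knob in HBase of this era is hbase.hregion.max.filesize in hbase-site.xml (default 256MB),
which is what the "mapfile size limit" refers to; treat that property name and default as my
recollection rather than a detail confirmed in this thread. The sketch below only illustrates
the region-count arithmetic behind the worry about blowing up the region count: the ~23TB of
HBase data is an upper bound that assumes all usable DFS space goes to HBase (which the thread
says it does not), and the 25 servers come from the quoted message.

    // Rough illustration (not from the thread): region count implied by a
    // given split threshold. The threshold itself is normally configured as
    // hbase.hregion.max.filesize in hbase-site.xml on the region servers;
    // 256 MB was, I believe, the default in the 0.18 era.
    public class SplitThresholdSketch {
        public static void main(String[] args) {
            long hbaseDataBytes = 23L << 40;          // assume ~23 TB of table data (upper bound)
            int regionServers = 25;                   // cluster size from the thread
            long[] thresholdsMB = { 256, 512, 1024 }; // candidate split sizes in MB
            for (long mb : thresholdsMB) {
                long regions = hbaseDataBytes / (mb << 20);
                System.out.printf("%4d MB threshold -> ~%d regions (~%d per region server)%n",
                        mb, regions, regions / regionServers);
            }
        }
    }

Even at a 1GB threshold, that upper bound works out to roughly 940 regions per server, which
is the tension between filling capacity and keeping the region count carryable described above.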


      
