hbase-user mailing list archives

From "Alex Newman" <posi...@gmail.com>
Subject RE: Scalability of HBase
Date Wed, 24 Sep 2008 15:53:33 GMT
That actually answers my questions. To answer your 27 petabyte
question, I would have to say maybe...


> Alex,

> So each row = 24 column-families(?) * 300,000,000 entries/family * ~40
> bytes/entry = about 270GB/row ?

> And that * 100,000 rows = about 27 petabytes of data?

> Is my math right here? :)
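The arithmetic above checks out; a quick verification using the figures quoted in the thread (24 families, 300,000,000 entries per family, ~40 bytes per entry, 100,000 rows):

```python
# Figures quoted in the thread above.
families = 24
entries_per_family = 300_000_000
bytes_per_entry = 40
rows = 100_000

row_bytes = families * entries_per_family * bytes_per_entry
total_bytes = row_bytes * rows

print(row_bytes / 2**30)    # ~268 GiB per row -- "about 270GB"
print(total_bytes / 2**50)  # ~25.6 PiB total -- "about 27 petabytes"
```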

> With a big enough cluster, you might be able to get that amount of data in
> hadoop.  I'm not sure anyone has had an HBase installation that big.

> One thing that is definitely not going to work with HBase is having single
> rows that are many GBs.

> A row can never be split across regions, and the default region size is
> 256MB (though configurable), so you'd be 3 orders of magnitude greater than
> the recommended maximum.  So to directly answer your questions, one
> limitation is the size of a single row.  The other limitation is the number
> of regions that can be handled on each node.  The upper limit is in the
> 400-500 regions per region server range, though this can vary depending on
> your hardware and usage patterns.  That's about 100GB per HBase node, so to
> get this much data into HBase you'd need on the order of hundreds of
> thousands of servers.
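Plugging in the per-node figures quoted above (256MB regions, the 400-500 region ceiling) gives the per-server capacity and the implied cluster size. The 450-region figure here is an assumed midpoint of the quoted range, not a measured limit:

```python
region_bytes = 256 * 2**20       # default region size quoted above
regions_per_server = 450         # assumed midpoint of the 400-500 range
per_server_bytes = region_bytes * regions_per_server

# The ~27 petabytes computed earlier in the thread.
total_bytes = 24 * 300_000_000 * 40 * 100_000

print(per_server_bytes / 2**30)               # ~112 GiB -- "about 100GB" per node
print(round(total_bytes / per_server_bytes))  # on the order of 240,000 servers
```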

> One thing you'd definitely need to do is rework your schema a bit, spreading
> things across more rows so you can have reasonably sized regions.
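One common way to do that rework is to fold a bucket number (derived from the entry id) into the row key, so one huge logical row becomes many physical rows that can each fit in a region. A hypothetical sketch of the key scheme, with illustrative names not taken from the thread:

```python
def bucketed_row_key(logical_row: str, entry_id: int, buckets: int = 4096) -> str:
    """Spread one huge logical row across many physical rows by
    appending a bucket derived from the entry id to the row key."""
    bucket = entry_id % buckets
    # Zero-pad the bucket so keys for one logical row sort contiguously.
    return f"{logical_row}#{bucket:04d}"

print(bucketed_row_key("user42", 12345))  # -> "user42#0057"
```

A prefix scan on `"user42#"` then recovers every entry of the original logical row, while each physical row stays small enough to live inside a single region.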

> My short answer would be that this is not currently possible in HBase unless
> you had a very, very large cluster and a bit of time to work out some bugs
> that I'm sure will pop up with an installation of this size.  My question to
> you is: do you really need random access at this granularity to 27 petabytes
> of data?
