hbase-user mailing list archives

From Sam Seigal <selek...@yahoo.com>
Subject Re: pre splitting tables
Date Mon, 24 Oct 2011 19:17:57 GMT
Hi Stack,


>> According to the HBase book, pre-splitting tables and doing manual
>> splits is a better long-term strategy than letting HBase handle it.
> It's good for getting a table off the ground, yes.
>> Since I do not know what the keys from the prod system are going to
>> look like, I am adding a machine number prefix to the row keys
>> and pre-splitting the tables based on the prefix (prefix 0 goes to
>> machine A, prefix 1 goes to machine B, etc.).
> You don't need to do an in-order scan of the data? What does the rest
> of your row key look like?

I need to be able to do this on 5-6 types of keys/dimensions.
I have a MapReduce job that runs periodically and creates the indexes
on separate tables for querying the data.

>> Once I decide to add more machines, I can always do a rolling split
>> and add more prefixes.
> Yes.
>> Is this a good strategy for pre-splitting the tables?
> So, you'll start out with one region per server?
> What do you think the rate of splitting will be like?  Are you using
> default region size or have you bumped this up?

This prefix strategy should, I think, create one region per region server.
I have set the maximum region size to 2 GB for now. This is just
a number I picked.
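
To make the prefix scheme concrete, here is a minimal sketch in plain Java of how the split points could be derived: N prefixes need N-1 split keys, one at each boundary between adjacent prefixes. The class name, the zero-padded fixed-width prefix, and the "-" separator are assumptions for illustration and not something stated in this thread. The resulting byte[][] has the shape that HBaseAdmin.createTable(HTableDescriptor, byte[][]) takes as its split keys.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: derives split keys for a prefix-per-server layout.
final class PrefixSplits {

    // Zero-pad so that lexicographic order of prefixes matches numeric order.
    static String pad(int prefix, int width) {
        return String.format("%0" + width + "d", prefix);
    }

    // N prefixes produce N regions, which requires N-1 split keys:
    // the first region starts at the empty key, so prefix 0 needs no split key.
    static byte[][] splitKeys(int numPrefixes, int width) {
        if (numPrefixes < 1) {
            throw new IllegalArgumentException("need at least one prefix");
        }
        byte[][] keys = new byte[numPrefixes - 1][];
        for (int i = 1; i < numPrefixes; i++) {
            keys[i - 1] = pad(i, width).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    // Assumed row key layout: <prefix>-<original key>.
    static String rowKey(int prefix, int width, String originalKey) {
        return pad(prefix, width) + "-" + originalKey;
    }

    public static void main(String[] args) {
        // Three region servers, single-digit prefixes.
        for (byte[] key : splitKeys(3, 1)) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        System.out.println(rowKey(2, 1, "order123"));
    }
}
```

Adding servers later means adding prefixes, which is where the fixed width matters: with a width of 1 the scheme stops sorting correctly past prefix 9.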

This is a small cluster as a proof of concept running in parallel with
some of the other monolithic reporting infrastructures we have, and
will only be serving a fraction of the prod traffic to start off with.

The machines on the cluster look like this: 120 GB of disk space; 8 GB
of memory; quad core 2.66 GHz. I am going to allocate around 80 GB of
memory for HBase use.

On a side note, I don't think I really understand how to decide how
many regions per region server I need.

If I were to create one region per region server and set
hbase.hregion.max.filesize to Long.MAX_VALUE, why is that a bad thing?
What kind of problems can I run into? If I were to err on the side of
too many regions, what are the advantages/disadvantages there?
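
For a sense of scale, the two extremes in the question can be put into numbers. This is a rough back-of-the-envelope sketch, assuming the roughly 80 GB set aside for HBase refers to data stored per server and that every region grows all the way to hbase.hregion.max.filesize. It ignores HDFS replication and compression, and the class and method names are hypothetical.

```java
// Hypothetical estimate: how many full regions a given amount of data implies.
final class RegionEstimate {

    static final long GB = 1L << 30;

    // Ceiling division written to avoid overflow when maxRegionBytes is Long.MAX_VALUE.
    static long regionsFor(long dataBytes, long maxRegionBytes) {
        if (dataBytes < 0 || maxRegionBytes <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        return dataBytes / maxRegionBytes + (dataBytes % maxRegionBytes == 0 ? 0 : 1);
    }

    public static void main(String[] args) {
        // 2 GB maximum region size, as configured in this message.
        System.out.println(regionsFor(80 * GB, 2 * GB));
        // Effectively unlimited region size: the data never splits.
        System.out.println(regionsFor(80 * GB, Long.MAX_VALUE));
    }
}
```

Under these assumptions the 2 GB setting implies about 40 regions per server once the disks fill, while Long.MAX_VALUE keeps a single region per server regardless of how much data it holds.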

> St.Ack
