hbase-user mailing list archives

From Peter Haidinyak <phaidin...@local.com>
Subject RE: Row Key Question
Date Fri, 22 Apr 2011 20:18:51 GMT
Thanks, that's the way I visualized it happening. The assumption, then, is that this process
would continue until every server in the cluster has one region of data (more or less). My
underlying issue is that I need to store my data with the key starting with the date (YYYY-MM-DD).
I know this means I will have hot spots during inserts, but it makes retrieval more efficient
because I can use a scan with start and end rows. I was thinking of adding a numeric prefix,
00 to 09, one for each of the ten servers. In theory, each server should end up with only one
of the prefixes. Then during retrieval I could use ten threads, each with a start and end row
carrying its own prefix, and the query should be distributed evenly among the servers. I'm not
sure whether using ten threads to insert the data would buy me anything. Anyway, I'm going to
try this out at home on my own cluster to see how it performs.
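
A rough sketch of the prefix idea, in plain Python rather than against the HBase client. Everything here is an assumption made for illustration: the names `salted_key`, `scan_ranges`, `scan` and `parallel_scan` are hypothetical, the prefix is picked with a CRC32 hash of the key (any deterministic choice would do), and a sorted in-memory list stands in for the table. The stop row is treated as exclusive, as in an HBase scan.

```python
import bisect
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_BUCKETS = 10  # assumption: one prefix (00..09) per region server


def salted_key(date: str, suffix: str) -> str:
    """Build '<NN><YYYY-MM-DD>|<suffix>' with a deterministic two-digit prefix."""
    natural = f"{date}|{suffix}"
    bucket = zlib.crc32(natural.encode("utf-8")) % NUM_BUCKETS
    return f"{bucket:02d}{natural}"


def scan_ranges(start_date: str, stop_date: str):
    """One (start_row, stop_row) pair per prefix; stop_row is exclusive."""
    return [(f"{b:02d}{start_date}", f"{b:02d}{stop_date}")
            for b in range(NUM_BUCKETS)]


def scan(sorted_keys, start_row, stop_row):
    """Stand-in for a range scan over keys kept in sorted order."""
    lo = bisect.bisect_left(sorted_keys, start_row)
    hi = bisect.bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]


def parallel_scan(sorted_keys, start_date, stop_date):
    """Run one range scan per prefix on its own thread and merge the results."""
    ranges = scan_ranges(start_date, stop_date)
    with ThreadPoolExecutor(max_workers=NUM_BUCKETS) as pool:
        parts = list(pool.map(lambda r: scan(sorted_keys, *r), ranges))
    return [key for part in parts for key in part]
```

To fetch everything for 2011-04-22 you would call `parallel_scan(keys, "2011-04-22", "2011-04-23")`. The results come back grouped by prefix, so they need a re-sort on the client if date order across prefixes matters.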

Thanks

-Pete

-----Original Message-----
From: Buttler, David [mailto:buttler1@llnl.gov] 
Sent: Friday, April 22, 2011 12:10 PM
To: user@hbase.apache.org
Subject: RE: Row Key Question

Regions split when they grow larger than the configured maximum region size.  Your data
is small enough to fit in a single region.
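
A back-of-the-envelope check for the table described in the original question (a million rows, 32-byte keys, two 32-byte columns). The 256 MB threshold is an assumption for illustration only; check `hbase.hregion.max.filesize` on your own cluster. The estimate counts raw payload only and ignores the per-cell overhead HBase adds when it stores the data.

```python
ROWS = 1_000_000
KEY_BYTES = 32
COLUMNS = 2
VALUE_BYTES = 32

# Assumption: a 256 MB split threshold, not read from any real cluster.
ASSUMED_MAX_REGION_BYTES = 256 * 1024 * 1024

# Raw payload: one key plus two values per row.
raw_bytes = ROWS * (KEY_BYTES + COLUMNS * VALUE_BYTES)

print(f"raw payload: {raw_bytes} bytes")
print(f"below assumed threshold: {raw_bytes < ASSUMED_MAX_REGION_BYTES}")
```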

Keys are sorted in a region.  When a region splits, each new region is about half the size
of the original and covers half of its key range.
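
A toy model of that behavior, not HBase's actual implementation: `ToyTable` and `demo_keys` are hypothetical names, and a key count (`max_keys`) stands in for the byte-size threshold. Each region holds a sorted run of keys; when one grows past the limit it is cut at the middle key, and the upper half becomes a new region with its own start key.

```python
import bisect
import hashlib


class ToyTable:
    """Sorted key ranges that split in half once they exceed max_keys."""

    def __init__(self, max_keys: int):
        self.max_keys = max_keys
        self.start_keys = [""]  # the first region starts at the empty key
        self.regions = [[]]     # regions[i] holds keys >= start_keys[i]

    def put(self, key: str) -> None:
        # Find the last region whose start key is <= key.
        idx = bisect.bisect_right(self.start_keys, key) - 1
        region = self.regions[idx]
        bisect.insort(region, key)
        if len(region) > self.max_keys:
            # Split at the middle key: each half keeps about half the keys.
            mid = len(region) // 2
            left, right = region[:mid], region[mid:]
            self.regions[idx:idx + 1] = [left, right]
            self.start_keys.insert(idx + 1, right[0])


def demo_keys(n: int):
    """Deterministic 32-character keys that look random (MD5 hex digests)."""
    return [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(n)]
```

Feeding `demo_keys(100)` into `ToyTable(max_keys=8)` leaves the keys spread over many regions, each covering a contiguous slice of the key space. With random keys the splits land all over the key space; with keys that always increase (such as a leading date), only the last region keeps growing and splitting.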

Dave

-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Friday, April 22, 2011 10:41 AM
To: user@hbase.apache.org
Subject: Row Key Question

I have a question about how HBase decides where to store rows based on their row keys. Say I
have a million rows to insert into a new table in a ten-node cluster. Each row's key is a
random 32-byte value, and there are two columns per row, each containing a random 32-byte value.
My question is: how does HBase know when to 'split' the table between the ten nodes? Or how
does HBase 'split' the random keys between the ten nodes?

Thanks

-Pete
