hbase-user mailing list archives

From Gary Gilbert - SQLstream <g...@sqlstream.com>
Subject RE: Row Key Question
Date Wed, 16 Feb 2011 04:25:19 GMT
I've been considering a slightly different scenario.

In this scenario I'd hash the column qualifier, take the result modulo some
constant, and append that value to the rowkey. The idea is to spread the
writes for a specific rowkey among the various regions. The constant
controls how many ranges exist. This assumes that all column qualifiers
are equally likely.
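
A minimal sketch of that key construction, in plain Python with no HBase
client involved. The names salted_rowkey and NUM_BUCKETS, the choice of
CRC32 as the hash, and the one-byte suffix are all assumptions made for
illustration; the message does not specify any of them.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical constant; controls how many key ranges exist


def salted_rowkey(rowkey: bytes, qualifier: bytes,
                  buckets: int = NUM_BUCKETS) -> bytes:
    """Append a one-byte bucket, derived from the column qualifier, to rowkey."""
    if not 1 <= buckets <= 256:
        raise ValueError("bucket count must fit in a single byte")
    # CRC32 is an assumed stand-in for "hash"; any stable hash would do.
    bucket = zlib.crc32(qualifier) % buckets
    return rowkey + bytes([bucket])
```

The same qualifier always maps to the same suffix, so a reader that knows
the qualifier can recompute the full key without scanning.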

Depending on the app and its requirements, scans could either (1) append
x00's or xff's to a specific rowkey to gather all the columns for that key,
or (2) enumerate the rowkey values and merge the results for the x00's,
x01's, x02's, etc.
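
The two scan options might look roughly like this. It is a sketch only: a
sorted in-memory list of keys stands in for the table, the suffix is assumed
to be a single byte, and the function names are hypothetical. Option (2) is
shown under one reading of the message, where each suffix value is read as
its own stream and the streams are merged back into rowkey order.

```python
import bisect
import heapq


def scan_range(sorted_keys, start, stop):
    """Return keys k with start <= k <= stop, like an inclusive range scan."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, stop)
    return sorted_keys[lo:hi]


def scan_all_buckets(sorted_keys, rowkey):
    """Option 1: one scan from rowkey + x00 to rowkey + xff."""
    return scan_range(sorted_keys, rowkey + b"\x00", rowkey + b"\xff")


def merge_buckets(sorted_keys, rowkeys, buckets):
    """Option 2: build one stream per suffix value, then merge the streams."""
    present = set(sorted_keys)
    streams = []
    for b in range(buckets):
        suffix = bytes([b])
        streams.append([rk + suffix for rk in sorted(rowkeys)
                        if rk + suffix in present])
    # Merge on the unsalted rowkey so all suffixes of a key come out together.
    return list(heapq.merge(*streams, key=lambda k: k[:-1]))
```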

Any thoughts?

-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Tuesday, February 15, 2011 6:38 PM
To: user@hbase.apache.org
Subject: Row Key Question

Hi All,
  A couple of weeks ago I asked about how to distribute my rows across the
servers if the key always starts with the date in the format...


I believe Stack, although I could be wrong, suggested prepending an 'X-',
where 'X' is a number from 1 to the number of servers I have. This way a
scan can be threaded out with one thread per server, and each thread
'owns' one 'X-' range of the keys.
My question is on the import side: should I have one thread per server and
round-robin each line of our log files to the threads for the 'put' to the
server? Does this buy me any more throughput?
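
The prefix-and-threaded-scan idea above can be sketched as follows. This is
illustrative only: a sorted in-memory list stands in for the table, and
NUM_SERVERS, prefixed_key, scan_prefix and threaded_scan are hypothetical
names. How 'X' is chosen for a given key is not stated in the message; a
CRC32 of the key is assumed here.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SERVERS = 3  # hypothetical; the number of region servers


def prefixed_key(key: str, num_servers: int = NUM_SERVERS) -> str:
    """Prepend 'X-' where X is in 1..num_servers (assumed: derived by hash)."""
    x = zlib.crc32(key.encode("utf-8")) % num_servers + 1
    return f"{x}-{key}"


def scan_prefix(sorted_keys, x):
    """Return the keys in the 'X-' range owned by one scanner thread."""
    prefix = f"{x}-"
    return [k for k in sorted_keys if k.startswith(prefix)]


def threaded_scan(sorted_keys, num_servers: int = NUM_SERVERS):
    """Run one scan per prefix, each on its own thread."""
    prefixes = range(1, num_servers + 1)
    with ThreadPoolExecutor(max_workers=num_servers) as pool:
        parts = pool.map(lambda x: scan_prefix(sorted_keys, x), prefixes)
        return dict(zip(prefixes, parts))
```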
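
For concreteness, the import setup being asked about could be sketched like
this. It only shows the dispatch pattern and says nothing about whether
throughput improves. The put callable is a hypothetical stand-in for the
real client write, and round_robin_import is an invented name.

```python
import itertools
import queue
import threading


def round_robin_import(lines, num_servers, put):
    """Hand lines round-robin to one worker thread per server.

    put(server_index, line) is a hypothetical stand-in for the real write.
    """
    queues = [queue.Queue() for _ in range(num_servers)]

    def worker(idx):
        while True:
            line = queues[idx].get()
            if line is None:  # sentinel: no more input for this worker
                break
            put(idx, line)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_servers)]
    for t in threads:
        t.start()
    # Deal the lines out to the per-server queues in turn.
    for q, line in zip(itertools.cycle(queues), lines):
        q.put(line)
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
```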

Thanks again.

