hbase-user mailing list archives

From Tao Xiao <xiaotao.cs....@gmail.com>
Subject How to get specified rows and avoid full table scanning?
Date Mon, 21 Apr 2014 15:04:05 GMT
I have a big table, and rows are added to it every day. I want to run a
MapReduce job over this table and select the rows from several days as the
job's input data, without scanning the full table. How can I achieve this?

If I prefix the rowkey with the date, I can easily select one day's data as
the job's input, but this causes a hot-spot problem: hundreds of millions of
rows are added to this table each day, and all of a day's writes will
probably go to a single region server. A secondary index would be good for
queries but not for a batch processing job.
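
To make the date-prefix idea concrete, here is a minimal sketch in plain Python (no HBase client involved). The `yyyyMMdd|id` key layout and the helper names `make_rowkey` and `scan_range` are assumptions for illustration only, not an existing schema or API. It only shows how a range of days maps to one start/stop rowkey pair.

```python
from datetime import date, timedelta


def make_rowkey(day, record_id):
    """Build a date-prefixed rowkey (hypothetical layout: yyyyMMdd|id)."""
    return f"{day:%Y%m%d}|{record_id}".encode("utf-8")


def scan_range(first_day, last_day):
    """Return (start_row, stop_row) covering first_day..last_day inclusive.

    The stop row is the prefix of the day after last_day, because a scan's
    stop row is normally exclusive.
    """
    start_row = f"{first_day:%Y%m%d}".encode("utf-8")
    stop_row = f"{last_day + timedelta(days=1):%Y%m%d}".encode("utf-8")
    return start_row, stop_row


# Select three days of data with a single contiguous key range.
start_row, stop_row = scan_range(date(2014, 4, 19), date(2014, 4, 21))
inside = make_rowkey(date(2014, 4, 21), "abc")
outside = make_rowkey(date(2014, 4, 22), "abc")
print(start_row <= inside < stop_row)   # key from the last selected day
print(start_row <= outside < stop_row)  # key from the following day
```

Because rowkeys sort as bytes, every key written on a given day shares the same prefix and lands next to the others. That is what makes the range selection easy, and also why all of a day's writes end up in the same place.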

Are there any other ways?

Are there any other frameworks that can achieve this goal more easily?
Shark? Stinger? HSearch?
