hbase-user mailing list archives

From hongbin ma <mahong...@apache.org>
Subject Streaming data to htable
Date Fri, 13 Feb 2015 06:20:31 GMT

I'm trying to use an HTable to store data that arrives in a streaming fashion.
The incoming data is guaranteed to have a larger KEY than ANY existing
key in the table.
Once written, the data will be READ-ONLY.
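One detail worth pinning down in that guarantee: HBase sorts row keys lexicographically as unsigned bytes, so "larger" has to mean larger in that byte order, not numerically. For example, a decimal counter stored as a string without zero-padding breaks the guarantee, because "10" sorts before "9". A minimal sketch of a guard that rejects any key not strictly larger than the last accepted one (the class and method names are hypothetical, not an HBase API):

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical guard for the "every new key is larger than any existing key" invariant. */
public class MonotonicKeyGuard {

    /** Compares keys lexicographically as unsigned bytes, the order HBase uses for row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask so the byte is treated as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        // A key that is a prefix of another key sorts first.
        return Integer.compare(a.length, b.length);
    }

    private byte[] lastKey; // largest key accepted so far; null before the first key

    /** Returns true and records the key only if it is strictly larger than every key seen so far. */
    boolean accept(byte[] key) {
        if (lastKey != null && compareUnsigned(key, lastKey) <= 0) {
            return false;
        }
        lastKey = key.clone();
        return true;
    }

    static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        MonotonicKeyGuard padded = new MonotonicKeyGuard();
        System.out.println(padded.accept(ascii("0009"))); // true
        System.out.println(padded.accept(ascii("0010"))); // true

        MonotonicKeyGuard unpadded = new MonotonicKeyGuard();
        System.out.println(unpadded.accept(ascii("9")));  // true
        System.out.println(unpadded.accept(ascii("10"))); // false: "10" sorts before "9"
    }
}
```

Zero-padded (fixed-width) keys keep numeric order and byte order in agreement, which is what the guarantee relies on.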

The data streams in at a very high rate, so I don't want to issue a PUT
for each entry, because that obviously performs poorly.
I'm thinking about pooling the entries and flushing them to HBase every
five minutes, and AFAIK there are a few options:

1. Pool the entries, and every 5 minutes run an MR job to convert the
data to HFile format. This avoids the overhead of individual PUTs,
but I'm afraid the MR job might be too costly (waiting in the job queue) to
keep pace.

2. Use HTableInterface.put(List<Put>). The batched version should be faster,
but I'm not sure by how much.
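Both options start from the same pooling step. A minimal sketch of it, assuming a hypothetical PooledWriter class and a caller-supplied sink: entries accumulate in memory and are handed over as one list when either a size limit or a time limit (five minutes here) is reached. In an HBase client the sink would build one Put per entry and make a single HTableInterface.put(List<Put>) call per batch (option 2), or write the batch out for the HFile job (option 1); that wiring is left out so the sketch runs without a cluster. It only shows how many entries end up in each call, not how much faster a batched call actually is, which has to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical pooling buffer: entries are collected in memory and handed to the sink
 * as one batch once maxEntries is reached or maxAgeMillis has passed since the first
 * entry of the current batch.
 */
public class PooledWriter<T> {
    private final int maxEntries;
    private final long maxAgeMillis;
    private final Consumer<List<T>> sink; // e.g. builds a List<Put> and calls table.put(puts) once
    private final List<T> pending = new ArrayList<>();
    private long batchStartMillis;

    public PooledWriter(int maxEntries, long maxAgeMillis, Consumer<List<T>> sink) {
        if (maxEntries <= 0) {
            throw new IllegalArgumentException("maxEntries must be positive");
        }
        this.maxEntries = maxEntries;
        this.maxAgeMillis = maxAgeMillis;
        this.sink = sink;
    }

    /** Adds one entry; the current time is passed in so the flush policy is easy to test. */
    public synchronized void add(T entry, long nowMillis) {
        if (pending.isEmpty()) {
            batchStartMillis = nowMillis; // the age limit counts from the batch's first entry
        }
        pending.add(entry);
        if (pending.size() >= maxEntries || nowMillis - batchStartMillis >= maxAgeMillis) {
            flush();
        }
    }

    /** Hands all pending entries to the sink as a single batch; does nothing when empty. */
    public synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // copy so the sink may keep the list
        pending.clear();
    }

    public static void main(String[] args) {
        final long fiveMinutes = 5 * 60 * 1000L;
        List<Integer> batchSizes = new ArrayList<>();

        // Size limit: 7 entries with maxEntries = 3 become 3 sink calls instead of 7.
        PooledWriter<String> bySize = new PooledWriter<>(3, fiveMinutes, b -> batchSizes.add(b.size()));
        for (int i = 0; i < 7; i++) {
            bySize.add("row-" + i, 0L);
        }
        bySize.flush(); // push out the final partial batch
        System.out.println(batchSizes); // [3, 3, 1]
    }
}
```

The time limit is only checked when a new entry arrives, which is fine at a high arrival rate; if the stream can pause, a timer (for example a ScheduledExecutorService) should also call flush() so a partial batch is not held indefinitely, which is why both methods are synchronized. Anything still in the pool is lost if the process dies before a flush.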


Can anyone give me some advice on this?

