hbase-user mailing list archives

From Sleiman Jneidi <jneidi.slei...@gmail.com>
Subject Re: Streaming data to htable
Date Fri, 13 Feb 2015 09:39:38 GMT
I would go with the second option, HTableInterface.put(List&lt;Put&gt;). The first
option sounds dodgy: five minutes is plenty of time for things to go wrong,
and you would lose your buffered data.
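The batching the reply recommends can be kept entirely on the client side: collect entries in a buffer and flush when it reaches a size or age limit, so a crash loses at most one small batch instead of five minutes of data. A minimal sketch of that pattern follows; the class and names here (BatchBuffer, flushFn) are illustrative, not part of the HBase API, and the flush callback is assumed to wrap a call such as `table.put(puts)` on an HTableInterface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Illustrative client-side batching buffer (not an HBase class).
 * The flush callback is where HTableInterface.put(List<Put>) would go,
 * e.g. puts -> table.put(puts).
 */
public class BatchBuffer<T> {
    private final int maxSize;           // flush when this many entries are buffered
    private final long maxAgeMillis;     // ...or when the oldest entry is this old
    private final Consumer<List<T>> flushFn;
    private final List<T> buffer = new ArrayList<>();
    private long firstEntryTime = -1;

    public BatchBuffer(int maxSize, long maxAgeMillis, Consumer<List<T>> flushFn) {
        this.maxSize = maxSize;
        this.maxAgeMillis = maxAgeMillis;
        this.flushFn = flushFn;
    }

    public synchronized void add(T entry) {
        if (buffer.isEmpty()) {
            firstEntryTime = System.currentTimeMillis();
        }
        buffer.add(entry);
        // Flush on size or age, whichever comes first, to bound possible loss.
        if (buffer.size() >= maxSize
                || System.currentTimeMillis() - firstEntryTime >= maxAgeMillis) {
            flush();
        }
    }

    public synchronized void flush() {
        if (!buffer.isEmpty()) {
            flushFn.accept(new ArrayList<>(buffer));  // hand a copy to the sink
            buffer.clear();
        }
    }
}
```

In practice you would also call flush() from a shutdown hook, and tune maxSize so each batch stays well under the region server's RPC limits.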

On Fri, Feb 13, 2015 at 6:20 AM, hongbin ma <mahongbin@apache.org> wrote:

> hi,
> I'm trying to use an HTable to store data that arrives in a streaming fashion.
> The incoming data is guaranteed to have a larger KEY than ANY existing
> key in the table.
> And the data will be READ-ONLY.
> The data is streaming in at a very high rate, so I don't want to issue a PUT
> operation for each data entry, because that is obviously poor in performance.
> I'm thinking about pooling the data entries and flushing them to HBase every
> five minutes, and AFAIK there are a few options:
> 1. Pool the data entries, and every 5 minutes run an MR job to convert the
> data to HFile format. This approach would avoid the overhead of single PUTs,
> but I'm afraid the MR job might be too costly (waiting in the job queue) to
> keep pace.
> 2. Use HTableInterface.put(List&lt;Put&gt;). The batched version should be faster,
> but I'm not quite sure by how much.
> 3.?
> can anyone give me some advice on this?
> thanks!
> hongbin
