hbase-user mailing list archives

From Daniel Einspanjer <deinspan...@mozilla.com>
Subject Re: hbase table as a queue.
Date Tue, 19 Jul 2011 15:26:47 GMT
We use a queue table like this too and ran into the same problem.  How 
did you configure it such that it never splits?


On 7/16/11 4:24 PM, Stack wrote:
> I learned friday that our fellas on the frontend are using an hbase
> table to do simple queuing.  They insert stuff to be processed by
> distributed processes and when processes are done with the work,
> they'll remove the processed element from the hbase table.   They are
> queuing, processing, and removing millions of items a day.  Elements
> were added on the end of the queue (FIFO).
> The issue to avoid was that over time, especially if it had been a
> while between major compactions, the latency was going up.  Turns out,
> the table had been splitting when the queue backed up.  Then a scan
> for new stuff to
> process had to first traverse regions that had nought in them (the key
> was time-based and the tail of the table had moved on past these first
> regions).  This traversal, especially when no major compaction had run
> and there were lots of deletes to process, was taking time to get to
> the first row.
> To fix, we rid the table of its empty regions and made it so the table
> would no longer split, so there is only ever one region in it.  This should make
> it so we don't end up with empty regions to skip through before we get
> to the first element in the table (need the major compaction running
> on a somewhat regular basis to temper latencies).  Will report back to
> the list if we find otherwise.
> Do not use locks.  Doesn't scale.  Maybe update a cell when task is
> taken out for processing.  If too much time elapses since last update,
> maybe give it out again?
> St.Ack
> On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <magnito@gmail.com> wrote:
>> Hello, we are thinking about using Hbase table as a simple queue which
>> will dispatch the work for a mapreduce job, as well as real time
>> fetching of data to present to end user.  In simple terms, suppose you
>> had a data source table and a queue table.  The queue table has a
>> smaller set of Rows that point to Values which in turn point to
>> Perma-set table, which has large collection of Rows.  (so Queue{Row,
>> Value} ->  Perma-Set {Row, Value}).  Or Q-Value ->  P-Row.   Our Goal is
>> to look up which Rows to retrieve from the Perma-Set table by looking
>> through the Queue.  Once the lookup into the Queue is done, the Row
>> from the Queue must be deleted to avoid the same Perma-Set lookup
>> being done twice.  We expect many concurrent lookups to happen, so
>> I assume the first thing we need to do is to have the client that does
>> the work acquire a lock on the Queue Row, process the work, then
>> remove the Queue Row.
>> Has anyone done something similar before?  Any gotchas we should be aware of?
>> Thanks.
>> -Jack
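
Stack's suggestion in the quoted thread (update a cell when a task is taken, and give the task out again if too much time has passed since that update) can be sketched roughly as below. This is only an illustration under assumptions: QueueTable is a hypothetical in-memory stand-in for the queue table, its check_and_put models an atomic compare-and-set on a single cell (HBase exposes a checkAndPut call for this purpose), and the names claimed_at, claim_next and lease_seconds are made up for the example.

```python
import time


class QueueTable:
    """Hypothetical in-memory stand-in for an HBase queue table."""

    def __init__(self):
        # row key -> {"payload": ..., "claimed_at": float or None}
        self.rows = {}

    def put(self, key, payload):
        self.rows[key] = {"payload": payload, "claimed_at": None}

    def check_and_put(self, key, expected, new):
        """Set claimed_at to `new` only if it still equals `expected`.

        Models a compare-and-set on one cell. A real store has to perform
        this atomically on the server for the scheme to be safe.
        """
        row = self.rows.get(key)
        if row is None or row["claimed_at"] != expected:
            return False
        row["claimed_at"] = new
        return True

    def delete(self, key):
        self.rows.pop(key, None)


def claim_next(table, lease_seconds, now=None):
    """Return (key, payload) for the oldest unclaimed or stale task, else None."""
    now = time.time() if now is None else now
    for key in sorted(table.rows):  # time-based keys sort oldest first
        row = table.rows[key]
        seen = row["claimed_at"]
        if seen is not None and now - seen < lease_seconds:
            continue  # another worker took this recently; skip it
        # Claim only if nobody changed the cell since we read it.
        if table.check_and_put(key, seen, now):
            return key, row["payload"]
    return None
```

A worker would call claim_next, process the payload, and then delete the row. If the worker dies, the row stays in the table and becomes claimable again once lease_seconds have elapsed, so no explicit row lock is held at any point.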
