hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: hbase table as a queue.
Date Sat, 16 Jul 2011 20:24:56 GMT
I learned Friday that our fellas on the frontend are using an HBase
table to do simple queuing.  They insert items to be processed by
distributed processes, and when a process is done with a piece of work,
it removes the processed element from the HBase table.  They are
queuing, processing, and removing millions of items a day.  Elements
are added at the end of the queue (FIFO).
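For illustration, here is a minimal sketch of that pattern with the classic
HTable client API.  The table name "queue", family "q", qualifier "task",
and the key format are assumptions for the example, not details from the
thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class QueueTable {
  private final HTable table;

  public QueueTable() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    this.table = new HTable(conf, "queue");   // assumed table name
  }

  // Enqueue: a time-based, zero-padded row key so new work sorts to the tail
  // of the table (FIFO).  A real key would add a unique suffix to avoid
  // collisions within the same millisecond.
  public void enqueue(byte[] payload) throws Exception {
    byte[] rowKey = Bytes.toBytes(String.format("%020d", System.currentTimeMillis()));
    Put put = new Put(rowKey);
    put.add(Bytes.toBytes("q"), Bytes.toBytes("task"), payload);
    table.put(put);
  }

  // When a worker finishes, the processed element is removed from the table.
  public void markDone(byte[] rowKey) throws Exception {
    table.delete(new Delete(rowKey));
  }
}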

The issue to avoid was that latency was creeping up over time, especially
when it had been a while between major compactions.  It turns out the table
had been splitting when the queue backed up.  A scan for new work to process
then had to first traverse regions that had nought in them (the keys were
time-based, and the tail of the table had moved on past these first regions).
This traversal was taking time to reach the first row, especially when no
major compaction had run and there were lots of deletes to process.
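To make the failure mode concrete, here is a hypothetical head-of-queue
lookup in the same style as the sketch above; it shows why the scan has to
walk every leading region, empty or not, before it can return the first row.

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class QueueScan {
  // Returns the oldest row still in the queue, or null when the queue is
  // empty.  The scan starts at the beginning of the table, so it must walk
  // any leading regions even when they only contain tombstones for work
  // that has already been processed and deleted.
  public static Result head(HTable table) throws Exception {
    Scan scan = new Scan();
    scan.setCaching(1);                 // only the first matching row is needed
    ResultScanner scanner = table.getScanner(scan);
    try {
      return scanner.next();
    } finally {
      scanner.close();
    }
  }
}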

To fix it, we rid the table of its empty regions and made it so the table
would no longer split, so there is only ever one region in it.  This should
mean we don't end up with empty regions to skip through before we get to
the first element in the table (we need the major compaction running on a
somewhat regular basis to temper latencies).  Will report back to the list
if we find otherwise.
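As a rough sketch of that fix with the Java admin API (the exact calls, and
whether the table must be disabled for the schema change, vary by HBase
version, so treat the steps below as assumptions): raise MAX_FILESIZE so the
single region never reaches the split threshold, and run the major
compaction from a scheduled job.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class QueueTableAdmin {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Effectively disable splitting by making the split threshold unreachable.
    HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("queue"));
    htd.setMaxFileSize(Long.MAX_VALUE);
    admin.disableTable("queue");
    admin.modifyTable(Bytes.toBytes("queue"), htd);
    admin.enableTable("queue");

    // Run a major compaction now; in practice this would be triggered on a
    // regular schedule so delete tombstones get cleaned up.
    admin.majorCompact("queue");
  }
}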

Do not use locks.  They don't scale.  Maybe update a cell when a task is
handed out for processing.  If too much time elapses since the last update,
maybe give it out again?
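One hedged way to implement that cell-update idea without row locks is an
atomic checkAndPut claim with a lease-style timeout.  The column name,
family, and lease length below are made up for illustration.

import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class TaskClaimer {
  private static final byte[] FAM = Bytes.toBytes("q");
  private static final byte[] CLAIMED_AT = Bytes.toBytes("claimed_at");
  private static final long LEASE_MS = 5 * 60 * 1000L;   // assumed re-dispatch window

  // Returns true if this worker won the row; false if another worker already
  // holds a fresh claim on it.
  public static boolean tryClaim(HTable table, byte[] row) throws Exception {
    Result current = table.get(new Get(row).addColumn(FAM, CLAIMED_AT));
    byte[] existing = current.getValue(FAM, CLAIMED_AT);

    // If a claim exists and is still within the lease window, leave the row
    // alone; the original worker may still be processing it.
    if (existing != null
        && System.currentTimeMillis() - Bytes.toLong(existing) < LEASE_MS) {
      return false;
    }

    // checkAndPut succeeds only if the cell still holds the value we just
    // read (or is still absent), so two workers racing for the same row
    // cannot both win the claim.
    Put claim = new Put(row);
    claim.add(FAM, CLAIMED_AT, Bytes.toBytes(System.currentTimeMillis()));
    return table.checkAndPut(row, FAM, CLAIMED_AT, existing, claim);
  }
}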

St.Ack

On Sat, Jul 16, 2011 at 9:38 AM, Jack Levin <magnito@gmail.com> wrote:
> Hello, we are thinking about using an HBase table as a simple queue which
> will dispatch work for a mapreduce job, as well as for real-time fetching
> of data to present to the end user.  In simple terms, suppose you had a
> data source table and a queue table.  The queue table has a smaller set of
> Rows that point to Values, which in turn point to the Perma-Set table,
> which has a large collection of Rows (so Queue {Row, Value} -> Perma-Set
> {Row, Value}, or Q-Value -> P-Row).  Our goal is to look up which Rows to
> retrieve from the Perma-Set table by looking through the Queue.  Once the
> lookup into the Queue is done, the Row from the Queue must be deleted so
> the same Perma-Set lookup isn't done twice.  We expect many concurrent
> lookups to happen, so I assume the client doing the work first needs to
> acquire a lock on the Queue Row, process the work, then remove the Queue
> Row.
>
> Has anyone done something similar before?  Any gotchas we should be aware of?
>
> Thanks.
>
> -Jack
>
