hbase-user mailing list archives

From Kireet <kir...@feedly.com>
Subject Re: task list table
Date Mon, 15 Apr 2013 18:15:41 GMT



Thanks for the reply. Regarding "write performance would be lower": does 
that mean Put is the better choice for writes?

Also, I think I used the wrong terminology regarding batching. I meant to 
ask whether Append uses the client-side write buffer. I would think not, 
since the append() method returns a Result. I could batch them up on the 
application side, I suppose. Append also seems to return the updated value, 
which looks like a lot of unnecessary I/O in my case since I am not 
immediately interested in it. I guess there is no way to turn that off?
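
To make the batching part concrete, this is roughly what I have in mind for 
batching on the application side, using the batch() method mentioned below. 
It is only a sketch: the "task_list" table name, the "t" column family, and 
the one-row-per-second key are placeholders, and I am assuming the client's 
Append exposes setReturnResults() to skip sending the updated value back.

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Append;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

public class TaskAppendBatcher {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "task_list");          // placeholder table name
        try {
            long bucket = System.currentTimeMillis() / 1000L;  // one row per second
            List<Append> appends = new ArrayList<Append>();
            for (int i = 0; i < 100; i++) {
                Append append = new Append(Bytes.toBytes(bucket));
                append.add(Bytes.toBytes("t"),                 // placeholder column family
                           Bytes.toBytes("task" + i),
                           Bytes.toBytes("payload-" + i));
                // Assumption: skip shipping the updated row back to the client.
                append.setReturnResults(false);
                appends.add(append);
            }
            // One batched round trip instead of one RPC per append() call.
            Object[] results = new Object[appends.size()];
            table.batch(appends, results);
        } finally {
            table.close();
        }
    }
}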

On 4/15/13 1:28 PM, Ted Yu wrote:
> I assume you would select HBase 0.94.6.1 (the latest release) for this
> project.
>
> For #1, write performance would be lower if you choose to use Append (vs.
> using Put).
>
> bq. Can appends be batched by the client or do they execute immediately?
> This depends on your use case. Take a look at the following method in
> HTable where you can send a list of actions (Appends):
>
>    public void batch(final List<? extends Row> actions, final Object[]
> results)
> For #2
> bq. The other would be to prefix the timestamp row key with a random
> leading byte.
>
> This technique has been used elsewhere and is better than the first one.
>
> Cheers
>
> On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy <kireet-Teh5dPVPL8nQT0dZR+AlfA@public.gmane.org>
> wrote:
>
>> We are planning to create a "scheduled task list" table in our hbase
>> cluster. Essentially we will define a table keyed by timestamp, and the
>> row contents will be all the tasks that need to be processed within that
>> second (or whatever time period). I am trying to follow the "reasonably wide
>> rows" design mentioned in the hbasecon opentsdb talk. A couple of questions:
>>
>> 1. Should we use Append or Put to create tasks? Since these rows will not
>> live forever, storage space is not a concern; read/write performance is
>> more important. As concurrency increases, I would guess the row lock may
>> become an issue with Append? Can appends be batched by the client or do they
>> execute immediately?
>>
>> 2. I am a little worried about hotspots. This basic design may cause
>> issues in terms of the table's performance. Many tasks will execute and
>> reschedule themselves using the same interval, t + 1 hour for example, so
>> many of the writes may all go to the same block. Also, we have a lot of other
>> data so I am worried it may impact performance of unrelated data if the
>> region server gets too busy servicing the task list table. I can think of 2
>> strategies to avoid this. One would be to create N different tables and
>> read/write tasks to them randomly. This may spread load across servers, but
>> there is no guarantee hbase will place the tables on different region
>> servers, correct? The other would be to prefix the timestamp row key with a
>> random leading byte. Then when reading from the task list table, consumers
>> could scan from any/all possible values of the random byte + current
>> timestamp to obtain tasks. Both strategies seem like they could spread out
>> load, but at the cost of more work/complexity to read tasks from the table.
>> Do either of those approaches make sense?
>>
>> On the read side, it seems like a similar problem exists in that all
>> consumers will be reading rows based on the current timestamp. Is this good
>> because the block will very likely be cached or bad because the region
>> server may become overloaded? I have a feeling the answer is going to be
>> "it depends". :)
>>
>> I did see the previous posts on queues and the tips there - use zookeeper
>> for coordination, schedule major compactions, etc. Sorry if these questions
>> are basic, I am pretty new to hbase. Thanks!
>
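
To make sure I understand the salting approach, this is roughly the key 
layout I am picturing (sketch only: the 8 salt buckets and the "t" column 
family are placeholders, and I used Put and a plain Scan here just to keep 
the key handling in focus):

import java.util.Random;

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedTaskKeys {
    private static final int NUM_BUCKETS = 8;                // placeholder bucket count
    private static final byte[] FAMILY = Bytes.toBytes("t"); // placeholder column family
    private static final Random RANDOM = new Random();

    // Row key = 1 random salt byte + 8-byte time bucket (seconds).
    static byte[] saltedKey(long secondBucket) {
        byte salt = (byte) RANDOM.nextInt(NUM_BUCKETS);
        return Bytes.add(new byte[] { salt }, Bytes.toBytes(secondBucket));
    }

    // Producers write each task under a randomly salted row for its time bucket.
    static void writeTask(HTable table, long secondBucket, String taskId, byte[] payload)
            throws Exception {
        Put put = new Put(saltedKey(secondBucket));
        put.add(FAMILY, Bytes.toBytes(taskId), payload);
        table.put(put);
    }

    // Consumers check every salt bucket for the current time bucket.
    static void readTasks(HTable table, long secondBucket) throws Exception {
        for (int salt = 0; salt < NUM_BUCKETS; salt++) {
            byte[] row = Bytes.add(new byte[] { (byte) salt }, Bytes.toBytes(secondBucket));
            Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 })); // exactly one row
            ResultScanner scanner = table.getScanner(scan);
            try {
                for (Result r : scanner) {
                    // process the tasks stored in this salt bucket's row
                }
            } finally {
                scanner.close();
            }
        }
    }
}

The read side becomes one small scan per salt value on each poll rather than 
a single scan, which is the extra work/complexity I mentioned.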


