hbase-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: talk list table
Date Sat, 20 Apr 2013 23:10:36 GMT
+ http://blog.sematext.com/2012/12/24/hbasewd-and-hbasehut-handy-hbase-libraries-available-in-public-maven-repo/
if you use Maven and want to use HBaseWD.

Otis
--
HBASE Performance Monitoring - http://sematext.com/spm/index.html




On Sat, Apr 20, 2013 at 11:24 AM, Amit Sela <amits@infolinks.com> wrote:
> Hope I'm not too late here... regarding hot spotting with sequential keys,
> I'd suggest you read this Sematext blog -
> http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/
> They present a nice idea there for this kind of issue.
>
> Good Luck!
>
>
>
> On Mon, Apr 15, 2013 at 11:18 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> bq. write performance would be lower
>>
>> The above means poorer performance.
>>
>> bq. I could batch them up application side
>>
>> Please do that.
>>
>> bq. I guess there is no way to turn that off?
>>
>> That's right.
>>
>> On Mon, Apr 15, 2013 at 11:15 AM, Kireet <kireet@feedly.com> wrote:
>>
>> >
>> >
>> >
>> > Thanks for the reply. "write performance would be lower" -> this means
>> > better?
>> >
>> > Also I think I used the wrong terminology regarding batching. I meant to
>> > ask if it uses the client side write buffer. I would think not since the
>> > append() method returns a Result. I could batch them up application side
>> > I suppose. Append also seems to return the updated value. This seems like a
>> > lot of unnecessary I/O in my case since I am not immediately interested in
>> > the updated value. I guess there is no way to turn that off?
>> >
>> >
>> > On 4/15/13 1:28 PM, Ted Yu wrote:
>> >
>> >> I assume you would select HBase 0.94.6.1 (the latest release) for this
>> >> project.
>> >>
>> >> For #1, write performance would be lower if you choose to use Append (vs.
>> >> using Put).
>> >>
>> >> bq. Can appends be batched by the client or do they execute immediately?
>> >> This depends on your use case. Take a look at the following method in
>> >> HTable where you can send a list of actions (Appends):
>> >>
>> >>    public void batch(final List<? extends Row> actions, final Object[] results)
>> >>
>> >> For #2
>> >> bq. The other would be to prefix the timestamp row key with a random
>> >> leading byte.
>> >>
>> >> This technique has been used elsewhere and is better than the first one.
>> >>
>> >> Cheers
>> >>
>> >> On Mon, Apr 15, 2013 at 6:09 AM, Kireet Reddy
>> >> <kireet-Teh5dPVPL8nQT0dZR+AlfA@public.gmane.org> wrote:
>> >>
>> >>> We are planning to create a "scheduled task list" table in our hbase
>> >>> cluster. Essentially we will define a table with key timestamp and then the
>> >>> row contents will be all the tasks that need to be processed within that
>> >>> second (or whatever time period). I am trying to do the "reasonably wide
>> >>> rows" design mentioned in the hbasecon opentsdb talk. A couple of
>> >>> questions:
>> >>>
>> >>> 1. Should we use append or put to create tasks? Since these rows will not
>> >>> live forever, storage space is not a concern; read/write performance is
>> >>> more important. As concurrency increases I would guess the row lock may
>> >>> become an issue in append? Can appends be batched by the client or do they
>> >>> execute immediately?
>> >>>
>> >>> 2. I am a little worried about hotspots. This basic design may cause
>> >>> issues in terms of the table's performance. Many tasks will execute and
>> >>> reschedule themselves using the same interval, t + 1 hour for example. So
>> >>> many of the writes may all go to the same block. Also, we have a lot of
>> >>> other data so I am worried it may impact performance of unrelated data if
>> >>> the region server gets too busy servicing the task list table. I can think
>> >>> of 2 strategies to avoid this. One would be to create N different tables
>> >>> and read/write tasks to them randomly. This may spread load across servers,
>> >>> but there is no guarantee hbase will place the tables on different region
>> >>> servers, correct? The other would be to prefix the timestamp row key with a
>> >>> random leading byte. Then when reading from the task list table, consumers
>> >>> could scan from any/all possible values of the random byte + current
>> >>> timestamp to obtain tasks. Both strategies seem like they could spread out
>> >>> load, but at the cost of more work/complexity to read tasks from the table.
>> >>> Do either of those approaches make sense?
>> >>>
>> >>> On the read side, it seems like a similar problem exists in that all
>> >>> consumers will be reading rows based on the current timestamp. Is this good
>> >>> because the block will very likely be cached or bad because the region
>> >>> server may become overloaded? I have a feeling the answer is going to be
>> >>> "it depends". :)
>> >>>
>> >>> I did see the previous posts on queues and the tips there - use zookeeper
>> >>> for coordination, schedule major compactions, etc. Sorry if these questions
>> >>> are basic, I am pretty new to hbase. Thanks!
>> >>>
>> >>
>> >>
>> >
>> >
>>
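For readers following the thread, here is a minimal sketch of the "random leading byte" idea from the original question. It uses plain Java and no HBase classes. The class name, the bucket count of 16, and the 1-byte salt plus 8-byte timestamp key layout are all assumptions made for illustration; this is not the HBaseWD API and not code from any poster.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper; the names and the bucket count are illustrative only. */
final class SaltedTaskKeys {
    /** Assumed number of salt buckets; kept below 128 so the salt byte is non-negative. */
    static final int BUCKETS = 16;

    /** Writers pick a random bucket so sequential timestamps spread across key ranges. */
    static byte randomSalt() {
        return (byte) ThreadLocalRandom.current().nextInt(BUCKETS);
    }

    /**
     * Assumed key layout: 1 salt byte, then the timestamp as an 8-byte big-endian long.
     * For non-negative timestamps, big-endian bytes sort in numeric order.
     */
    static byte[] rowKey(byte salt, long timestampSeconds) {
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put(salt)
                .putLong(timestampSeconds)
                .array();
    }

    /**
     * Readers must cover every bucket: one {startKey, stopKey} pair per salt value,
     * with the stop key exclusive.
     */
    static List<byte[][]> scanRanges(long fromSeconds, long toSecondsExclusive) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int bucket = 0; bucket < BUCKETS; bucket++) {
            byte salt = (byte) bucket;
            ranges.add(new byte[][] {
                rowKey(salt, fromSeconds),
                rowKey(salt, toSecondsExclusive)
            });
        }
        return ranges;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis() / 1000L;
        byte[] key = rowKey(randomSalt(), now);
        System.out.println("key length: " + key.length);
        System.out.println("ranges to scan: " + scanRanges(now, now + 1).size());
    }
}
```

The trade-off is the one raised in the question: writes spread across BUCKETS key ranges, but each read of a time window becomes BUCKETS separate range scans whose results the consumer has to merge.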

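The "batch them up application side" suggestion could look roughly like the sketch below. It is deliberately independent of HBase: the BatchingBuffer class and its sink callback are hypothetical. In a real client the sink would presumably hand the collected actions to the HTable batch(actions, results) method quoted in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical application-side buffer; not part of the HBase client API. */
final class BatchingBuffer<T> {
    private final int maxBatchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    /**
     * @param maxBatchSize number of queued actions that triggers an automatic flush
     * @param sink         receives each batch; assumed to wrap the real write call
     */
    BatchingBuffer(int maxBatchSize, Consumer<List<T>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be at least 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Queues one action and flushes once the batch is full. */
    synchronized void add(T action) {
        pending.add(action);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Sends whatever is queued; call this on a timer or at shutdown as well. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(pending);
        pending.clear();
        sink.accept(batch);
    }

    public static void main(String[] args) {
        BatchingBuffer<String> buffer =
                new BatchingBuffer<>(2, batch -> System.out.println("sending " + batch));
        buffer.add("task-1");
        buffer.add("task-2"); // second add fills the batch and triggers a flush
        buffer.add("task-3");
        buffer.flush();       // sends the remaining partial batch
    }
}
```

Buffering like this trades latency for fewer round trips: a task is not visible in the table until its batch is flushed, which matters if tasks are scheduled close to the current time.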