hbase-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: question about threads count
Date Tue, 29 Apr 2014 01:23:26 GMT
I am using HBase to store information for a web spider.
I have a table that stores information about each webpage. The rowkey is the URL,
and there are other columns such as status (int) and depth (int).
In the beginning, the status is 0. A worker thread selects URLs
whose status is 0, processes them, and sets the status to 1,...
More than one URL can link to a given URL,
e.g. url1->url and url2->url,
so url gets inserted twice. If I do not use checkAndPut, this can happen:
thread 1 inserts url, and the worker thread processes url
and sets its status to 1. Then thread 2 inserts url again and resets
the status to 0, so the worker thread processes it again. That's
not what I want.
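
The race described above can be sketched without a cluster. The class below is an in-memory stand-in, not the HBase client API: `UrlTableSketch` and its method names are hypothetical, and `ConcurrentHashMap.putIfAbsent` only models the "insert the row if it does not exist yet" semantics. In the HBase client of this period, the equivalent conditional write would presumably be `checkAndPut(row, family, qualifier, null, put)`, where a null expected value means "the column must not exist".

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the url table: rowkey (url) -> status.
// This is NOT the HBase API; it only models the insert-if-absent behaviour.
final class UrlTableSketch {
    static final int STATUS_NEW = 0;
    static final int STATUS_DONE = 1;

    private final ConcurrentMap<String, Integer> status = new ConcurrentHashMap<>();

    // Unconditional write: always overwrites, like a plain Put.
    void put(String url, int newStatus) {
        status.put(url, newStatus);
    }

    // Conditional insert: writes status 0 only if the url has no row yet.
    // Returns true if the row was inserted, false if it already existed.
    boolean insertIfAbsent(String url) {
        return status.putIfAbsent(url, STATUS_NEW) == null;
    }

    // The worker marks a url as processed.
    void markDone(String url) {
        status.put(url, STATUS_DONE);
    }

    // Returns the current status, or null if the url was never inserted.
    Integer statusOf(String url) {
        return status.get(url);
    }
}
```

With this model, a second `insertIfAbsent("url")` after `markDone("url")` returns false and leaves the status at 1, whereas a plain `put("url", 0)` would reset it and cause the worker to process the page again.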

On Tue, Apr 29, 2014 at 8:56 AM, Jean-Marc Spaggiari
<jean-marc@spaggiari.org> wrote:
> Why do you want to make sure the row is only inserted once? If you insert
> the same row twice, the 2nd one will simply overwrite the first one and
> HBase will take care of the versions.
>
> Regarding the code fragments, I don't think autoflush is going to make a
> big difference compared to the cost of the check & put...
>
>
> 2014-04-28 20:50 GMT-04:00 Li Li <fancyerii@gmail.com>:
>
>> I must use checkAndPut to ensure a row is only inserted once.
>> If I have 1000 checkAndPut calls, will setAutoFlush(false) be useful?
>> Is there any performance difference between the following two code fragments?
>> 1.
>>     table.setAutoFlush(false);
>>     for(int i=0;i<1000;i++){
>>          Put put=...
>>          table.checkAndPut(,....put);
>>     }
>> 2.
>>     table.setAutoFlush(true);
>>     for(int i=0;i<1000;i++){
>>          Put put=...
>>          table.checkAndPut(,....put);
>>     }
>>
>> On Tue, Apr 29, 2014 at 8:36 AM, Jean-Marc Spaggiari
>> <jean-marc@spaggiari.org> wrote:
>> > It depends. Batching a list of puts/gets will be way faster than checkAndPut,
>> > but the result will not be the same... a batch of puts will not do any
>> > check...
>> >
>> >
>> > 2014-04-28 20:17 GMT-04:00 Li Li <fancyerii@gmail.com>:
>> >
>> >> But I have many checkAndPut operations.
>> >> Would using batch be a better solution?
>> >>
>> >> On Mon, Apr 28, 2014 at 8:01 PM, Jean-Marc Spaggiari
>> >> <jean-marc@spaggiari.org> wrote:
>> >> > Hi Li Li,
>> >> >
>> >> > Yes, threads will impact performance. If you send all your writes with
>> >> > a single thread, a single HBase handler will take care of them, etc. HBase
>> >> > does not provide a single handler for a single client connection. It's able
>> >> > to handle multiple threads and clients.
>> >> >
>> >> > However, it also all depends on the way you send your writes. If you send a
>> >> > single puts(<10000>) per second, it will not be better to send 10 000
>> >> > threads with a single put.
>> >> >
>> >> > I recommend running some perf tests on your installation to find a
>> >> > good number for your configuration.
>> >> >
>> >> > JM
>> >> >
>> >> >
>> >> > 2014-04-28 6:27 GMT-04:00 Li Li <fancyerii@gmail.com>:
>> >> >
>> >> >> Hi all,
>> >> >>    With the same read/write data, will the thread count affect performance?
>> >> >>    E.g. I have 10,000 write requests/second. I don't care about the order very
>> >> >> much.
>> >> >>    How many writer threads should I use to obtain maximum throughput?
>> >> >>
>> >>
>>
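
The perf test suggested earlier in the thread could start from a small harness like the one below. It is only a sketch: `WriterThreadBench` and `run` are hypothetical names, and `writeOp` is a placeholder for whatever write the application really issues (a put, a checkAndPut, or a batch). It runs the same total number of writes with a given number of threads and returns the elapsed time, so that several thread counts can be compared on the same installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class WriterThreadBench {
    // Runs totalWrites calls of writeOp spread over `threads` workers
    // and returns the elapsed wall-clock time in nanoseconds.
    static long run(int threads, int totalWrites, Runnable writeOp) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                // Split the work as evenly as possible across the threads.
                int share = totalWrites / threads + (t < totalWrites % threads ? 1 : 0);
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < share; i++) {
                        writeOp.run();
                    }
                }));
            }
            // Wait for every worker to finish before stopping the clock.
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException | ExecutionException e) {
                    throw new RuntimeException(e);
                }
            }
            return System.nanoTime() - start;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `run` with, say, 1, 4, 16 and 64 threads against the same write operation gives a first impression of where throughput stops improving. A real measurement would also need warm-up runs and realistic row keys.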
