hbase-user mailing list archives

From Jean-Marc Spaggiari <jean-m...@spaggiari.org>
Subject Re: question about threads count
Date Tue, 29 Apr 2014 01:44:11 GMT
Simply don't set the status to 0 when you first write the row.

Absence means not read.
1 means read.
So there is no risk that one writer tries to set 0 while another tries to set 1.

Will that be an option?
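
To make the idea concrete, here is a rough sketch. It is not HBase client
code: the table is modelled as an in-memory map, and the class and method
names (CrawlStatusSketch, insertUrl, isPending, markRead) are made up for
illustration. It only shows the invariant: inserters never write "status",
so a repeated insert cannot undo the worker's 1.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CrawlStatusSketch {
    // Hypothetical in-memory stand-in for the table: rowkey (url) -> column -> value.
    static final Map<String, Map<String, Integer>> table = new ConcurrentHashMap<>();

    // Inserter threads: write the depth only, never the "status" column.
    static void insertUrl(String url, int depth) {
        table.computeIfAbsent(url, k -> new ConcurrentHashMap<>()).put("depth", depth);
    }

    // A row is pending while it exists and has no "status" column.
    static boolean isPending(String url) {
        Map<String, Integer> row = table.get(url);
        return row != null && !row.containsKey("status");
    }

    // Worker thread: the only writer of "status", and it only ever writes 1.
    static void markRead(String url) {
        table.computeIfAbsent(url, k -> new ConcurrentHashMap<>()).put("status", 1);
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        insertUrl(url, 1);                   // url1 -> url
        System.out.println(isPending(url));  // true: no status yet
        markRead(url);                       // worker processed the page
        insertUrl(url, 2);                   // url2 -> url, same row inserted again
        System.out.println(isPending(url));  // false: status is still 1
    }
}
```

With a real table the worker would have to look for rows where the status
column is missing instead of rows where it equals 0.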


2014-04-28 21:23 GMT-04:00 Li Li <fancyerii@gmail.com>:

> I am using HBase to store information for a web spider.
> I have a table that stores information about each web page; the rowkey is
> the url, and there are other columns such as status(int) and depth(int).
> In the beginning, the status is 0. A worker thread selects urls
> whose status is 0, does something with them and sets the status to 1,...
> More than one url can link to a given url,
> e.g.  url1->url url2->url
> so url is inserted twice. If I do not use checkAndPut,
> thread 1 inserts url, and the worker thread does something with url
> and sets its status to 1. Then thread 2 inserts url again and resets
> the status to 0, so the worker thread does something with it again. That's
> not what I want.
>
> On Tue, Apr 29, 2014 at 8:56 AM, Jean-Marc Spaggiari
> <jean-marc@spaggiari.org> wrote:
> > Why do you want to make sure the row is only inserted once? If you insert
> > the same row twice, the 2nd one will simply overwrite the first one and
> > HBase will take care of the versions.
> >
> > Regarding the code fragments, I don't think autoflush is going to make a
> > big difference compared to the cost of the check & put...
> >
> >
> > 2014-04-28 20:50 GMT-04:00 Li Li <fancyerii@gmail.com>:
> >
> >> I must use checkAndPut to ensure a row is only inserted once.
> >> If I have 1000 checkAndPut calls, will setAutoFlush(false) be useful?
> >> Is there any performance difference between the following two code fragments?
> >> 1.
> >>     table.setAutoFlush(false);
> >>     for(int i=0;i<1000;i++){
> >>          Put put=...
> >>          table.checkAndPut(,....put);
> >>     }
> >> 2.
> >>     table.setAutoFlush(true);
> >>     for(int i=0;i<1000;i++){
> >>          Put put=...
> >>          table.checkAndPut(,....put);
> >>     }
> >>
> >> On Tue, Apr 29, 2014 at 8:36 AM, Jean-Marc Spaggiari
> >> <jean-marc@spaggiari.org> wrote:
> >> > It depends. Batching a list of puts/gets will be way faster than
> >> > checkAndPut, but the result will not be the same... a batch of puts will
> >> > not do any check...
> >> >
> >> >
> >> > 2014-04-28 20:17 GMT-04:00 Li Li <fancyerii@gmail.com>:
> >> >
> >> >> But I have many checkAndPut operations.
> >> >> Would using batch be a better solution?
> >> >>
> >> >> On Mon, Apr 28, 2014 at 8:01 PM, Jean-Marc Spaggiari
> >> >> <jean-marc@spaggiari.org> wrote:
> >> >> > Hi Li Li,
> >> >> >
> >> >> > Yes, threads will impact performance. If you send all your writes
> >> >> > with a single thread, a single HBase handler will take care of them,
> >> >> > etc. HBase does not provide a single handler for a single client
> >> >> > connection. It's able to handle multiple threads and clients.
> >> >> >
> >> >> > However, it also depends on the way you send your writes. If you
> >> >> > send a single puts(<10000>) per second, it will not be better to use
> >> >> > 10 000 threads with a single put each.
> >> >> >
> >> >> > I recommend running some perf tests on your installation to find a
> >> >> > good number for your configuration.
> >> >> >
> >> >> > JM
> >> >> >
> >> >> >
> >> >> > 2014-04-28 6:27 GMT-04:00 Li Li <fancyerii@gmail.com>:
> >> >> >
> >> >> >> Hi all,
> >> >> >>    With the same read/write data, will the thread count affect
> >> >> >>    performance?
> >> >> >>    E.g. I have 10,000 write requests/second. I don't care much about
> >> >> >>    the order.
> >> >> >>    How many writer threads should I use to obtain maximum throughput?
> >> >> >>
> >> >>
> >>
>
