hbase-user mailing list archives

From Wei Tan <w...@us.ibm.com>
Subject Re: coprocessor enabled put very slow, help please~~~
Date Tue, 19 Feb 2013 15:15:00 GMT
A side question: if using HTablePool is discouraged, how should we
handle thread safety when using HTable? Is a replacement for
HTablePool planned?
Thanks,


Best Regards,
Wei




From:   Michel Segel <michael_segel@hotmail.com>
To:     "user@hbase.apache.org" <user@hbase.apache.org>, 
Date:   02/18/2013 09:23 AM
Subject:        Re: coprocessor enabled put very slow, help please~~~



Why are you using an HTable Pool?
Why are you closing the table after each iteration through?

Try using 1 HTable object. Turn off the WAL.
Instantiate it in start().
Close it in stop().
Surround the use in a try / catch.
If an exception is caught, re-instantiate a new HTable connection.

You may also want to flush the connection after the puts.
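
The lifecycle described above (open one handle in start(), reuse it, close it in stop(), replace it after an exception) might be sketched as follows. This is only an illustration under assumptions: IndexTable is a hypothetical stand-in interface, not an HBase type, and the class name, factory and retry-once policy are invented for the example. The methods are synchronized because the thread does not establish that a shared handle is safe to use from several threads at once.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

class IndexWriterSketch {
    /** Hypothetical stand-in for a table handle; this is NOT an HBase type. */
    interface IndexTable extends Closeable {
        void increment(String word) throws IOException;
    }

    private final Supplier<IndexTable> factory; // assumed way to open a new handle
    private IndexTable table;

    IndexWriterSketch(Supplier<IndexTable> factory) {
        this.factory = factory;
    }

    /** Open the single long-lived handle once, e.g. from the coprocessor's start(). */
    synchronized void start() {
        table = factory.get();
    }

    /** Use the handle; on failure, replace it with a fresh one and retry once. */
    synchronized void increment(String word) throws IOException {
        try {
            table.increment(word);
        } catch (IOException e) {
            closeQuietly();
            table = factory.get();
            table.increment(word); // a second failure propagates to the caller
        }
    }

    /** Release the handle once, e.g. from the coprocessor's stop(). */
    synchronized void stop() {
        closeQuietly();
    }

    private void closeQuietly() {
        if (table != null) {
            try {
                table.close();
            } catch (IOException ignored) {
                // nothing useful to do while discarding a handle
            }
            table = null;
        }
    }

    public static void main(String[] args) throws IOException {
        int[] opened = {0};
        int[] applied = {0};
        // Simulated factory: the first handle always fails, later ones work.
        IndexWriterSketch writer = new IndexWriterSketch(() -> {
            final boolean broken = (opened[0]++ == 0);
            return new IndexTable() {
                public void increment(String word) throws IOException {
                    if (broken) throw new IOException("simulated stale handle");
                    applied[0]++;
                }
                public void close() { }
            };
        });
        writer.start();
        writer.increment("working");
        writer.stop();
        System.out.println("handles opened: " + opened[0]
                + ", increments applied: " + applied[0]);
    }
}
```

In a real coprocessor the factory would open the index table and increment() would issue the actual Increment; those calls are left out here so the sketch stays runnable without HBase on the classpath.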


Again not sure why you are using check and put on the base table. Your 
count could be off.

As an example, look at the poem/rhyme 'Mary Had a Little Lamb'.
Then check your word count.
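
To make the counting concern concrete, here is a small sketch using the sample document from the quoted message below. It compares splitting on whitespace only, as the quoted job does, with a normalized variant. The normalization rule (lower-casing and stripping punctuation) and the class and method names are assumptions made for illustration; the thread does not prescribe them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

class WordCountSketch {
    /** Split on whitespace only, as in the quoted job code. */
    static Map<String, Integer> naiveCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Assumed normalization: lower-case each token and drop every character
     * that is not a letter, a digit, or an apostrophe before counting.
     */
    static Map<String, Integer> normalizedCounts(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            String word = token.toLowerCase(Locale.ROOT)
                               .replaceAll("[^\\p{L}\\p{Nd}']", "");
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String doc = "I am working. He is not working";
        System.out.println("naive:      " + naiveCounts(doc));
        System.out.println("normalized: " + normalizedCounts(doc));
    }
}
```

With the whitespace-only split, "working." and "working" become two different row keys with a count of 1 each, so the index would not hold the count of 2 that the doc_idx layout expects. Punctuation and capitalization in a rhyme cause the same kind of drift.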

Sent from a remote device. Please excuse any typos...

Mike Segel

On Feb 18, 2013, at 7:21 AM, prakash kadel <prakash.kadel@gmail.com> 
wrote:

> Thank you guys for your replies,
> Michael,
>   I think I didn't make it clear. Here is my use case,
> 
> I have text documents to insert in the hbase. (With possible duplicates)
> Suppose i have a document as : " I am working. He is not working"
> 
> I want to insert this document to a table in hbase, say table "doc"
> 
> =doc table=
> -----
> rowKey : doc_id
> cf: doc_content
> value: "I am working. He is not working"
> 
> Now, I want to create another table that stores the word count, say "doc_idx"
> 
> doc_idx table
> ---
> rowKey : I, cf: count, value: 1
> rowKey : am, cf: count, value: 1
> rowKey : working, cf: count, value: 2
> rowKey : He, cf: count, value: 1
> rowKey : is, cf: count, value: 1
> rowKey : not, cf: count, value: 1
> 
> My MR job code:
> ==============
> 
> if(doc.checkAndPut(rowKey, doc_content, "", null, putDoc)) {
>    for(String word : doc_content.split("\\s+")) {
>       Increment inc = new Increment(Bytes.toBytes(word));
>       inc.addColumn("count", "", 1);
>    }
> }
> 
> Now, i wanted to do some experiments with coprocessors. So, i modified
> the code as follows.
> 
> My MR job code:
> ===============
> 
> doc.checkAndPut(rowKey, doc_content, "", null, putDoc);
> 
> Coprocessor code:
> ===============
> 
>    public void start(CoprocessorEnvironment env)  {
>        pool = new HTablePool(conf, 100);
>    }
> 
>    public boolean postCheckAndPut(c, row, family, byte[] qualifier,
>            compareOp, comparator, put, result) {
> 
>        if (!result) return true; // skip indexing if the put did not succeed
> 
>        HTableInterface table_idx = pool.getTable("doc_idx");
> 
>        try {
>            // put.get() returns the KeyValues stored under family/qualifier
>            for (KeyValue contentKV : put.get(Bytes.toBytes("doc_content"),
>                    Bytes.toBytes(""))) {
>                // the cell value is a byte[], so decode it before splitting
>                for (String word :
>                        Bytes.toString(contentKV.getValue()).split("\\s+")) {
>                    Increment inc = new Increment(Bytes.toBytes(word));
>                    inc.addColumn(Bytes.toBytes("count"), Bytes.toBytes(""), 1);
>                    table_idx.increment(inc);
>                }
>            }
>        } finally {
>            table_idx.close();
>        }
>        return true;
>    }
> 
>    public void stop(env) {
>        pool.close();
>    }
> 
> I am a newbie to HBase. I am not sure this is the way to do it.
> Given that, why is the coprocessor-enabled version much slower than
> the one without?
> 
> 
> Sincerely,
> Prakash Kadel
> 
> 
> On Mon, Feb 18, 2013 at 9:11 PM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> 
>> The  issue I was talking about was the use of a check and put.
>> The OP wrote:
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> a index table.
>> 
>> My question is why does the OP use a checkAndPut, and the
>> RegionObserver's postCheckAndPut?
>> 
>> 
>> Here's a good example...
>> http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>> 
>> The OP doesn't really get into the use case, so we don't know why the
>> Check and Put in the M/R job.
>> He should just be using put() and then a postPut().
>> 
>> Another issue... since he's writing to a different HTable... how? Does
>> he create an HTable instance in the start() method of his RO object and
>> then reference it later? Or does he create the instance of the HTable on
>> the fly in each postCheckAndPut()?
>> Without seeing his code, we don't know.
>> 
>> Note that this is a synchronous set of writes. Your overall return from
>> the M/R call to put will wait until the second row is inserted.
>> 
>> Interestingly enough, you may want to consider disabling the WAL on the
>> write to the index. You can always run an M/R job that rebuilds the index
>> should something occur to the system where you might lose the data.
>> Indexes *ARE* expendable. ;-)
>> 
>> Does that explain it?
>> 
>> -Mike
>> 
>> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong313@gmail.com> wrote:
>> 
>>> Hi, Michael
>>> 
>>> I don't quite understand what you mean by "round trip back to the
>>> client". In my understanding, as the RegionServer and TaskTracker can
>>> be the same node, MR doesn't have to pull data into the client and then
>>> process it. You also mention the "unnecessary overhead"; can you
>>> explain a little bit which operations or data processing can be seen as
>>> "unnecessary overhead"?
>>> 
>>> Thanks
>>> 
>>> yong
>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>> <michael_segel@hotmail.com> wrote:
>>>> Why?
>>>> 
>>>> This seems like an unnecessary overhead.
>>>> 
>>>> You are writing code within the coprocessor on the server.
>>>> Pessimistic code really isn't recommended if you are worried about
>>>> performance.
>>>> 
>>>> I have to ask... by the time you have executed the code in your
>>>> co-processor, what would cause the initial write to fail?
>>>> 
>>>> 
>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.kadel@gmail.com> wrote:
>>>> 
>>>>> It's a local read. I just check the last param of postCheckAndPut,
>>>>> which indicates whether the Put succeeded. If the put succeeded, I
>>>>> insert a row in another table.
>>>>> 
>>>>> Sincerely,
>>>>> Prakash Kadel
>>>>> 
>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <wtan@us.ibm.com> wrote:
>>>>> 
>>>>>> Is your CheckAndPut involving a local or remote READ? Due to the nature of
>>>>>> LSM, read is much slower compared to a write...
>>>>>> 
>>>>>> 
>>>>>> Best Regards,
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From:   Prakash Kadel <prakash.kadel@gmail.com>
>>>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>>>> Date:   02/17/2013 07:49 PM
>>>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> hi,
>>>>>> I am trying to insert a few million documents into HBase with mapreduce. To
>>>>>> enable quick search of docs I want to have some indexes, so I tried to use
>>>>>> the coprocessors, but they are slowing down my inserts. Aren't
>>>>>> coprocessors supposed not to increase the latency?
>>>>>> my settings:
>>>>>> 3 region servers
>>>>>> 60 maps
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> a index table.
>>>>>> 
>>>>>> 
>>>>>> Sincerely,
>>>>>> Prakash
>>>> 
>>>> Michael Segel  | (m) 312.755.9623
>>>> 
>>>> Segel and Associates
> 


