hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: coprocessor enabled put very slow, help please~~~
Date Mon, 18 Feb 2013 12:45:52 GMT
Well, it also goes back to the question of how the RegionObserver (RO) is writing to the second table.

If the M/R job used Mapper.setup() to instantiate the HTable for the index write and then
wrote to the index table in Mapper.map(), why would the co-processor take much more time
than that?
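
To make the question concrete, here is a minimal plain-Java sketch of the two patterns: opening the index-table handle once and reusing it, versus opening a new one on every call. It deliberately does not use the HBase API. FakeIndexTable, ReusingObserver, PerCallObserver and handlesOpened are hypothetical stand-ins invented for illustration, and this is not the OP's code, which we have not seen.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexWriterSketch {
    /** Hypothetical stand-in for a table handle; counts how many were opened. */
    static final class FakeIndexTable {
        static int opened = 0;
        final List<String> rows = new ArrayList<>();
        FakeIndexTable() { opened++; }
        void put(String row) { rows.add(row); }
    }

    /** Opens the handle once in start() and reuses it for every write. */
    static final class ReusingObserver {
        private FakeIndexTable index;
        void start() { index = new FakeIndexTable(); }
        void postPut(String docRow) { index.put("idx-" + docRow); }
    }

    /** Opens a fresh handle on every call (the pattern to look for in a review). */
    static final class PerCallObserver {
        void postPut(String docRow) {
            FakeIndexTable index = new FakeIndexTable();
            index.put("idx-" + docRow);
        }
    }

    /** Runs the given number of writes and returns how many handles were opened. */
    static int handlesOpened(boolean reuse, int puts) {
        FakeIndexTable.opened = 0;
        if (reuse) {
            ReusingObserver o = new ReusingObserver();
            o.start();
            for (int i = 0; i < puts; i++) o.postPut("doc" + i);
        } else {
            PerCallObserver o = new PerCallObserver();
            for (int i = 0; i < puts; i++) o.postPut("doc" + i);
        }
        return FakeIndexTable.opened;
    }

    public static void main(String[] args) {
        System.out.println("reuse:    " + handlesOpened(true, 1000));
        System.out.println("per call: " + handlesOpened(false, 1000));
    }
}
```

The sketch only counts how many handles get created. How expensive each creation is in a real coprocessor depends on the HBase version and configuration, which is why seeing the actual code matters.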

I think a code review would be in order.


On Feb 18, 2013, at 6:22 AM, yonghu <yongyong313@gmail.com> wrote:

> Ok. Now, I got your point. I didn't notice the "checkAndPut".
> 
> regards!
> 
> Yong
> 
> On Mon, Feb 18, 2013 at 1:11 PM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> 
>> The  issue I was talking about was the use of a check and put.
>> The OP wrote:
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> a index table.
>> 
>> My question is: why does the OP use a checkAndPut, and the RegionObserver's postCheckAndPut?
>> 
>> 
>> Here's a good example... http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put
>> 
>> The OP doesn't really get into the use case, so we don't know why the checkAndPut
>> is in the M/R job.
>> He should just be using put() and then a postPut().
>> 
>> Another issue... since he's writing to a different HTable... how? Does he create
>> an HTable instance in the start() method of his RO object and then reference it later? Or
>> does he create the instance of the HTable on the fly in each postCheckAndPut()?
>> Without seeing his code, we don't know.
>> 
>> Note that this is a synchronous set of writes. Your overall return from the M/R call
>> to put will wait until the second row is inserted.
>> 
>> Interestingly enough, you may want to consider disabling the WAL on the write to
>> the index. You can always run a M/R job that rebuilds the index should something occur to
>> the system where you might lose the data. Indexes *ARE* expendable. ;-)
>> 
>> Does that explain it?
>> 
>> -Mike
>> 
>> On Feb 18, 2013, at 4:57 AM, yonghu <yongyong313@gmail.com> wrote:
>> 
>>> Hi, Michael
>>> 
>>> I don't quite understand what you mean by "round trip back to the
>>> client". In my understanding, as the RegionServer and TaskTracker can
>>> be the same node, MR doesn't have to pull data into the client and then
>>> process it. You also mention "unnecessary overhead"; can you
>>> explain a little which operations or data processing count as
>>> "unnecessary overhead"?
>>> 
>>> Thanks
>>> 
>>> yong
>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>> <michael_segel@hotmail.com> wrote:
>>>> Why?
>>>> 
>>>> This seems like an unnecessary overhead.
>>>> 
>>>> You are writing code within the coprocessor on the server. Pessimistic code
>>>> really isn't recommended if you are worried about performance.
>>>> 
>>>> I have to ask... by the time you have executed the code in your co-processor,
>>>> what would cause the initial write to fail?
>>>> 
>>>> 
>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.kadel@gmail.com> wrote:
>>>> 
>>>>> It's a local read. I just check the last param of postCheckAndPut, which indicates
>>>>> whether the Put succeeded. If the put succeeded, I insert a row in another table.
>>>>> 
>>>>> Sincerely,
>>>>> Prakash Kadel
>>>>> 
>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <wtan@us.ibm.com> wrote:
>>>>> 
>>>>>> Is your CheckAndPut involving a local or remote READ? Due to the nature of
>>>>>> LSM, read is much slower compared to a write...
>>>>>> 
>>>>>> 
>>>>>> Best Regards,
>>>>>> Wei
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> From:   Prakash Kadel <prakash.kadel@gmail.com>
>>>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>>>> Date:   02/17/2013 07:49 PM
>>>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> hi,
>>>>>> I am trying to insert a few million documents into HBase with MapReduce. To
>>>>>> enable quick search of docs I want to have some indexes, so I tried to use
>>>>>> coprocessors, but they are slowing down my inserts. Aren't
>>>>>> coprocessors supposed not to increase latency?
>>>>>> my settings:
>>>>>> 3 region servers
>>>>>> 60 maps
>>>>>> each map inserts to doc table.(checkAndPut)
>>>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>>>> a index table.
>>>>>> 
>>>>>> 
>>>>>> Sincerely,
>>>>>> Prakash
>>>>>> 
>>>>> 
>>>> 
>>>> Michael Segel  | (m) 312.755.9623
>>>> 
>>>> Segel and Associates
>>>> 
>>>> 
>>> 
>> 
> 

