Reply-To: user@hbase.apache.org
Subject: Re: coprocessor enabled put very slow, help please~~~
From: Michael Segel <michael_segel@hotmail.com>
Date: Mon, 18 Feb 2013 06:11:06 -0600
To: user@hbase.apache.org


The issue I was talking about was the use of checkAndPut.
The OP wrote:
>>>> each map inserts to doc table.(checkAndPut)
>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>> a index table.

My question is: why does the OP use checkAndPut() together with the RegionObserver's postCheckAndPut()?


Here's a good example:
http://stackoverflow.com/questions/13404447/is-hbase-checkandput-latency-higher-than-simple-put

The OP doesn't really get into the use case, so we don't know why he uses checkAndPut in the M/R job.
He should just be using put() and then postPut() in the RegionObserver.
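
To make the cost difference concrete, here is a minimal plain-Java sketch. It does not use the HBase client or coprocessor API; ToyRegion, its counters and the "value/rowKey" index key are hypothetical. It only models the point above: checkAndPut() has to read and compare the current row before writing, and the hook then has to check the result flag, whereas put() skips the read and postPut() can write the index row unconditionally.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Demo: the same single-row insert through both paths.
class CheckAndPutDemo {
    public static void main(String[] args) {
        ToyRegion a = new ToyRegion();
        a.checkAndPut("doc1", null, "hbase"); // "insert only if absent" style check
        ToyRegion b = new ToyRegion();
        b.put("doc1", "hbase");
        System.out.println("checkAndPut: reads=" + a.reads + " writes=" + a.writes);
        System.out.println("put:         reads=" + b.reads + " writes=" + b.writes);
    }
}

// Hypothetical in-memory stand-in for a region plus its observer; NOT the HBase API.
class ToyRegion {
    final Map<String, String> docs = new HashMap<>();  // doc table: rowKey -> value
    final Map<String, String> index = new HashMap<>(); // index table: "value/rowKey" -> rowKey
    int reads = 0;  // row reads performed
    int writes = 0; // row writes performed (doc rows + index rows)

    // Like checkAndPut: read the current value, compare it, write only on a match.
    boolean checkAndPut(String row, String expected, String value) {
        reads++;
        boolean matched = Objects.equals(docs.get(row), expected);
        if (matched) {
            docs.put(row, value);
            writes++;
        }
        postCheckAndPut(row, value, matched);
        return matched;
    }

    // Like postCheckAndPut: the hook has to look at the result flag before indexing.
    private void postCheckAndPut(String row, String value, boolean result) {
        if (result) {
            writeIndex(row, value);
        }
    }

    // Like a plain put: no read and no compare, the write always happens.
    void put(String row, String value) {
        docs.put(row, value);
        writes++;
        postPut(row, value);
    }

    // Like postPut: no compare could have failed, so index unconditionally.
    private void postPut(String row, String value) {
        writeIndex(row, value);
    }

    private void writeIndex(String row, String value) {
        index.put(value + "/" + row, row);
        writes++;
    }
}
```

Running CheckAndPutDemo shows one read on the checkAndPut path and none on the put path, with two writes (doc row plus index row) on each. On a real region server that extra read is what the Stack Overflow question above is asking about.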

Another issue: since he's writing to a different HTable, how does he do it? Does he create an HTable instance in the start() method of his RegionObserver (RO) and then reference it later, or does he create a new HTable instance on the fly in each postCheckAndPut() call?
Without seeing his code, we don't know.
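
The difference between those two options can be sketched in plain Java. TableFactory, IndexTable, ReusingObserver and PerCallObserver are hypothetical stand-ins: in a real RegionObserver the handle would come from the coprocessor environment, and opening one costs far more than allocating an object. The sketch only counts how many handles each approach opens.

```java
import java.util.ArrayList;
import java.util.List;

// Demo: 1,000 hook calls through each observer variant.
class HandleDemo {
    public static void main(String[] args) {
        TableFactory shared = new TableFactory();
        ReusingObserver reusing = new ReusingObserver(shared);
        reusing.start();
        for (int i = 0; i < 1000; i++) reusing.postPut("doc" + i);
        reusing.stop();

        TableFactory perCall = new TableFactory();
        PerCallObserver fresh = new PerCallObserver(perCall);
        for (int i = 0; i < 1000; i++) fresh.postPut("doc" + i);

        System.out.println("opened in start(): " + shared.opened);
        System.out.println("opened per call:   " + perCall.opened);
    }
}

// Hypothetical factory standing in for whatever hands out index-table handles.
class TableFactory {
    int opened = 0; // how many handles have been created
    IndexTable open() {
        opened++;
        return new IndexTable();
    }
}

// Hypothetical index-table handle; a real one would hold connections and buffers.
class IndexTable {
    final List<String> rows = new ArrayList<>();
    void put(String row) { rows.add(row); }
    void close() { /* a real handle would release its resources here */ }
}

// Option 1: open the handle once in start() and reuse it in every hook call.
class ReusingObserver {
    private final TableFactory factory;
    private IndexTable index;
    ReusingObserver(TableFactory factory) { this.factory = factory; }
    void start() { index = factory.open(); }
    void postPut(String row) { index.put("idx/" + row); }
    void stop() { index.close(); }
}

// Option 2: open (and close) a fresh handle inside every hook call.
class PerCallObserver {
    private final TableFactory factory;
    PerCallObserver(TableFactory factory) { this.factory = factory; }
    void postPut(String row) {
        IndexTable index = factory.open();
        try {
            index.put("idx/" + row);
        } finally {
            index.close();
        }
    }
}
```

With 1,000 puts, ReusingObserver opens one handle and PerCallObserver opens 1,000. A handle shared inside a real region server would also have to cope with concurrent RPC handler threads (HTable itself is not thread-safe), which this single-threaded toy ignores.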

Note that this is a synchronous set of writes: the M/R task's call to put() will not return until the second row (the index row) has been inserted.
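
A minimal sketch of why the M/R task feels both writes, using a simulated clock rather than real timings. SyncRegion, SimClock and the two cost constants are hypothetical, chosen only to make the arithmetic visible: because postPut() runs inline in the same call, the latency returned to the caller is the doc-row cost plus the index-row cost.

```java
// Demo: one synchronous put and the latency the caller observes.
class SyncWriteDemo {
    public static void main(String[] args) {
        SimClock clock = new SimClock();
        SyncRegion region = new SyncRegion(clock);
        long latency = region.put("doc1");
        System.out.println("latency per put: " + latency + " units");
    }
}

// Hypothetical simulated clock; units are arbitrary, not milliseconds.
class SimClock {
    long now = 0;
    void advance(long units) { now += units; }
}

// Hypothetical region whose observer hook runs inline, inside the put() call.
class SyncRegion {
    static final long DOC_WRITE_COST = 2;   // assumed cost of writing the doc row
    static final long INDEX_WRITE_COST = 3; // assumed cost of the index write (may be a remote RPC)
    private final SimClock clock;

    SyncRegion(SimClock clock) { this.clock = clock; }

    // Returns only after both the doc write and the hook's index write have finished.
    long put(String row) {
        long start = clock.now;
        clock.advance(DOC_WRITE_COST); // write the doc row
        postPut(row);                  // observer hook, same call, same thread
        return clock.now - start;      // latency seen by the M/R task
    }

    private void postPut(String row) {
        clock.advance(INDEX_WRITE_COST); // write the index row before returning
    }
}
```

With the assumed costs each put() returns after 5 units instead of 2, and a task issuing puts one after another pays that sum on every call.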

Interestingly enough, you may want to consider disabling the WAL on the writes to the index. You can always run an M/R job that rebuilds the index should something happen to the system that causes you to lose data. Indexes *ARE* expendable. ;-)
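
If the index writes skip the WAL, the recovery path is a full rebuild from the doc table. In the 0.94-era client API the WAL is skipped per Put with setWriteToWAL(false); newer releases use setDurability(Durability.SKIP_WAL). The sketch below leaves HBase out entirely and models only the rebuild step as a scan over the doc table; the map layout and the "value/rowKey" index key are hypothetical, and a real rebuild would run as a distributed M/R job over the table.

```java
import java.util.Map;
import java.util.TreeMap;

// Demo: an index that lost an un-logged write, and the rebuilt index.
class RebuildDemo {
    public static void main(String[] args) {
        Map<String, String> docs = new TreeMap<>();
        docs.put("doc1", "hbase");
        docs.put("doc2", "hadoop");

        // Pretend a crash lost the index row for doc2 because its write skipped the WAL.
        Map<String, String> damaged = new TreeMap<>();
        damaged.put(IndexRebuild.indexKey("hbase", "doc1"), "doc1");

        Map<String, String> rebuilt = IndexRebuild.rebuild(docs);
        System.out.println("damaged index: " + damaged);
        System.out.println("rebuilt index: " + rebuilt);
    }
}

// Hypothetical layout: doc table maps rowKey -> indexed value,
// index table maps "value/rowKey" -> rowKey.
class IndexRebuild {
    static String indexKey(String value, String rowKey) {
        return value + "/" + rowKey;
    }

    // Scan every doc row and emit its index row; the doc table is the source of truth,
    // so whatever the old index lost is regenerated.
    static Map<String, String> rebuild(Map<String, String> docs) {
        Map<String, String> index = new TreeMap<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            index.put(indexKey(e.getValue(), e.getKey()), e.getKey());
        }
        return index;
    }
}
```

The design choice is the one described above: the doc table stays fully durable, while the index is treated as derived data that can always be regenerated from it.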

Does that explain it?

-Mike

On Feb 18, 2013, at 4:57 AM, yonghu <yongyong313@gmail.com> wrote:

> Hi, Michael
>
> I don't quite understand what do you mean by "round trip back to the
> client". In my understanding, as the RegionServer and TaskTracker can
> be the same node, MR don't have to pull data into client and then
> process.  And you also mention the "unnecessary overhead", can you
> explain a little bit what operations or data processing can be seen as
> "unnecessary overhead".
>
> Thanks
>
> yong
> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
> <michael_segel@hotmail.com> wrote:
>> Why?
>>
>> This seems like an unnecessary overhead.
>>
>> You are writing code within the coprocessor on the server.  Pessimistic code really isn't recommended if you are worried about performance.
>>
>> I have to ask... by the time you have executed the code in your co-processor, what would cause the initial write to fail?
>>
>>
>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel <prakash.kadel@gmail.com> wrote:
>>
>>> it's a local read. i just check the last param of PostCheckAndPut indicating if the Put succeeded. In case the put succeeds, i insert a row in another table
>>>
>>> Sincerely,
>>> Prakash Kadel
>>>
>>> On Feb 18, 2013, at 2:52 PM, Wei Tan <wtan@us.ibm.com> wrote:
>>>
>>>> Is your CheckAndPut involving a local or remote READ? Due to the nature of
>>>> LSM, read is much slower compared to a write...
>>>>
>>>> Best Regards,
>>>> Wei
>>>>
>>>> From:   Prakash Kadel <prakash.kadel@gmail.com>
>>>> To:     "user@hbase.apache.org" <user@hbase.apache.org>,
>>>> Date:   02/17/2013 07:49 PM
>>>> Subject:        coprocessor enabled put very slow, help please~~~
>>>>
>>>> hi,
>>>> i am trying to insert few million documents to hbase with mapreduce. To
>>>> enable quick search of docs i want to have some indexes, so i tried to use
>>>> the coprocessors, but they are slowing down my inserts. Aren't the
>>>> coprocessors not supposed to increase the latency?
>>>> my settings:
>>>>  3 region servers
>>>> 60 maps
>>>> each map inserts to doc table.(checkAndPut)
>>>> regionobserver coprocessor does a postCheckAndPut and inserts some rows to
>>>> a index table.
>>>>
>>>> Sincerely,
>>>> Prakash
>>>>
>>>
>>
>> Michael Segel  | (m) 312.755.9623
>>
>> Segel and Associates
>>
>