hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: LeaseException while extracting data via pig/hbase integration
Date Wed, 15 Feb 2012 18:17:45 GMT
You would have to grep for the lease's id; in your first email it was
"-7220618182832784549".

About the time it takes to process each row, I meant on the client (pig)
side, not in the RS.

J-D

On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> Please see answer inline
> Thanks
> Mikael.S
>
> On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <jdcryans@apache.org>wrote:
>
>> On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <mikael.sitruk@gmail.com>
>> wrote:
>> > hi,
>> > Well no, I can't figure out what the problem is, but I saw that someone
>> > else had the same problem (see email: "LeaseException despite high
>> > hbase.regionserver.lease.period").
>> > What I can tell is the following:
>> > Last week the problem was consistent.
>> > 1. I updated hbase.regionserver.lease.period=300000 (5 mins) and restarted
>> > the cluster, but still got the problem; the maps got this exception even
>> > before the 5 mins (some after 1 min and 20 sec).
>>
>> That's extremely suspicious. Are you sure the setting is getting picked
>> up? :)
>
> I hope so :-)
>>
>> You should be able to tell when the lease really expires by simply
>> grepping for the number in the region server log; it should give you a
>> good idea of what your lease period is.
>
> Grepping on which value? The lease configured here: 300000? It does not
> return anything. I also tried in the current execution, where some maps
> were ok and some were not.
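(A minimal sketch, assuming a 0.90.x-era client with hbase-site.xml on the
classpath, of checking which lease period value the configuration actually
picks up; the class name and default below are illustrative, not from the
thread.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class LeasePeriodCheck {
      public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath.
        Configuration conf = HBaseConfiguration.create();
        // 60000 ms is the stock default; if this prints 60000 instead of
        // 300000, the override in hbase-site.xml is not being picked up.
        long lease = conf.getLong("hbase.regionserver.lease.period", 60000L);
        System.out.println("hbase.regionserver.lease.period = " + lease);
      }
    }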
>>
>> > 2. The problem occurs only on jobs that extract a large number of
>> > columns (>150 cols per row)
>>
>> What's your scanner caching set to? Are you spending a lot of time
>> processing each row?
>
> From the job configuration generated by pig I can see caching is set to 1.
> Regarding the processing time of each row, I have no clue how much time it
> spent. The data for each row is 150 columns of 2k each; this is approx 5
> blocks to bring.
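(A minimal sketch, with hypothetical table/family names, of what raising the
scanner caching above 1 looks like with the plain HBase Java client; with 150
columns of 2 KB each, a row is roughly 300 KB, so caching 10 rows per RPC is
still only a few MB per call.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScanCachingSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // hypothetical table
        Scan scan = new Scan();
        scan.addFamily(Bytes.toBytes("cf"));          // hypothetical family
        // Caching of 1 (the value pig generated here) means one RPC per row;
        // fetching several rows per next() call cuts the round trips and the
        // time the scanner sits idle on the region server between calls.
        scan.setCaching(10);
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result row : scanner) {
            // process the row; long per-row work on the client is what lets
            // the server-side scanner lease expire
          }
        } finally {
          scanner.close();
          table.close();
        }
      }
    }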
>>
>> > 3. The problem never occurred when only 1 map per server is running (I
>> > have 8 CPUs with hyper-threading enabled = 16, so using only 1 map per
>> > machine is just a waste). (At this stage I was thinking perhaps there is
>> > a multi-threading problem.)
>>
>> More mappers would pull more data from the region servers, so more
>> concurrency on the disks; using more mappers might just slow you
>> down enough that you hit the issue.
>>
> Today I ran with 8 mappers and some failed and some didn't (2 of 4); they
> got the lease exception after 5 mins. I will try to check the
> logs/sar/metric files for additional info.
>
>>
>> >
>> > This week I got a slightly different behavior, after having restarted the
>> > servers. The extracts were able to run ok in most of the runs, even with 4
>> > maps running (per server); I got the exception only once, but the job was
>> > not killed as in other runs last week.
>>
>> If the client got an UnknownScannerException before the timeout
>> expires (the client also keeps track of it, although it may have a
>> different configuration), it will recreate the scanner.
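(A hedged sketch: in the 0.90/0.92-era client the scanner timeout is read
from the client's own configuration, so if only the region servers were
bumped to 300000 ms, the client side may still be working with the 60000 ms
default. The property name below matches the server-side one discussed in
this thread; whether the client honours it depends on the exact version.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class ClientTimeoutSketch {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Keep the client's idea of the scanner lease in line with the
        // region servers, so a slow scan is retried/recreated by the client
        // instead of surfacing a LeaseException to user code.
        conf.setLong("hbase.regionserver.lease.period", 300000L);
        // ... create the HTable and run the scan with this conf ...
      }
    }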
>>
> No, this is not the case.
>
>>
>> Which reminds me, are your regions moving around? If so, and your
>> clients don't know about the high timeout, then they might let the
>> exception pass on to your own code.
>>
> Regions are presplit ahead of time; I do not have any region splits during
> the run. Region size is set to 8GB, and the storefile is around 3.5G.
>
> The test was run after major compaction, so the number of store files is 1
> per RS/family.
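(For reference, a minimal sketch, with hypothetical table/family names and
split points, of pre-splitting a table at creation time through the Java
admin API; the real split points would have to match the key distribution.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PresplitSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("mytable"); // hypothetical
        desc.addFamily(new HColumnDescriptor("cf"));             // hypothetical
        // Hypothetical split keys: three split points give four regions.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("row25"), Bytes.toBytes("row50"), Bytes.toBytes("row75")
        };
        admin.createTable(desc, splits);
      }
    }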
>
>
>>
>> J-D
>>
>
>
>
> --
> Mikael.S
