hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: LeaseException while extracting data via pig/hbase integration
Date Tue, 14 Feb 2012 18:30:54 GMT
On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> Hi,
> Well no, I can't figure out what the problem is, but I saw that someone
> else had the same problem (see email: "LeaseException despite high
> hbase.regionserver.lease.period")
> What I can tell is the following:
> Last week the problem was consistent
> 1. I updated hbase.regionserver.lease.period=300000 (5 mins), restarted the
> cluster and still got the problem; the map got this exception even before
> the 5 mins (some after 1 min and 20 sec)

That's extremely suspicious. Are you sure the setting is getting picked up? :)

You should be able to tell when the lease really expires by simply
grepping for the number in the region server log; that should give you
a good idea of what your lease period is.
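
A complementary check, as a rough sketch only: parse the hbase-site.xml
that a given node actually loads and print what it carries for the
property. The helper name read_property and the sample XML are made up
for illustration. This only shows what is in the file, not what the
running process picked up, so the log is still the real answer.

```python
import xml.etree.ElementTree as ET


def read_property(xml_text, name):
    """Return the value of a Hadoop-style <property> entry, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Illustrative contents; on a real node, read the conf/hbase-site.xml
# that the region server (or the job's client) has on its classpath.
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>300000</value>
  </property>
</configuration>"""

lease_ms = read_property(SAMPLE, "hbase.regionserver.lease.period")
print(lease_ms)
```

Running the same check against the configuration the map tasks see is
worthwhile too, since the client side may carry a different value.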

> 2. The problem occurs only on jobs that extract a large number of
> columns (>150 cols per row)

What's your scanner caching set to? Are you spending a lot of time
processing each row?
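
To make the question concrete, here is a back-of-the-envelope sketch.
It assumes the scanner lease is renewed each time the client goes back
to the region server for the next batch, so the gap between two calls
is roughly (scanner caching) x (time spent per row). The function names
and all the numbers are invented for illustration, not taken from your
job.

```python
def batch_processing_ms(scanner_caching, per_row_ms):
    """Approximate time the client spends on one cached batch of rows."""
    return scanner_caching * per_row_ms


def lease_would_expire(scanner_caching, per_row_ms, lease_period_ms):
    """True if processing one batch takes longer than the lease period."""
    return batch_processing_ms(scanner_caching, per_row_ms) > lease_period_ms


# Hypothetical job: 1000 rows cached per call, 80 ms of work per row.
gap = batch_processing_ms(1000, 80)
print(gap)                                    # 80000 ms between calls
print(lease_would_expire(1000, 80, 60000))    # True with a 1 min lease
print(lease_would_expire(1000, 80, 300000))   # False with a 5 min lease
print(lease_would_expire(100, 80, 60000))     # False with smaller caching
```

Under these assumptions, lowering the caching value or cutting the
per-row work shrinks the gap between calls without touching the lease
period at all.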

> 3. The problem never occurred when only 1 map per server was running (I have
> 8 CPUs with hyper-threading enabled = 16, so using only 1 map per machine is
> just a waste). (At this stage I was thinking perhaps there is a
> multi-threading problem.)

More mappers pull more data from the region servers, which means more
concurrent access to the disks. Using more mappers might just slow you
down enough that you hit the issue.

> This week I got slightly different behavior, after having restarted the
> servers. The extracts were able to run OK in most of the runs even with 4
> maps running (per server). I got the exception only once, but the job was
> not killed as in other runs last week

If the client gets an UnknownScannerException before the timeout
expires (the client also keeps track of it, although it may have a
different configuration), it will recreate the scanner.

Which reminds me, are your regions moving around? If so, and your
clients don't know about the high timeout, then they might let the
exception pass on to your own code.
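
The client-side decision described above can be sketched roughly like
this. It is a simplified model, not the actual HBase client code: the
function name, the return strings and the timings are all hypothetical,
and the only idea carried over is that the client compares the time
since its last successful call against its own configured timeout.

```python
def on_unknown_scanner(now_ms, last_next_ms, client_timeout_ms):
    """Model what the client does after an UnknownScannerException."""
    deadline_ms = last_next_ms + client_timeout_ms
    if now_ms <= deadline_ms:
        # Still inside the client's own timeout: quietly open a new scanner.
        return "recreate_scanner"
    # The client believes the scanner timed out: the exception surfaces.
    return "raise_to_caller"


# Exception arrives 80 s after the last successful call (made-up timing).
print(on_unknown_scanner(80000, 0, 300000))  # client knows the 5 min timeout
print(on_unknown_scanner(80000, 0, 60000))   # client still at 1 min
```

In this model the same event is absorbed by a client that knows about
the 5 minute timeout and passed on to your code by one that does not,
which is why the client-side configuration matters as much as the
region server's.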

