hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Lease does not exist exceptions
Date Tue, 18 Oct 2011 21:57:45 GMT
Actually the important setting is:

http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setCaching(int)

This decides how many rows are fetched each time the client exhausts its
local cache and goes back to the server (there's a small example of setting
it after the lists below). Reasons to have setCaching low:

 - Do you have a filter on? If so, the region server could spend a long time
finding enough matching rows to fill the client's cache
 - Are your rows fat? Fetching many of them at once can put a lot of memory
pressure on the region server
 - Are you spending a lot of time on each row, like Stack was saying? This
could also be a side effect of inserting back into HBase. The issue I hit
recently was that I was inserting a massive table into a tiny one (in terms
of # of regions), and I was hitting the 90-second sleep caused by too many
store files; just waiting out that sleep put me past the 60-second lease
timeout.

Reasons to have setCaching high:

 - Lots of tiny-ish rows that you process really really fast. Basically if
your bottleneck is just getting the rows from HBase.

I found that 1000 is a good number for our rows when we process them fast,
but that 10 is just as good if we need to spend time on each row. YMMV.
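
For reference, here's a minimal sketch of setting the caching (and batch) on
a plain scan; the table name and the numbers are just placeholders, adjust
them to your own setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanCachingExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "my_table");  // placeholder table name

    Scan scan = new Scan();
    scan.setCaching(1000);       // rows fetched per trip to the region server
    scan.setBatch(100);          // cap on columns per Result, for fat rows
    scan.setCacheBlocks(false);  // don't churn the block cache on a full scan

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result row : scanner) {
        // process the row quickly, or lower setCaching if processing is slow
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}

(If you're feeding a TableMapper job instead, the same Scan object is what you
pass to TableMapReduceUtil.initTableMapperJob.)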

With all that said, I don't know if your caching is set to anything other
than the default of 1, so this whole discussion could be a waste.


Anyway, here's what I do see in your case. LeaseException is a rare one;
usually you get UnknownScannerException instead (could it be that you see
that one too? Do you have a log?). Looking at HRS.next, I see that the only
way to get a LeaseException is if you race with the ScannerListener. The
method does this:

InternalScanner s = this.scanners.get(scannerName);
...
if (s == null) throw new UnknownScannerException("Name: " + scannerName);
...
lease = this.leases.removeLease(scannerName);

And this is what the ScannerListener does when a scanner lease expires (at
this point the lease has already been removed from this.leases):

LOG.info("Scanner " + this.scannerName + " lease expired");
InternalScanner s = scanners.remove(this.scannerName);

Which means your exception happens when next() has already retrieved the
InternalScanner, but by the time it reaches this.leases.removeLease the
lease expiration has already kicked in. If you get this all the time there
might be a bigger issue; otherwise I would expect you to mostly see
UnknownScannerException. It could be due to lock contention (there's a
synchronized block in removeLease on the leases queue), but that seems
unlikely since what happens in those sync blocks is fast.
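
To make the race concrete, the interleaving looks roughly like this
(paraphrasing the two snippets above, not quoting the actual source):

// 1. client RPC thread, in next():
//      InternalScanner s = scanners.get(scannerName);  // s != null, so no UnknownScannerException
// 2. leases thread: the lease hits the 60s timeout, gets pulled off the
//    leases queue, and ScannerListener runs scanners.remove(scannerName)
// 3. client RPC thread, still in next():
//      lease = leases.removeLease(scannerName);        // lease already gone -> LeaseException

So the window is the gap between step 1 and step 3 inside the same next() call.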

If you do get some UnknownScannerExceptions, they will show how long you
took before going back to the server, with a message like "65340ms passed
since the last invocation, timeout is currently set to 60000" (where 65340
is a number I just invented, yours will be different). After that you need
to find out where you are spending that time.
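
If it helps to track down where that time goes, a crude client-side check is
to time the per-row work and see how it compares to the lease timeout; a
rough sketch (the class and the 1000-row reporting interval are just
illustrative, match the interval to your setCaching value):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;

public class TimedScan {
  // Drains a scanner and reports how long the per-row work takes, so you can
  // see whether client-side processing is what eats into the lease timeout.
  public static void drain(ResultScanner scanner) {
    long workMs = 0;
    int rows = 0;
    for (Result row : scanner) {
      long start = System.currentTimeMillis();
      // ... per-row work goes here ...
      workMs += System.currentTimeMillis() - start;
      if (++rows % 1000 == 0) {  // report once per cache-load of rows
        System.err.println(rows + " rows, " + workMs + "ms spent on per-row work");
      }
    }
  }
}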

J-D

On Tue, Oct 18, 2011 at 6:39 AM, Eran Kutner <eran@gigya.com> wrote:

> Hi Stack,
> Yep, reducing the number of map tasks did resolve the problem, however the
> only way I found to do it is by changing the setting in the mapred-site.xml
> file, which means it will affect all my jobs. Do you know if there is a way
> to limit the number of concurrent map tasks a specific job may run? I know
> it was possible with the old JobConf class from the mapred namespace, but
> the new Job class doesn't have the setNumMapTasks() method.
> Is it possible to extend the lease timeout? I'm not even sure what the
> lease is on (HDFS blocks?). What is it by default?
>
> As for setBatch, what would be a good value? I didn't set it before and
> setting it didn't seem to change anything.
>
> Finally, to answer your question regarding the intensity of the job: yes,
> it is pretty intense, getting CPU and disk IO utilization to ~90%.
>
> Thanks a million!
>
> -eran
>
>
>
> On Tue, Oct 18, 2011 at 13:06, Stack <stack@duboce.net> wrote:
>
> > Look back in the mailing list Eran for more detailed answers but in
> > essence, the below usually means that the client has been away from
> > the server too long.  This can happen for a few reasons.  If you fetch
> > lots of rows per next on a scanner, processing the batch client side
> > may be taking you longer than the lease timeout.  Set down the
> > prefetch size and see if that helps (I'm talking about this:
> > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html#setBatch(int) ).
> >  Throw in a GC on client-side or over on the server-side and it might
> > put you over your lease timeout.  Are your mapreduce jobs heavy-duty
> > robbing resources from the running regionservers or datanodes?  Try
> > having them run half the mappers and see if that makes it more likely
> > your job will complete.
> >
> > St.Ack
> > P.S. IIRC, J-D tripped over a cause recently but I can't find it at the mo.
>
