hbase-user mailing list archives

From Mikael Sitruk <mikael.sit...@gmail.com>
Subject Re: LeaseException while extracting data via pig/hbase integration
Date Thu, 16 Feb 2012 07:32:15 GMT
Andy hi

Not sure what you mean by "Does something like the below help?" The code
currently running is pasted below; the line numbers are slightly different from
yours. It seems very close to the first file (revision "a") in your diff.

Mikael.S

  public Result[] next(final long scannerId, int nbRows) throws IOException {
    String scannerName = String.valueOf(scannerId);
    InternalScanner s = this.scanners.get(scannerName);
    if (s == null) throw new UnknownScannerException("Name: " + scannerName);
    try {
      checkOpen();
    } catch (IOException e) {
      // If checkOpen failed, server not running or filesystem gone,
      // cancel this lease; filesystem is gone or we're closing or something.
      try {
        this.leases.cancelLease(scannerName);
      } catch (LeaseException le) {
        LOG.info("Server shutting down and client tried to access missing scanner " +
          scannerName);
      }
      throw e;
    }
    Leases.Lease lease = null;
    try {
      // Remove lease while its being processed in server; protects against case
      // where processing of request takes > lease expiration time.
      lease = this.leases.removeLease(scannerName);
      List<Result> results = new ArrayList<Result>(nbRows);
      long currentScanResultSize = 0;
      List<KeyValue> values = new ArrayList<KeyValue>();
      for (int i = 0; i < nbRows
          && currentScanResultSize < maxScannerResultSize; i++) {
        requestCount.incrementAndGet();
        // Collect values to be returned here
        boolean moreRows = s.next(values);
        if (!values.isEmpty()) {
          for (KeyValue kv : values) {
            currentScanResultSize += kv.heapSize();
          }
          results.add(new Result(values));
        }
        if (!moreRows) {
          break;
        }
        values.clear();
      }
      // Below is an ugly hack where we cast the InternalScanner to be a
      // HRegion.RegionScanner. The alternative is to change InternalScanner
      // interface but its used everywhere whereas we just need a bit of info
      // from HRegion.RegionScanner, IF its filter if any is done with the scan
      // and wants to tell the client to stop the scan. This is done by passing
      // a null result.
      return ((HRegion.RegionScanner) s).isFilterDone() && results.isEmpty() ? null
          : results.toArray(new Result[0]);
    } catch (Throwable t) {
      if (t instanceof NotServingRegionException) {
        this.scanners.remove(scannerName);
      }
      throw convertThrowableToIOE(cleanup(t));
    } finally {
      // We're done. On way out readd the above removed lease.  Adding resets
      // expiration time on lease.
      if (this.scanners.containsKey(scannerName)) {
        if (lease != null) this.leases.addLease(lease);
      }
    }
  }
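
In case it is useful to anyone reading along, here is a minimal standalone
sketch of the race that the remove/re-add in the finally block above guards
against (this is my own toy code, not HBase internals; LeaseSketch, addLease,
removeLease and expired are made-up names). If the scanner lease stayed
registered while next() was busy and the call took longer than the lease
period, the lease would expire mid-call and the following call would fail.
Your patch below gets the same effect by renewing the lease on the way out
instead of removing it up front.

  import java.util.concurrent.ConcurrentHashMap;

  public class LeaseSketch {
    static final long LEASE_PERIOD_MS = 100; // toy lease period
    static final ConcurrentHashMap<String, Long> LEASES =
        new ConcurrentHashMap<String, Long>();

    static void addLease(String name) {
      LEASES.put(name, System.currentTimeMillis() + LEASE_PERIOD_MS);
    }

    static Long removeLease(String name) {
      return LEASES.remove(name);
    }

    static boolean expired(String name) {
      Long deadline = LEASES.get(name);
      return deadline != null && System.currentTimeMillis() > deadline;
    }

    public static void main(String[] args) throws InterruptedException {
      // Variant 1: lease left registered during a slow next() call -> it expires.
      addLease("scanner-1");
      Thread.sleep(150); // "processing" takes longer than the lease period
      System.out.println("left in place, expired = " + expired("scanner-1"));    // true

      // Variant 2: remove the lease while processing and re-add it on the way
      // out (what the code above does) -> the clock only restarts afterwards.
      removeLease("scanner-1");
      Thread.sleep(150);
      addLease("scanner-1"); // re-adding resets the deadline
      System.out.println("removed/re-added, expired = " + expired("scanner-1")); // false
    }
  }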

On Thu, Feb 16, 2012 at 3:10 AM, Andrew Purtell <apurtell@apache.org> wrote:

> Hmm...
>
> Does something like the below help?
>
>
> diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> index f9627ed..0cee8e3 100644
> --- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> +++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
> @@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
>        }
>        throw e;
>      }
> -    Leases.Lease lease = null;
>      try {
> -      // Remove lease while its being processed in server; protects against case
> -      // where processing of request takes > lease expiration time.
> -      lease = this.leases.removeLease(scannerName);
>        List<Result> results = new ArrayList<Result>(nbRows);
>        long currentScanResultSize = 0;
>        List<KeyValue> values = new ArrayList<KeyValue>();
> @@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
>        }
>        throw convertThrowableToIOE(cleanup(t));
>      } finally {
> -      // We're done. On way out readd the above removed lease.  Adding resets
> -      // expiration time on lease.
> +      // We're done. On way out reset expiration time on lease.
>        if (this.scanners.containsKey(scannerName)) {
> -        if (lease != null) this.leases.addLease(lease);
> +        this.leases.renewLease(scannerName);
>        }
>      }
>    }
>
>
>
> Best regards,
>
>     - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
>
>
> ----- Original Message -----
> > From: Jean-Daniel Cryans <jdcryans@apache.org>
> > To: user@hbase.apache.org
> > Cc:
> > Sent: Wednesday, February 15, 2012 10:17 AM
> > Subject: Re: LeaseException while extracting data via pig/hbase integration
> >
> > You would have to grep the lease's id, in your first email it was
> > "-7220618182832784549".
> >
> > About the time it takes to process each row, I meant client (pig) side
> > not in the RS.
> >
> > J-D
> >
> > On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <mikael.sitruk@gmail.com>
> > wrote:
> >>  Please see answer inline
> >>  Thanks
> >>  Mikael.S
> >>
> >>  On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >>
> >>>  On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <mikael.sitruk@gmail.com> wrote:
> >>>  > hi,
> >>>  > Well no, i can't figure out what is the problem, but i saw that someone
> >>>  > else had the same problem (see email: "LeaseException despite high
> >>>  > hbase.regionserver.lease.period")
> >>>  > What i can tell is the following:
> >>>  > Last week the problem was consistent
> >>>  > 1. I updated hbase.regionserver.lease.period=300000 (5 mins), restarted the
> >>>  > cluster and still got the problem, the map got this exception even before
> >>>  > the 5 mins (some after 1 min and 20 sec)
> >>>
> >>>  That's extremely suspicious. Are you sure the setting is getting picked
> >>>  up? :) I hope so :-)
> >>>
> >>>  You should be able to tell when the lease really expires by simply
> >>>  grepping for the number in the region server log, it should give you a
> >>>  good idea of what your lease period is.
> >>>   Grepping for which value? The lease configured here: 300000? It does not
> >>>  return anything, also tried on the current execution where some were ok and
> >>>  some were not
> >>>
> >>>  > 2. The problem occurs only on jobs that extract a large number of
> >>>  > columns (>150 cols per row)
> >>>
> >>>  What's your scanner caching set to? Are you spending a lot of time
> >>>  processing each row? From the job configuration generated by pig i can see
> >>>  caching set to 1; regarding the processing time of each row i have no clue
> >>>  how much time it spent. The data for each row is 150 columns of 2k each.
> >>>  This is approx 5 blocks to bring.
> >>>
> >>>  > 3. The problem never occurred when only 1 map per server is running (i have
> >>>  > 8 CPUs with hyper-threading enabled = 16, so using only 1 map per machine is
> >>>  > just a waste), (at this stage I was thinking perhaps there is a
> >>>  > multi-threading problem)
> >>>
> >>>  More mappers would pull more data from the region servers so more
> >>>  concurrency from the disks, using more mappers might just slow you
> >>>  down enough that you hit the issue.
> >>>
> >>  Today i ran with 8 mappers and some failed and some didn't (2 of 4); they
> >>  got the lease exception after 5 mins, i will try to check the
> >>  logs/sar/metric files for additional info
> >>
> >>>
> >>>  >
> >>>  > This week i got a slightly different behavior, after having restarted the
> >>>  > servers. The extracts were able to run ok in most of the runs even with 4
> >>>  > maps running (per server), i got the exception only once but the job was
> >>>  > not killed as in other runs last week
> >>>
> >>>  If the client got an UnknownScannerException before the timeout
> >>>  expires (the client also keeps track of it, although it may have a
> >>>  different configuration), it will recreate the scanner.
> >>>
> >>  No this is not the case.
> >>
> >>>
> >>>  Which reminds me, are your regions moving around? If so, and your
> >>>  clients don't know about the high timeout, then they might let the
> >>>  exception pass on to your own code.
> >>>
> >>  Regions are pre-split ahead of time, i do not have any region split during the run,
> >>  region size is set to 8GB, storefile is around 3.5G
> >>
> >>  The test was run after major compaction, so the number of store files is 1
> >>  per RS/family
> >>
> >>
> >>>
> >>>  J-D
> >>>
> >>
> >>
> >>
> >>  --
> >>  Mikael.S
> >
>
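
P.S. On the scanner caching discussion above (the pig-generated job conf has
caching set to 1): with caching 1 every row costs a next() round trip to the
region server, and what the lease actually measures is the time the client
spends between two of those round trips. The sketch below is not the pig path,
just a plain-Java illustration of which knobs are involved (standard HBase
client API as far as i know; the table and family names are made up).

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ScanCachingSketch {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      // Default scanner caching for scanners created from this configuration.
      conf.setInt("hbase.client.scanner.caching", 100);

      HTable table = new HTable(conf, "my_table");   // made-up table name
      Scan scan = new Scan();
      scan.setCaching(100);                          // rows fetched per next() RPC
      scan.addFamily(Bytes.toBytes("cf"));           // made-up family name

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result row : scanner) {
          // Time spent here, between next() RPCs, is what the region server
          // lease (hbase.regionserver.lease.period) has to cover.
        }
      } finally {
        scanner.close();
        table.close();
      }
    }
  }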



-- 
Mikael.S
