hbase-user mailing list archives

From Lucian Iordache <lucian.george.iorda...@gmail.com>
Subject Re: Lease does not exist exceptions
Date Tue, 25 Oct 2011 08:15:50 GMT
Yes, I will look for the SocketTimeoutException after switching the client
logging to DEBUG because, as noted in
https://issues.apache.org/jira/browse/HBASE-3154 , it is only logged at debug
level on the client side.
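
To make the suspected sequence from my quoted message below concrete, here is
a minimal sketch in plain Java. It is only a model, not HBase code: the class
and method names are hypothetical, the 90 second scan duration is an assumed
value, and the only figure taken from the thread is the 60 second
hbase.rpc.timeout default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; none of this is HBase code.
public class ScanTimeoutModel {

    // Lists what each side would observe for a single scanner.next() call,
    // given how long the region server needs and the client RPC timeout.
    static List<String> simulate(long serverMillis, long rpcTimeoutMillis) {
        List<String> events = new ArrayList<>();
        events.add("mapper: scanner.next() sent to region server");
        if (serverMillis <= rpcTimeoutMillis) {
            events.add("mapper: results received after " + serverMillis + " ms");
            return events;
        }
        // The client gives up first and cuts the connection.
        events.add("mapper: SocketTimeoutException after " + rpcTimeoutMillis + " ms");
        // The server finishes later and finds the channel already closed.
        events.add("region server: ClosedChannelException after " + serverMillis + " ms");
        return events;
    }

    public static void main(String[] args) {
        long rpcTimeout = 60_000L; // hbase.rpc.timeout default mentioned in the thread
        // 90 s is an assumed duration for a heavily filtered next() call.
        for (String event : simulate(90_000L, rpcTimeout)) {
            System.out.println(event);
        }
    }
}
```

If this model matches what really happens, the mapper side should show the
SocketTimeoutException (once DEBUG is on) and the region server side the
ClosedChannelException.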

Regards,
Lucian

On Mon, Oct 24, 2011 at 8:22 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> So you should see the SocketTimeoutException in your *client* logs (in
> your case, mappers), not LeaseException. At this point yes, you're
> going to time out, but if you spend so much time cycling on the server
> side then you shouldn't set a high caching configuration on your
> scanner, as IO isn't your bottleneck.
>
> J-D
>
> On Mon, Oct 24, 2011 at 10:15 AM, Lucian Iordache
> <lucian.george.iordache@gmail.com> wrote:
> > Hi,
> >
> > The servers have been restarted (I have had this configuration for more than a
> > month, so this is not the problem).
> > About the stack traces, they show exactly the same thing: a lot of
> > ClosedChannelExceptions and LeaseExceptions.
> >
> > But I found something that could be the problem: hbase.rpc.timeout. This
> > defaults to 60 seconds, and I did not modify it in hbase-site.xml. So it
> > could happen as follows:
> > - the mapper makes a scanner.next call to the region server
> > - the region server needs more than 60 seconds to execute it (I use
> > multiple filters, and it could take a lot of time)
> > - the scan client gets the timeout and cuts the connection
> > - the region server tries to send the results to the client ==>
> > ClosedChannelException
> >
> > I will take a deeper look into it tomorrow. If you have other suggestions,
> > please let me know!
> >
> > Thanks,
> > Lucian
> >
> > On Mon, Oct 24, 2011 at 8:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >
> >> Did you restart the region servers after changing the config?
> >>
> >> Are you sure it's the same exception/stack trace?
> >>
> >> J-D
> >>
> >> On Mon, Oct 24, 2011 at 8:04 AM, Lucian Iordache
> >> <lucian.george.iordache@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I have exactly the same problem that Eran had.
> >> > But there is something I don't understand: in my case, I have set the lease
> >> > time to 240000 (4 minutes). But most of the map tasks that are failing run
> >> > for about 2 minutes. How is it possible to get a LeaseException if the task
> >> > runs for less than the configured lease time?
> >> >
> >> > Regards,
> >> > Lucian Iordache
> >> >
> >> > On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <eran@gigya.com> wrote:
> >> >
> >> >> Perfect! Thanks.
> >> >>
> >> >> -eran
> >> >>
> >> >>
> >> >>
> >> >> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >>
> >> >> > hbase.regionserver.lease.period
> >> >> >
> >> >> > Set it bigger than 60000.
> >> >> >
> >> >> > J-D
> >> >> >
> >> >> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <eran@gigya.com> wrote:
> >> >> > >
> >> >> > > Thanks J-D!
> >> >> > > Since my main table is expected to continue growing I guess at some point
> >> >> > > even setting the cache size to 1 will not be enough. Is there a way to
> >> >> > > configure the lease timeout?
> >> >> > >
> >> >> > > -eran
> >> >> > >
> >> >> > >
> >> >> > >
> >> >> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
> >> >> > >
> >> >> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <eran@gigya.com> wrote:
> >> >> > > >
> >> >> > > > > Hi J-D,
> >> >> > > > > Thanks for the detailed explanation.
> >> >> > > > > So if I understand correctly, the lease we're talking about is a scanner
> >> >> > > > > lease and the timeout is between two scanner calls, correct? I think that
> >> >> > > > > makes sense, because I now realize that the jobs that fail (some jobs continued
> >> >> > > > > to fail even after reducing the number of map tasks as Stack suggested) use
> >> >> > > > > filters to fetch relatively few rows out of a very large table, so they
> >> >> > > > > could be spending a lot of time on the region server scanning rows until
> >> >> > > > > they reach my setCaching value, which was 1000. Setting the caching value to
> >> >> > > > > 1 seems to allow these jobs to complete.
> >> >> > > > > I think it has to be the above, since my rows are small, with just a few
> >> >> > > > > columns, and processing them is very quick.
> >> >> > > > >
> >> >> > > >
> >> >> > > > Excellent!
> >> >> > > >
> >> >> > > >
> >> >> > > > >
> >> >> > > > > However, there are still a couple of things I don't understand:
> >> >> > > > > 1. What is the difference between setCaching and setBatch?
> >> >> > > > >
> >> >> > > >
> >> >> > > > * setBatch: Set the maximum number of values to return for each call to next()
> >> >> > > >
> >> >> > > > VS
> >> >> > > >
> >> >> > > > * setCaching: Set the number of rows for caching that will be passed to scanners.
> >> >> > > >
> >> >> > > > The former is useful if you have rows with millions of columns and you could
> >> >> > > > setBatch to get only 1000 of them at a time. You could call that intra-row
> >> >> > > > scanning.
> >> >> > > >
> >> >> > > >
> >> >> > > > > 2. Examining the region server logs more closely than I did yesterday, I see
> >> >> > > > > a lot of ClosedChannelExceptions in addition to the expired leases (but no
> >> >> > > > > UnknownScannerException). Is that expected? You can see an excerpt of the
> >> >> > > > > log from one of the region servers here: http://pastebin.com/NLcZTzsY
> >> >> > > >
> >> >> > > >
> >> >> > > > It means that when the server got to process that client request and started
> >> >> > > > reading from the socket, the client was already gone. Killing a client does
> >> >> > > > that (or killing a MR job that scans), and so does a SocketTimeoutException. This
> >> >> > > > should probably go in the book. We should also print something nicer :)
> >> >> > > >
> >> >> > > > J-D
> >> >> > > >
> >> >> >
> >> >>
> >> >
> >>
> >
>



-- 
All the best,
Lucian
