hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Lease does not exist exceptions
Date Wed, 26 Oct 2011 16:53:29 GMT
Did you try setting the scanner caching down like I mentioned?

J-D
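
For readers of the archive, the effect of scanner caching on the scenario quoted below can be sketched with some back-of-the-envelope arithmetic. This is only a rough model, not HBase code: the filter selectivity and the scan rate are made-up numbers, and the class and method names are hypothetical. The only figures taken from the thread are the 60-second default for hbase.rpc.timeout and the caching values 1000 and 1.

```java
/**
 * Rough model of how long one scanner next() call keeps the region server busy
 * when filters discard most rows. Not HBase code; all names are hypothetical.
 */
final class ScannerTimeoutSketch {

    /**
     * Estimated server-side time for a single next() call, in milliseconds.
     *
     * @param caching          number of matching rows the server collects per call
     * @param rowsPerMatch     rows read, on average, to find one row that passes the filters
     * @param rowsReadPerSecond assumed server-side scan rate
     */
    static long nextCallMillis(int caching, long rowsPerMatch, long rowsReadPerSecond) {
        long rowsToRead = caching * rowsPerMatch;
        return rowsToRead * 1000L / rowsReadPerSecond;
    }

    /** True when the call would outlive the client's RPC timeout. */
    static boolean exceedsRpcTimeout(long callMillis, long rpcTimeoutMillis) {
        return callMillis > rpcTimeoutMillis;
    }

    public static void main(String[] args) {
        long rpcTimeoutMillis = 60_000L;   // default hbase.rpc.timeout mentioned in the thread
        long rowsPerMatch = 10_000L;       // hypothetical: 1 row in 10,000 passes the filters
        long rowsReadPerSecond = 100_000L; // hypothetical scan rate

        for (int caching : new int[] {1000, 100, 1}) {
            long millis = nextCallMillis(caching, rowsPerMatch, rowsReadPerSecond);
            String verdict = exceedsRpcTimeout(millis, rpcTimeoutMillis)
                    ? "client times out first"
                    : "within the RPC timeout";
            System.out.printf("caching=%d -> about %d ms per next() call (%s)%n",
                    caching, millis, verdict);
        }
    }
}
```

Under these assumed numbers, a caching value of 1000 needs about 100,000 ms per call, which is longer than the 60,000 ms RPC timeout, while a caching value of 1 needs about 100 ms. Lowering the caching value shortens each call; raising hbase.rpc.timeout lengthens how long the client is willing to wait.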

On Wed, Oct 26, 2011 at 8:48 AM, Lucian Iordache
<lucian.george.iordache@gmail.com> wrote:
> Problem solved. It was like I said, the server took more than the
> hbase.rpc.timeout to run the call and the client closed the connection.
>
> Best Regards,
> Lucian
>
> On Tue, Oct 25, 2011 at 11:15 AM, Lucian Iordache <lucian.george.iordache@gmail.com> wrote:
>
>> Yes, I will try to see the SocketTimeoutException after setting logging to
>> debug because, as it says here
>> https://issues.apache.org/jira/browse/HBASE-3154 , this is logged at debug
>> level on the client side.
>>
>> Regards,
>> Lucian
>>
>>
>> On Mon, Oct 24, 2011 at 8:22 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>
>>> So you should see the SocketTimeoutException in your *client* logs (in
>>> your case, the mappers), not LeaseException. At this point, yes, you're
>>> going to time out, but if you spend so much time cycling on the server
>>> side then you shouldn't set a high caching configuration on your
>>> scanner, as IO isn't your bottleneck.
>>>
>>> J-D
>>>
>>> On Mon, Oct 24, 2011 at 10:15 AM, Lucian Iordache
>>> <lucian.george.iordache@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > The servers have been restarted (I have had this configuration for more
>>> > than a month, so this is not the problem).
>>> > About the stack traces, they show exactly the same thing: a lot of
>>> > ClosedChannelExceptions and LeaseExceptions.
>>> >
>>> > But I found something that could be the problem: hbase.rpc.timeout.
>>> > This defaults to 60 seconds, and I did not modify it in hbase-site.xml.
>>> > So it could happen as follows:
>>> > - the mapper makes a scanner.next call to the region server
>>> > - the region server needs more than 60 seconds to execute it (I use
>>> > multiple filters, and it could take a lot of time)
>>> > - the scan client gets the timeout and cuts the connection
>>> > - the region server tries to send the results to the client ==>
>>> > ClosedChannelException
>>> >
>>> > I will take a deeper look into it tomorrow. If you have other
>>> > suggestions, please let me know!
>>> >
>>> > Thanks,
>>> > Lucian
>>> >
>>> > On Mon, Oct 24, 2011 at 8:00 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> >
>>> >> Did you restart the region servers after changing the config?
>>> >>
>>> >> Are you sure it's the same exception/stack trace?
>>> >>
>>> >> J-D
>>> >>
>>> >> On Mon, Oct 24, 2011 at 8:04 AM, Lucian Iordache
>>> >> <lucian.george.iordache@gmail.com> wrote:
>>> >> > Hi all,
>>> >> >
>>> >> > I have exactly the same problem that Eran had.
>>> >> > But there is something I don't understand: in my case, I have set the
>>> >> > lease time to 240000 (4 minutes). But most of the map tasks that are
>>> >> > failing run about 2 minutes. How is it possible to get a LeaseException
>>> >> > if the task runs less than the configured time for a lease?
>>> >> >
>>> >> > Regards,
>>> >> > Lucian Iordache
>>> >> >
>>> >> > On Fri, Oct 21, 2011 at 12:34 AM, Eran Kutner <eran@gigya.com> wrote:
>>> >> >
>>> >> >> Perfect! Thanks.
>>> >> >>
>>> >> >> -eran
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Oct 20, 2011 at 23:27, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> >> >>
>>> >> >> > hbase.regionserver.lease.period
>>> >> >> >
>>> >> >> > Set it bigger than 60000.
>>> >> >> >
>>> >> >> > J-D
>>> >> >> >
>>> >> >> > On Thu, Oct 20, 2011 at 2:23 PM, Eran Kutner <eran@gigya.com> wrote:
>>> >> >> > >
>>> >> >> > > Thanks J-D!
>>> >> >> > > Since my main table is expected to continue growing, I guess at some
>>> >> >> > > point even setting the cache size to 1 will not be enough. Is there a
>>> >> >> > > way to configure the lease timeout?
>>> >> >> > >
>>> >> >> > > -eran
>>> >> >> > >
>>> >> >> > >
>>> >> >> > >
>>> >> >> > > On Thu, Oct 20, 2011 at 23:16, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>> >> >> > >
>>> >> >> > > > On Wed, Oct 19, 2011 at 12:51 PM, Eran Kutner <eran@gigya.com> wrote:
>>> >> >> > > >
>>> >> >> > > > > Hi J-D,
>>> >> >> > > > > Thanks for the detailed explanation.
>>> >> >> > > > > So if I understand correctly, the lease we're talking about is a scanner
>>> >> >> > > > > lease and the timeout is between two scanner calls, correct? I think that
>>> >> >> > > > > makes sense, because I now realize that the jobs that fail (some jobs
>>> >> >> > > > > continued to fail even after reducing the number of map tasks as Stack
>>> >> >> > > > > suggested) use filters to fetch relatively few rows out of a very large
>>> >> >> > > > > table, so they could be spending a lot of time on the region server
>>> >> >> > > > > scanning rows until it reaches my setCaching value, which was 1000.
>>> >> >> > > > > Setting the caching value to 1 seems to allow these jobs to complete.
>>> >> >> > > > > I think it has to be the above, since my rows are small, with just a few
>>> >> >> > > > > columns, and processing them is very quick.
>>> >> >> > > > >
>>> >> >> > > >
>>> >> >> > > > Excellent!
>>> >> >> > > >
>>> >> >> > > >
>>> >> >> > > > >
>>> >> >> > > > > However, there are still a couple of things I don't understand:
>>> >> >> > > > > 1. What is the difference between setCaching and setBatch?
>>> >> >> > > > >
>>> >> >> > > >
>>> >> >> > > > * setBatch: Set the maximum number of values to return for each call to
>>> >> >> > > > next()
>>> >> >> > > >
>>> >> >> > > > VS
>>> >> >> > > >
>>> >> >> > > > * setCaching: Set the number of rows for caching that will be passed to
>>> >> >> > > > scanners.
>>> >> >> > > >
>>> >> >> > > > setBatch is useful if you have rows with millions of columns: you could
>>> >> >> > > > setBatch to get only 1000 of them at a time. You could call that
>>> >> >> > > > intra-row scanning.
>>> >> >> > > >
>>> >> >> > > >
>>> >> >> > > > > 2. Examining the region server logs more closely than I did yesterday, I
>>> >> >> > > > > see a lot of ClosedChannelExceptions in addition to the expired leases
>>> >> >> > > > > (but no UnknownScannerException). Is that expected? You can see an
>>> >> >> > > > > excerpt of the log from one of the region servers here:
>>> >> >> > > > > http://pastebin.com/NLcZTzsY
>>> >> >> > > >
>>> >> >> > > >
>>> >> >> > > > It means that when the server got to process that client request and
>>> >> >> > > > started reading from the socket, the client was already gone. Killing a
>>> >> >> > > > client does that (or killing a MR that scans), and so does
>>> >> >> > > > SocketTimeoutException. This should probably go in the book. We should
>>> >> >> > > > also print something nicer :)
>>> >> >> > > >
>>> >> >> > > > J-D
>>> >> >> > > >
>>> >> >> >
>>> >> >>
>>> >> >
>>> >>
>>> >
>>>
>>
>>
>>
>> --
>> All the best,
>> Lucian
>>
>
>
>
> --
> All the best,
> Lucian
>
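
The setBatch description quoted above ("the maximum number of values to return for each call to next()") can be illustrated with a small calculation. This is a sketch of the arithmetic only, not HBase code: the class and method names are hypothetical, and the column counts are invented for the example. It assumes that a row wider than the batch size is handed back in consecutive pieces of at most `batch` values each.

```java
/**
 * Illustrates intra-row scanning: how many pieces a row is returned in
 * when a batch limit caps the values per piece. Not HBase code; names are hypothetical.
 */
final class BatchSketch {

    /**
     * Number of pieces a single row is split into.
     *
     * @param columnsInRow number of values stored in the row
     * @param batch        maximum values per piece; zero or negative means "no limit"
     */
    static long piecesPerRow(long columnsInRow, int batch) {
        if (batch <= 0) {
            return 1; // no limit: the whole row comes back at once
        }
        return (columnsInRow + batch - 1) / batch; // ceiling division
    }

    public static void main(String[] args) {
        long wideRow = 2_500_000L; // hypothetical row with millions of columns
        long smallRow = 5L;        // hypothetical row with just a few columns

        System.out.println("wide row, batch 1000: " + piecesPerRow(wideRow, 1000) + " pieces");
        System.out.println("small row, batch 1000: " + piecesPerRow(smallRow, 1000) + " piece(s)");
    }
}
```

For small rows like the ones Eran describes, the batch limit never comes into play, which is why the caching value was the setting that mattered in this thread.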
