hbase-user mailing list archives

From: Matteo Bertozzi <theo.berto...@gmail.com>
Subject: Re: export snapshot fail sometime due to LeaseExpiredException
Date: Wed, 30 Apr 2014 22:07:53 GMT
Can you post your ExportSnapshot.java code?
Is your destination an hbase cluster? If yes, do you have HBASE-10766? If
not, try exporting to an hdfs path (not a /hbase subdir).
Do you have other stuff playing with the files in .archive, or multiple
ExportSnapshot jobs running against the same set of files?

We have tested ExportSnapshot with 40G files, so the problem is not the
size. It may be one of the above, or your lease timeout may be too low for
the "busy" state of your machines.

Matteo
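
For reference, a minimal sketch of driving the export from Java per the
advice above, assuming placeholder snapshot and cluster names; the
-snapshot/-copy-to/-mappers options are from the 0.94-era tool, and
-bandwidth (MB/s) is, as I recall, the option added by HBASE-11083:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.snapshot.ExportSnapshot;
    import org.apache.hadoop.util.ToolRunner;

    public class ExportToPlainPath {
      public static void main(String[] args) throws Exception {
        // Copy to a path that is NOT under the destination cluster's /hbase
        // root, so nothing else (e.g. the archive cleaner) touches the files
        // mid-copy.
        int rc = ToolRunner.run(HBaseConfiguration.create(), new ExportSnapshot(),
            new String[] {
              "-snapshot", "my_snapshot",                             // placeholder
              "-copy-to", "hdfs://dest-nn:8020/backups/my_snapshot",  // placeholder
              "-mappers", "16",
              "-bandwidth", "50"                                      // MB/s cap
            });
        System.exit(rc);
      }
    }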



On Wed, Apr 30, 2014 at 2:55 PM, Tianying Chang <tychang@gmail.com> wrote:

> I think it is not directly caused by the throttle. On the 2nd run with the
> non-throttle jar, the LeaseExpiredException shows up again (for the big
> file). So it does seem like exportSnapshot is not reliable for big files.
>
> The weird thing is that when I replace the jar and restart the cluster,
> the first run on the big table always succeeds. But the later runs always
> fail with these LeaseExpiredExceptions. A smaller table has no problem no
> matter how many times I re-run.
>
> Thanks
> Tian-Ying
>
>
> On Wed, Apr 30, 2014 at 2:24 PM, Tianying Chang <tychang@gmail.com> wrote:
>
> > Ted,
> >
> > It seems it is due to HBASE-11083 (throttle bandwidth during snapshot
> > export, <https://issues.apache.org/jira/browse/HBASE-11083>). After I
> > reverted it, the job succeeded again. It seems that even when I set the
> > throttle bandwidth high, like 200M, iftop shows a much lower value.
> > Maybe the throttle is sleeping longer than it is supposed to? But I am
> > not clear on why a slow copy job can cause a LeaseExpiredException. Any
> > idea?
> >
> >
> > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> > No lease on /hbase/.archive/rich_pin_data_v1/b50ab10bb4812acc2e9fa6c564c9adef/d/bac3c661a897466aaf1706a9e1bd9e9a
> > File does not exist. Holder DFSClient_NONMAPREDUCE_-2096088484_1 does not have any open files.
> >       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
> >       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
> >       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFileInternal(FSNamesystem.java:2454)
> >       at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2431)
> >       at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:536)
> >       at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:335)
> >       at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$
> >
> > Thanks
> > Tian-Ying
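
On the "sleeping longer than it is supposed to" theory above: the limiter
from HBASE-11083 is adapted from DistCp's ThrottledInputStream, which
throttles on the cumulative average rate, roughly as in this simplified
sketch (names abbreviated; not the exact shipped code). Because the check
divides total bytes by total elapsed time, a fast burst early in the copy
is paid back later as a long run of sleeps, which would show up as both a
low iftop rate and a long gap between writes:

    import java.io.IOException;
    import java.io.InputStream;

    // Simplified model of a bandwidth-capped stream, after DistCp's
    // ThrottledInputStream (which HBASE-11083 adapts for ExportSnapshot).
    public class ThrottledStream extends InputStream {
      private static final long SLEEP_MS = 50;
      private final InputStream in;
      private final long maxBytesPerSec;
      private final long startMs = System.currentTimeMillis();
      private long bytesRead = 0;

      public ThrottledStream(InputStream in, long maxBytesPerSec) {
        this.in = in;
        this.maxBytesPerSec = maxBytesPerSec;
      }

      // Cumulative average: total bytes / total elapsed seconds. While it
      // exceeds the cap, sleep in 50ms slices; many slices in a row add up
      // to a long pause during which nothing is written.
      private void throttle() throws IOException {
        long elapsedSec = Math.max(1, (System.currentTimeMillis() - startMs) / 1000);
        while (bytesRead / elapsedSec > maxBytesPerSec) {
          try {
            Thread.sleep(SLEEP_MS);
          } catch (InterruptedException e) {
            throw new IOException("interrupted while throttling", e);
          }
          elapsedSec = Math.max(1, (System.currentTimeMillis() - startMs) / 1000);
        }
      }

      @Override
      public int read() throws IOException {
        throttle();
        int b = in.read();
        if (b >= 0) bytesRead++;
        return b;
      }
    }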
> >
> >
> > On Wed, Apr 30, 2014 at 1:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Tianying:
> >> Have you checked the audit log on the namenode for a deletion event
> >> corresponding to the files involved in the LeaseExpiredException?
> >>
> >> Cheers
> >>
> >>
> >> On Wed, Apr 30, 2014 at 10:44 AM, Tianying Chang <tychang@gmail.com>
> >> wrote:
> >>
> >> > This time the re-run passed (although with many failed/retried tasks)
> >> > with my throttle bandwidth set to 200M (although, per iftop, it never
> >> > got close to that number). Is there a way to increase the lease expiry
> >> > time to accommodate a low throttle bandwidth for an individual export
> >> > job?
> >> >
> >> > Thanks
> >> > Tian-Ying
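
On increasing the lease expiry time: in Hadoop 2.0-era HDFS the lease
limits are, as far as I can tell from that code line (check your own
tree), compile-time constants rather than configuration knobs, so there
is no per-job way to stretch them:

    // From org.apache.hadoop.hdfs.protocol.HdfsConstants (Hadoop 2.0-era):
    public static final long LEASE_SOFTLIMIT_PERIOD = 60 * 1000L;    // 1 minute
    public static final long LEASE_HARDLIMIT_PERIOD =
        60 * LEASE_SOFTLIMIT_PERIOD;                                 // 1 hour

Past the soft limit another client (or lease recovery) may claim the
file; past the hard limit the NameNode itself may close it.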
> >> >
> >> >
> >> >
> >> > On Wed, Apr 30, 2014 at 10:17 AM, Tianying Chang <tychang@gmail.com>
> >> > wrote:
> >> >
> >> > > Yes, I am using the bandwidth throttle feature. The export job of
> >> > > this table actually succeeded on its first run. When I rerun it (for
> >> > > my robustness testing) it seems to never pass. I am wondering if it
> >> > > has some weird state (I did clean up the target cluster, and even
> >> > > removed the /hbase/.archive/rich_pin_data_v1 folder).
> >> > >
> >> > > It seems that even if I set the throttle value really large, it
> >> > > still fails. And I think even after I replace the jar back to the
> >> > > one without throttle, it still fails on re-run.
> >> > >
> >> > > Is there some way that I can increase the lease to be very large to
> >> > > test it out?
> >> > >
> >> > >
> >> > >
> >> > > On Wed, Apr 30, 2014 at 10:02 AM, Matteo Bertozzi
> >> > > <theo.bertozzi@gmail.com> wrote:
> >> > >
> >> > >> The file is the file being exported, so you are creating that
> >> > >> file. Do you have the bandwidth throttle on?
> >> > >>
> >> > >> I'm thinking that the file is being written slowly: e.g. write(a
> >> > >> few bytes), wait, write(a few bytes), and during the wait your
> >> > >> lease expires. Something like that can happen if your MR job is
> >> > >> stuck in some way (slow machine or similar) and it is not writing
> >> > >> within the lease timeout.
> >> > >>
> >> > >> Matteo
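
A minimal way to probe this write/wait/write pattern directly, assuming a
scratch HDFS path: note that a healthy DFSClient renews leases from a
background thread independent of writes, so a merely slow writer should
survive the pause; it is when the whole task JVM stalls (GC pause,
overloaded node) that renewal stops and the NameNode can expire the lease:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SlowWriterProbe {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path p = new Path("/tmp/slow-writer-lease-probe");  // scratch path
        FSDataOutputStream out = fs.create(p, true);
        out.write(new byte[4096]);
        out.hflush();                // push the first chunk to the pipeline
        Thread.sleep(90 * 1000L);    // pause well past the 60s soft limit
        out.write(new byte[4096]);   // fails only if lease renewal stopped
        out.close();
        fs.delete(p, true);
      }
    }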
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Wed, Apr 30, 2014 at 9:53 AM, Tianying Chang
> >> > >> <tychang@gmail.com> wrote:
> >> > >>
> >> > >> > We are using Hadoop 2.0.0-cdh4.2.0 and hbase 0.94.7. We also
> >> > >> > backported several snapshot-related jiras, e.g. HBASE-10111
> >> > >> > (verify snapshot) and HBASE-11083 (bandwidth throttle in
> >> > >> > exportSnapshot).
> >> > >> >
> >> > >> > I found that when the LeaseExpiredException was first reported,
> >> > >> > the file was indeed not there, and the map task retried. I
> >> > >> > verified a couple of minutes later that the HFile does exist
> >> > >> > under /.archive. But the retried map task still complains with
> >> > >> > the same file-does-not-exist error...
> >> > >> >
> >> > >> > I will check the namenode log for the LeaseExpiredException.
> >> > >> >
> >> > >> >
> >> > >> > Thanks
> >> > >> >
> >> > >> > Tian-Ying
> >> > >> >
> >> > >> >
> >> > >> > On Wed, Apr 30, 2014 at 9:33 AM, Ted Yu <yuzhihong@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> > > Can you give us the hbase and hadoop releases you're using?
> >> > >> > >
> >> > >> > > Can you check the namenode log around the time the
> >> > >> > > LeaseExpiredException was encountered?
> >> > >> > >
> >> > >> > > Cheers
> >> > >> > >
> >> > >> > >
> >> > >> > > On Wed, Apr 30, 2014 at 9:20 AM, Tianying Chang
> >> > >> > > <tychang@gmail.com> wrote:
> >> > >> > >
> >> > >> > > > Hi,
> >> > >> > > >
> >> > >> > > > When I export a large table with 460+ regions, I see the
> >> > >> > > > exportSnapshot job fail sometimes (not all the time). The
> >> > >> > > > error from the map task is below, but I verified that the
> >> > >> > > > file highlighted below does exist. A smaller table always
> >> > >> > > > seems to pass. Any idea? Is it because the table is too big
> >> > >> > > > and gets a session timeout?
> >> > >> > > >
> >> > >> > > > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> >> > >> > > > No lease on /hbase/.archive/rich_pin_data_v1/7713d5331180cb610834ba1c4ebbb9b3/d/eef3642f49244547bb6606d4d0f15f1f
> >> > >> > > > File does not exist. Holder DFSClient_NONMAPREDUCE_279781617_1 does not have any open files.
> >> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2396)
> >> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2387)
> >> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2183)
> >> > >> > > >         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:481)
> >> > >> > > >         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
> >> > >> > > >         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
> >> > >> > > >         at org.apache.hadoop.ipc.ProtobufR
> >> > >> > > >
> >> > >> > > > Thanks
> >> > >> > > >
> >> > >> > > > Tian-Ying
