hbase-user mailing list archives

From Amit Sela <am...@infolinks.com>
Subject Re: FileNotFoundException in bulk load
Date Tue, 08 Jul 2014 07:38:18 GMT
I think Lars is right. We ended up with errors in the RAID on that
regionserver the next day.

Still, shouldn't HDFS have supplied one of the replicas? And why did the audit
log show a successful open and a successful rename, and then retried opens
until it finally threw the exception?
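As an aside for anyone debugging the same trace: the open, rename, then repeatedly failing re-open pattern can be mimicked with a small stand-alone simulation. This is only an illustrative sketch of the suspected race, on a local filesystem rather than HDFS, it is not HBase code, and all file names in it are made up:

```python
import os
import tempfile

def open_with_retries(path, retries=10):
    """Retry opening `path`, the way a bulk-load client retries an HFile.
    If the file was renamed away before the retries, every attempt fails."""
    last_err = None
    for _ in range(retries):
        try:
            with open(path, "rb") as f:
                return f.read()
        except FileNotFoundError as e:
            last_err = e
    raise last_err

d = tempfile.mkdtemp()
src = os.path.join(d, "hfile")         # stands in for the MR output file
dst = os.path.join(d, "region_hfile")  # stands in for the copy in the region dir

with open(src, "wb") as f:
    f.write(b"kv-data")

data = open_with_retries(src)   # first open succeeds
os.rename(src, dst)             # the server moves the HFile into the region dir
try:
    open_with_retries(src)      # all retries against the old path now fail
    failed = False
except FileNotFoundError:
    failed = True

print(data, failed)  # prints: b'kv-data' True
```

If something moves the HFile between the first successful open and the retries, every retry against the old path fails the same way, which matches the one successful open followed by ten failed opens in the audit log.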


On Sun, Jul 6, 2014 at 8:17 PM, lars hofhansl <larsh@apache.org> wrote:

> If we discuss this further there, we should reopen the JIRA if the
> exception is identical, or open a new one if this is a different issue.
>
> At first blush this looks a bit like a temporary unavailability of HDFS.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <yuzhihong@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Sunday, July 6, 2014 8:01 AM
> Subject: Re: FileNotFoundException in bulk load
>
>
> The IOExceptions likely came from the store.assertBulkLoadHFileOk() call.
>
> HBASE-4030 seems to be a better place for future discussion since you can
> attach regionserver log(s) there.
>
> Cheers
>
>
>
> On Sun, Jul 6, 2014 at 5:23 AM, Amit Sela <amits@infolinks.com> wrote:
>
> > The audit log shows that the same regionserver is opening one of the
> > regions, renaming (moving the file from the MR output dir into the HBase
> > region directory), and then trying to open it again from the MR output
> > dir (repeating 10 times).
> > Open-Rename-10xOpen appears in that order in the audit log, with only
> > milliseconds between entries, all on the same region server.
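For reference, pulling such a sequence out of a NameNode audit log can be scripted in a few lines. The log lines below are fabricated samples that only mimic the usual HDFS audit fields (ugi=, cmd=, src=, dst=); the real log format and paths will differ:

```python
# Filter fabricated HDFS-audit-style lines for one file and list its cmd sequence.
sample_log = """\
2014-07-05 03:50:01,101 INFO FSNamesystem.audit: ugi=hbase cmd=open src=/hfiles/f1 dst=null
2014-07-05 03:50:01,103 INFO FSNamesystem.audit: ugi=hbase cmd=rename src=/hfiles/f1 dst=/hbase/region/f1
2014-07-05 03:50:01,105 INFO FSNamesystem.audit: ugi=hbase cmd=open src=/hfiles/f1 dst=null
2014-07-05 03:50:01,107 INFO FSNamesystem.audit: ugi=hbase cmd=open src=/other/f2 dst=null
"""

def ops_for(path, log):
    """Return the ordered list of audit 'cmd' values whose 'src' is `path`."""
    ops = []
    for line in log.splitlines():
        # Parse key=value tokens; non key=value tokens (timestamp, level) are skipped.
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        if fields.get("src") == path:
            ops.append(fields["cmd"])
    return ops

print(ops_for("/hfiles/f1", sample_log))  # prints: ['open', 'rename', 'open']
```

Grepping the real audit log for the HFile path the same way would show whether the rename happened before the failed re-opens, and which client issued each operation.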
> >
> >
> > On Sun, Jul 6, 2014 at 2:38 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Have you checked the audit log on the NameNode to see which client
> > > deleted the files?
> > >
> > > Thanks
> > >
> > > On Jul 6, 2014, at 4:19 AM, Amit Sela <amits@infolinks.com> wrote:
> > >
> > > > I have a bulk load job that has been running daily for months, when
> > > > suddenly I got a FileNotFoundException.
> > > >
> > > > Googling it I found HBASE-4030, and I noticed someone reporting that
> > > > it started to re-appear in 0.94.8.
> > > >
> > > > I'm running Hadoop 1.0.4 and HBase 0.94.12.
> > > >
> > > > Has anyone else encountered this problem lately?
> > > >
> > > > Should we re-open the JIRA?
> > > >
> > > > Thanks,
> > > >
> > > > Amit.
> > > >
> > > > *On the client side this is the Exception:*
> > > >
> > > > java.net.SocketTimeoutException: Call to node.xxx.com/xxx.xxx.xxx.xxx:PORT
> > > > failed on socket timeout exception: java.net.SocketTimeoutException: 60000
> > > > millis timeout while waiting for channel to be ready for read. ch :
> > > > java.nio.channels.SocketChannel[connected local=/xxx.xxx.xxx.xxx:PORT
> > > > remote=node.xxx.com/xxx.xxx.xxx.xxx:PORT]
> > > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@29f2a6e3,
> > > > org.apache.hadoop.ipc.RemoteException:
> > > > org.apache.hadoop.io.MultipleIOException: 6 exceptions
> > > > [java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
> > > > at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
> > > > at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
> > > > at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
> > > > at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
> > > > at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
> > > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > > at java.lang.reflect.Method.invoke(Method.java:601)
> > > > at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > > at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >
> > > > *On the regionserver:*
> > > >
> > > > ERROR org.apache.hadoop.hbase.regionserver.HRegion: There were one or more
> > > > IO errors when checking if the bulk load is ok.
> > > > org.apache.hadoop.io.MultipleIOException: 6 exceptions
> > > > [java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
> > > > java.io.FileNotFoundException: File does not exist:
> > > > /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
> > > >        at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
> > > >        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
> > > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >        at java.lang.reflect.Method.invoke(Method.java:601)
> > > >        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >
> > > > followed by:
> > > >
> > > > ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call
> > > > next(4522610431482097770, 250), rpc version=1, client version=29,
> > > > methodsFingerPrint=-1368823753 from xx.xxx.xxx.xxx after 12507 ms,
> > > > since caller disconnected
> > > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
> > > >        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
> > > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >        at java.lang.reflect.Method.invoke(Method.java:601)
> > > >        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > > 2014-07-06 03:52:14,278 [IPC Server handler 28 on 8041] ERROR
> > > > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call
> > > > next(7354511084312054096, 250), rpc version=1, client version=29,
> > > > methodsFingerPrint=-1368823753 from xx.xxx.xxx.xxx after 9476 ms,
> > > > since caller disconnected
> > > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
> > > >        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
> > > >        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
> > > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >        at java.lang.reflect.Method.invoke(Method.java:601)
> > > >        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > >
> >
>
