hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: FileNotFoundException in bulk load
Date Tue, 08 Jul 2014 08:26:30 GMT
HDFS in Hadoop 1.0 only times out a bad DataNode after 20 minutes by default. Until then
the NameNode will happily direct requests to the bad DataNode, and each request then has to
time out individually.

So after a while (or a clean restart of everything) this should have fixed itself. Did it?


In Hadoop 2.0 (or 2.2?) our own Nicolas Liochon added another state for DataNodes: after
30s (or so) of unreachability the NameNode no longer directs requests to such DataNodes unless
no other DataNode is available for the block in question.
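
As a rough illustration of the Hadoop 2.x knobs behind this stale-DataNode behavior, here is a minimal sketch; the property names below are the standard HDFS stale-node keys, but verify them against your Hadoop version before relying on them:

import org.apache.hadoop.conf.Configuration;

public class StaleDataNodeSettings {
    public static void main(String[] args) {
        // Programmatic equivalent of setting these keys in hdfs-site.xml on the NameNode.
        Configuration conf = new Configuration();
        // Consider a DataNode "stale" after 30 seconds without a heartbeat (value in ms).
        conf.setLong("dfs.namenode.stale.datanode.interval", 30 * 1000L);
        // Prefer non-stale DataNodes when handing out block locations for reads.
        conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
        // Optionally avoid stale DataNodes for writes as well.
        conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
        System.out.println("stale interval (ms): " + conf.get("dfs.namenode.stale.datanode.interval"));
    }
}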

-- Lars



________________________________
 From: Amit Sela <amits@infolinks.com>
To: user@hbase.apache.org; lars hofhansl <larsh@apache.org> 
Sent: Tuesday, July 8, 2014 9:38 AM
Subject: Re: FileNotFoundException in bulk load
 

I think Lars is right. We ended up with errors in the RAID on that
regionserver the next day.

Still, shouldn't HDFS have supplied one of the replicas? And why did the audit log
show a successful open, a successful rename, and then repeated attempts to open that
finally threw the exception?





On Sun, Jul 6, 2014 at 8:17 PM, lars hofhansl <larsh@apache.org> wrote:

> If we continue the discussion there we should reopen the JIRA.
> That's fine if the exception is identical; open a new one if this is a
> different issue.
>
> At first blush this looks a bit like a temporary unavailability of HDFS.
>
>
> -- Lars
>
>
>
> ________________________________
>  From: Ted Yu <yuzhihong@gmail.com>
> To: "user@hbase.apache.org" <user@hbase.apache.org>
> Sent: Sunday, July 6, 2014 8:01 AM
> Subject: Re: FileNotFoundException in bulk load
>
>
> The IOExceptions likely came from the store.assertBulkLoadHFileOk() call.
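
For readers following along, here is a minimal sketch of the kind of existence check that surfaces these FileNotFoundExceptions; this is an illustration only, not the actual HStore.assertBulkLoadHFileOk() implementation (which also validates the HFile itself), and the class and method names are placeholders.

import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustration only: verify that a staged HFile still exists and is readable
// before asking the region to bulk-load it.
public class BulkLoadPreCheck {
    static void assertHFileOk(Configuration conf, Path hfile) throws IOException {
        FileSystem fs = hfile.getFileSystem(conf);
        if (!fs.exists(hfile)) {
            // This is where a "File does not exist" error surfaces if the staging
            // directory was cleaned up, or if HDFS is misreporting the file.
            throw new FileNotFoundException("File does not exist: " + hfile);
        }
        // Opening the file forces HDFS to hand out block locations, which can also
        // stall if the NameNode still points at an unresponsive DataNode.
        fs.open(hfile).close();
    }
}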
>
> HBASE-4030 seems to be a better place for further discussion, since you can
> attach the regionserver log(s) there.
>
> Cheers
>
>
>
> On Sun, Jul 6, 2014 at 5:23 AM, Amit Sela <amits@infolinks.com> wrote:
>
> > The audit log shows the same regionserver opening one of the HFiles for a region,
> > renaming it (moving it from the MR output dir into the HBase region directory), and then
> > trying to open it again from the MR output dir (repeating 10 times).
> > Open-Rename-10xOpen appears in that order in the audit log, with only a msec
> > difference between entries, all on the same regionserver.
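
For orientation, here is a sketch of the server-side move that would account for the single rename entry: once its pre-checks pass, the regionserver moves the staged HFile into the region's store directory, so any later open attempt against the original MR output path fails with FileNotFoundException. This is an illustration under those assumptions, not the actual 0.94 code path; the class name, method name, and paths are placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Illustration only: rename a staged HFile into the region's store directory.
public class BulkLoadMoveSketch {
    static Path moveIntoStore(Configuration conf, Path stagedHFile, Path storeDir) throws IOException {
        FileSystem fs = stagedHFile.getFileSystem(conf);
        Path dst = new Path(storeDir, stagedHFile.getName());
        // After a successful rename the HFile only exists under the store directory;
        // retried opens of the old MR output path can no longer succeed.
        if (!fs.rename(stagedHFile, dst)) {
            throw new IOException("Failed to move " + stagedHFile + " to " + dst);
        }
        return dst;
    }
}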
> >
> >
> > On Sun, Jul 6, 2014 at 2:38 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Have you checked the audit log from the NameNode to see which client deleted
> > > the files?
> > >
> > > Thanks
> > >
>
> > > On Jul 6, 2014, at 4:19 AM, Amit Sela <amits@infolinks.com> wrote:
> > >
> > > > I have a bulk load job that has been running daily for months, and suddenly I got
> > > > a FileNotFoundException.
> > > >
> > > > Googling it I found HBASE-4030, and I noticed someone reporting that it started
> > > > to re-appear in 0.94.8.
> > > >
> > > > I'm running with Hadoop 1.0.4 and HBase 0.94.12.
> > > >
> > > > Has anyone else encountered this problem lately?
> > > >
> > > > Should we re-open the JIRA?
> > > >
> > > > Thanks,
> > > >
> > > > Amit.
> > > >
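
For context, the client side of such a job typically looks roughly like the following; this is a minimal sketch assuming the 0.94-era LoadIncrementalHFiles API, and the table name and HFile directory are placeholders taken from the paths in the logs below.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class DailyBulkLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Placeholder table name and HFile output directory from the MR job.
        HTable table = new HTable(conf, "websites");
        Path hfileDir = new Path("/data/output_jobs/output_websites/HFiles_20140705");
        // Asks the regionservers to move each HFile into the matching region;
        // failed groups are retried by the client, which fits the repeated
        // "open" entries seen in the NameNode audit log.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(hfileDir, table);
        table.close();
    }
}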
> > > > *On the client side this is the exception:*
> > > >
> > > > java.net.SocketTimeoutException: Call to node.xxx.com/xxx.xxx.xxx.xxx:PORT failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx.xxx.xxx.xxx:PORT remote=node.xxx.com/xxx.xxx.xxx.xxx:PORT]
> > > > org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@29f2a6e3,
> > > > org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.io.MultipleIOException: 6 exceptions
> > > > [java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
> > > >         at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
> > > >         at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:601)
> > > >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >
> > > > *On the regionserver:*
> > > >
> > > > ERROR org.apache.hadoop.hbase.regionserver.HRegion: There were one or more IO errors when checking if the bulk load is ok.
> > > > org.apache.hadoop.io.MultipleIOException: 6 exceptions
> > > > [java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
> > > > java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
> > > >         at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
> > > >         at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:601)
> > > >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > >
> > > > followed by:
> > > >
> > > > ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(4522610431482097770, 250), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from xx.xxx.xxx.xxx after 12507 ms, since caller disconnected
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
> > > >         at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:601)
> > > >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > > > 2014-07-06 03:52:14,278 [IPC Server handler 28 on 8041] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> > > > org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(7354511084312054096, 250), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from xx.xxx.xxx.xxx after 9476 ms, since caller disconnected
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
> > > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
> > > >         at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
> > > >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >         at java.lang.reflect.Method.invoke(Method.java:601)
> > > >         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
> > > >         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> > >
> >
>