From "Amit Sela (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4030) LoadIncrementalHFiles fails with FileNotFoundException
Date Sun, 06 Jul 2014 15:12:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053136#comment-14053136 ]

Amit Sela commented on HBASE-4030:
----------------------------------

Happened to me in 0.94.12 - I can't seem to reproduce it.
It happened "out of nowhere" - nothing had changed in the application or infrastructure for a while.
The audit log shows the same regionserver opening one of the HFiles, renaming it (moving it
from the MR output dir into the HBase region directory), and then trying to open it again from
the MR output dir (repeated 10 times).
Open-Rename-10xOpen appears in that order in the audit log, entries milliseconds apart, all on
the same regionserver.
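
To make the suspected sequence concrete, here is a minimal sketch of the race as I read it.
Everything below is hypothetical scaffolding; only the exception types and the retry count
reflect what the logs actually show:

    // Hypothetical sketch of the suspected race; class and stub names are
    // illustrative, not the actual HBase code paths.
    import java.io.FileNotFoundException;
    import java.io.IOException;
    import java.net.SocketTimeoutException;

    public class BulkLoadRaceSketch {
        interface RegionServerStub {
            // Server side: open + validate the HFile, then rename it from the
            // MR output dir into the region dir. Validation can exceed 60 s.
            void bulkLoadHFile(String srcPath) throws IOException;
        }

        static void load(RegionServerStub server, String srcPath) {
            for (int attempt = 0; attempt <= 10; attempt++) { // initial try + 10 retries
                try {
                    server.bulkLoadHFile(srcPath);
                    return; // success
                } catch (SocketTimeoutException e) {
                    // First attempt: client gives up after the 60 s RPC timeout,
                    // but the server keeps going and completes the rename anyway.
                } catch (FileNotFoundException e) {
                    // Every retry: srcPath was already renamed away, so the server
                    // fails again and again - the 10x "Open" entries in the audit log.
                } catch (IOException e) {
                    // other failures
                }
            }
        }
    }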

Logs:
On the client side, this is the exception:
java.net.SocketTimeoutException: Call to node.xxx.com/xxx.xxx.xxx.xxx:PORT failed on socket
timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xxx.xxx.xxx.xxx:PORT
remote=node.xxx.com/xxx.xxx.xxx.xxx:PORT]
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@29f2a6e3, org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.io.MultipleIOException: 6 exceptions [java.io.FileNotFoundException: File
does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
	at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
	at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
	at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
	at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
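
For what it's worth, the 60000 millis in the message is HBase's default hbase.rpc.timeout.
Raising it on the client is only a mitigation sketch (it hides the slow validation, it does
not fix the rename-then-retry problem); the value below is arbitrary:

    // Mitigation sketch only: give the bulkLoadHFile RPC more time so the
    // first attempt is less likely to time out mid-validation.
    // "hbase.rpc.timeout" is the real property (default 60000 ms).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RaiseRpcTimeout {
        public static void main(String[] args) {
            Configuration conf = HBaseConfiguration.create();
            conf.setInt("hbase.rpc.timeout", 300000); // 5 min instead of 1
            // ...then hand conf to LoadIncrementalHFiles as usual...
        }
    }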

On the regionserver:

ERROR org.apache.hadoop.hbase.regionserver.HRegion: There were one or more IO errors when
checking if the bulk load is ok.
org.apache.hadoop.io.MultipleIOException: 6 exceptions [java.io.FileNotFoundException: File
does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/88fd743853cf4f8a862fb19646027a48,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/31c4c5cea9b348dbb6bb94115a483877,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen/5762c45aaf4f408ba748a989f7be9647,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/2ee02a005b654704a092d16c5c713373,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/gen1/618251330a1842a797de4b304d341a02,
java.io.FileNotFoundException: File does not exist: /data/output_jobs/output_websites/HFiles_20140705/metadata/3955039392ce4f49aee5f58218a61be1]
        at org.apache.hadoop.io.MultipleIOException.createIOException(MultipleIOException.java:47)
        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3673)
        at org.apache.hadoop.hbase.regionserver.HRegion.bulkLoadHFiles(HRegion.java:3622)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.bulkLoadHFiles(HRegionServer.java:2930)
        at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)

followed by:

ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: 
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(4522610431482097770,
250), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from xxx.xxx.xxx.xxx
after 12507 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
2014-07-06 03:52:14,278 [IPC Server handler 28 on 8041] ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:

org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call next(7354511084312054096,
250), rpc version=1, client version=29, methodsFingerPrint=-1368823753 from xxx.xxx.xxx.xxx
after 9476 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3980)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3890)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3880)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:2648)
        at sun.reflect.GeneratedMethodAccessor60.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426) 

> LoadIncrementalHFiles fails with FileNotFoundException
> ------------------------------------------------------
>
>                 Key: HBASE-4030
>                 URL: https://issues.apache.org/jira/browse/HBASE-4030
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.1
>         Environment: CDH3bu on Ubuntu 4.4.3
>            Reporter: Adam Phelps
>
> -- We've been seeing intermittent failures of calls to LoadIncrementalHFiles.  When this happens the node that made the call will see a FileNotFoundException such as this:
> 2011-06-23 15:47:34.379566500 java.net.SocketTimeoutException: Call to s8.XXX/67.215.90.38:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.51:51605 remote=s8.XXX/67.215.90.38:60020]
> 2011-06-23 15:47:34.379570500 java.io.FileNotFoundException: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> 2011-06-23 15:47:34.379573500   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1602)
> 2011-06-23 15:47:34.379573500   at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1593)
> -- Over on the regionserver that was loading this we see that it attempted to load and hit a 60 second timeout:
> 2011-06-23 15:45:54,634 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
> ...
> 2011-06-23 15:46:54,639 INFO org.apache.hadoop.hdfs.DFSClient: Failed to connect to /67.215.90.38:50010, add to deadNodes and continue
> java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/67.215.90.38:42199 remote=/67.215.90.38:50010]
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at java.io.DataInputStream.readShort(DataInputStream.java:295)
> -- We suspect this particular problem is a resource contention issue on our side.  However, the loading process proceeds to rename the file despite the failure:
> 2011-06-23 15:46:54,657 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming bulk load file hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 to hdfs://namenode.XXX:8020/hbase/domainsranked/d4925aca7852bed32613a509215d42b8/handling/3615917062821145533
> -- And then the LoadIncrementalHFiles tries to load the hfile again:
> 2011-06-23 15:46:55,684 INFO org.apache.hadoop.hbase.regionserver.Store: Validating hfile at hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256 for inclusion in store handling region domainsranked,368449:2011/05/03/23:category:ffffffff:com.zynga.static.fishville.facebook,1305890318961.d4925aca7852bed32613a509215d42b8.
> 2011-06-23 15:46:55,685 DEBUG org.apache.hadoop.ipc.HBaseServer: IPC Server handler 147 on 60020, call bulkLoadHFile(hdfs://namenode.XXX/hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256, [B@4224508b, [B@5e23f799) from 67.215.90.51:51856: error: java.io.FileNotFoundException: File does not exist: /hfiles/2011/06/23/14/domainsranked/TopDomainsRank.r3v5PRvK/handling/3557032074765091256
> -- This eventually leads to the load command failing.
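
The quoted report matches what I see here: validation times out, the server still renames the
HFile into the region directory, and the retry then points at a path that no longer exists.
As an illustration only (the FileSystem/Path calls are standard Hadoop API, but the guard
itself is hypothetical, not a proposed patch), a retry could first check whether the source
file is still where it was left:

    // Illustrative guard, not a patch: before retrying a timed-out bulk
    // load, check whether the source HFile still exists. If it is gone,
    // the server most likely finished the rename and retrying is futile.
    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RetryGuardSketch {
        static boolean sourceStillPresent(Configuration conf, String src)
                throws IOException {
            Path p = new Path(src);
            FileSystem fs = p.getFileSystem(conf);
            return fs.exists(p);
        }
    }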



