hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HRegionServer: endless FileNotFoundException in log file
Date Fri, 13 Mar 2009 19:34:58 GMT
Schubert,

The hardware sounds good enough.

On the "Could not complete write to file", isn't there a "caused by"
clause following it? It's hard to tell anything from these excerpts. Can
you also grep your logs for any occurrence of "Too many open files"?
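A quick way to run that check (shown here on a generated sample log; real log locations vary by install, so the path below is an assumption):

```shell
# Demo on a sample log; in practice point LOGDIR at your Hadoop/HBase
# log directories (location is install-specific, an assumption here).
LOGDIR=$(mktemp -d)
printf 'INFO all good\njava.io.IOException: Too many open files\n' \
    > "$LOGDIR/regionserver.log"

# Count occurrences per file; any nonzero count means the daemon ran
# into the per-process file-descriptor limit.
grep -c "Too many open files" "$LOGDIR"/*.log

# Check the current open-file limit for this shell; the common default
# of 1024 is widely considered too low for HDFS/HBase daemons.
ulimit -n
```

If the count is nonzero, the usual remedy is raising the `nofile` limit (e.g. in /etc/security/limits.conf on Linux) for the user running the daemons.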

Finally, if you could drop by the IRC channel that would be great; we
will help you dig through the logs.

J-D

On Fri, Mar 13, 2009 at 3:25 PM, schubert zhang <zsongbo@gmail.com> wrote:
> Hi J-D,
> Thanks for your guide.
>
> Following is my cluster and hardware information:
>
> Node0:  HDFS NameNode, HBase Master.
> Node1-5: HDFS DataNode, HBase RegionServer.
>
> DELL 2950 servers; each node has 4GB of memory.
> Node0-4 each have a 1TB disk (0.8TB for HDFS).
> Node5 has a 1.5TB disk (1.2TB for HDFS).
>
> I am using hadoop-0.19.1 and hbase-0.19.0.
>
> My table schema is:
>
> {NAME => 'TESTTAB', IS_ROOT => 'false', IS_META => 'false', FAMILIES =>
> [{NAME => 'cdr', BLOOMFILTER => 'false', VERSIONS => '1', COMPRESSION =>
> 'BLOCK', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
> BLOCKCACHE => 'true'}], INDEXES => []}
>
> I am using BLOCKCACHE and COMPRESSION now. I also plan to use
> BLOOMFILTER once it is available (0.20.x).
>
> My region size is 128MB.
>
> At about 70,000,000 rows (each row has about 400 bytes of raw data), there
> are about 125 regions. In HDFS, only 16GB is used (per "hadoop fs -dus
> /hbase/TESTTAB"). On the HDFS web GUI, only 1.x% of storage is shown as
> used. I think this volume is small for my hardware and storage.
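Those figures hang together arithmetically; a quick back-of-the-envelope check (numbers taken from this thread, the compression ratio inferred rather than measured):

```python
# Back-of-the-envelope check of the figures in this thread.
rows = 70_000_000
row_bytes = 400                      # ~400 B of raw data per row
raw_gb = rows * row_bytes / 2**30    # raw payload size
print(f"raw data: {raw_gb:.1f} GB")  # ~26.1 GB

# 16 GB actually used in HDFS implies roughly a 0.6x on-disk ratio,
# plausible for COMPRESSION => 'BLOCK'. The exact ratio depends on
# the data; this is an inference, not a measurement.
used_gb = 16
print(f"on-disk ratio: {used_gb / raw_gb:.2f}")

# 16 GB across ~125 regions is ~131 MB per region, consistent with
# the configured 128 MB region split size.
regions = 125
print(f"per region: {used_gb * 1024 / regions:.0f} MB")
```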
>
> I checked the HDFS logs; there are some exceptions, but it is hard to
> correlate them with these issues. I will check HDFS further and try the
> HBase 0.19.1 RC1. Thanks.
>
> In the namenode log, there are some exceptions like:
> .16_1236922775003_60020/hlog.dat.1236925456649 because dir.getFileBlocks()
> is null  and pendingFile is null
> 2009-03-13 14:33:03,828 INFO org.apache.hadoop.ipc.Server: IPC Server
> handler 8 on 9000, call
> complete(/hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649,
> DFSClient_-1142813574) from 10.24.1.16:58444: error: java.io.IOException:
> Could not complete write to file
> /hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649 by
> DFSClient_-1142813574
> java.io.IOException: Could not complete write to file
> /hbase/log_10.24.1.16_1236922775003_60020/hlog.dat.1236925456649 by
> DFSClient_-1142813574
>        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:378)
>        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>        at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:894)
>
> In the datanode log, like:
> 2009-03-13 14:28:45,519 WARN
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.24.1.12:50010, storageID=DS-1393238225-10.24.1.12-50010-1236857766916,
> infoPort=50075, ipcPort=50020):Got exception while serving
> blk_-4945343177244655889_64091 to /10.24.1.10:
> java.io.IOException: Block blk_-4945343177244655889_64091 is not valid.
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:726)
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:714)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> 2009-03-13 14:28:45,519 ERROR
> org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
> 10.24.1.12:50010, storageID=DS-1393238225-10.24.1.12-50010-1236857766916,
> infoPort=50075, ipcPort=50020):DataXceiver
> java.io.IOException: Block blk_-4945343177244655889_64091 is not valid.
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:726)
>        at
> org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:714)
>        at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:92)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:172)
>        at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>
> Schubert
>
> On Fri, Mar 13, 2009 at 10:31 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>
>> Schubert,
>>
>> Yeah, at 70,000,000 rows with only 5 region servers you might expect
>> stuff like that... but this is still not good. Can you take a look at
>> your datanode logs to see if there are any obvious exceptions around the
>> time of the "error closing and deleting..." exception?
>>
>> I'd like to know more about your hardware.
>>
>> Also, I strongly suggest that you try out the 0.19.1 RC1; it has some
>> really nice fixes.
>>
>> Thx for reporting all that stuff,
>>
>> J-D
>>
>> On Fri, Mar 13, 2009 at 5:41 AM, schubert zhang <zsongbo@gmail.com> wrote:
>> > I have a 5+1 HBase/Hadoop cluster (5 region servers and 1 master). A
>> > table TESTTAB has only one column family, with 36 qualifiers.
>> > A process on the master node uses batchUpdate (autoFlush=false) to
>> > insert random rows into this table.
>> > After about 70,000,000 rows were inserted, it failed, and on the HBase
>> > web GUI I can find only 4 regionservers. So I ssh'd into the missing
>> > node to check the log.
>> >
>> > There is a very big log file (12GB), and when I "tail -f" this log
>> > file, an endless stream of FileNotFoundExceptions is printed, like the
>> > following:
>> >
>> > 2009-03-13 14:32:21,617 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
>> file
>> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
>> not
>> > exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> > at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
>> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,627 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
>> > (Retry: 0) java.io.FileNotFoundException: File does not exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> > at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
>> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,628 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
>> file
>> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
>> not
>> > exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> > at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
>> > at java.lang.Thread.run(Thread.java:619) 2009-03-13 14:32:21,629 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Processing message
>> > (Retry: 1) java.io.FileNotFoundException: File does not exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> > at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
>> > at java.lang.Thread.run(Thread.java:619)
>> >
>> > .......
>> >
>> > 2009-03-13 14:32:21,641 WARN
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: error getting store
>> file
>> > index size for 310680591/cdr: java.io.FileNotFoundException: File does
>> not
>> > exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> >        at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.doMetrics(HRegionServer.java:941)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:334)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > 2009-03-13 14:32:21,641 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Exceeded max retries:
>> 10
>> > java.io.FileNotFoundException: File does not exist:
>> >
>> hdfs://nd0-rack0-cloud:9000/hbase/TESTTAB/310680591/cdr/mapfiles/1049684492857034443/index
>> >        at
>> >
>> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:415)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HStoreFile.indexLength(HStoreFile.java:488)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HStore.getStorefilesIndexSize(HStore.java:2179)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.createRegionLoad(HRegionServer.java:625)
>> >        at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:341)
>> >        at java.lang.Thread.run(Thread.java:619)
>> >
>> > But it does not stop; it keeps printing this exception even after
>> > "Exceeded max retries: 10".
>> > ....endless...
>> >
>> > ======================================
>> >
>> > Before this exception, there was no other exception on this node.
>> > But on the other nodes there are some like the following:
>> >
>> > 2009-03-13 09:22:19,389 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: error closing and
>> > deleting HLog org.apache.hadoop.ipc.RemoteException: java.io.IOException:
>> > Could not complete write to file
>> > /hbase/log_10.24.1.16_1236857785825_60020/hlog.dat.1236906952707 by
>> > DFSClient_-1142813574 at
>> >
>> org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:378)
>> > at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481) at
>> > org.apache.hadoop.ipc.Server$Handler.run(Server.java:894) at
>> > org.apache.hadoop.ipc.Client.call(Client.java:697) at
>> > org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216) at
>> > $Proxy1.complete(Unknown Source) at
>> > sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source) at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> >
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>> > at
>> >
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>> > at $Proxy1.complete(Unknown Source) at
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3130)
>> > at
>> >
>> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3054)
>> > at
>> >
>> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
>> > at
>> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
>> > at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:959)
>> at
>> > org.apache.hadoop.hbase.regionserver.HLog.close(HLog.java:421) at
>> > org.apache.hadoop.hbase.regionserver.HLog.closeAndDelete(HLog.java:404)
>> at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:373)
>> > at java.lang.Thread.run(Thread.java:619)
>> >
>> >
>> > and...
>> >
>> > 2009-03-13 10:03:40,149 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
>> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
>> at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> > 2009-03-13 10:03:40,150 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
>> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
>> at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> > 2009-03-13 10:03:40,150 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
>> > org.apache.hadoop.hbase.NotServingRegionException: TESTTAB,,1236871823378
>> at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source) at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> >
>> >
>> > and ...
>> >
>> > 2009-03-13 13:01:58,220 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > CDRWAP,13576300780@2009-01-31 10:44:29.720,1236888415970 at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> > 2009-03-13 13:01:58,348 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>> Server
>> > handler 1 on 60020, call openScanner([B@7a5fe108, [[B@2caf12fc,
>> [B@1a07754f,
>> > 9223372036854775807,
>> > org.apache.hadoop.hbase.filter.WhileMatchRowFilter@5fa6a2e2) from
>> > 10.24.1.10:44605: error:
>> org.apache.hadoop.hbase.NotServingRegionException:
>> > CDRWAP,13576300780@2009-01-31 10:44:29.720,1236888415970
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > CDRWAP,13576300780@2009-01-31 10:44:29.720,1236888415970 at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> > 2009-03-13 13:02:00,712 ERROR
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Failed openScanner
>> > org.apache.hadoop.hbase.NotServingRegionException:
>> > CDRWAP,13576300780@2009-01-31 10:44:29.720,1236888415970 at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2065)
>> > at
>> >
>> org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1699)
>> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > at
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > at java.lang.reflect.Method.invoke(Method.java:597) at
>> > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:632) at
>> > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:895)
>> > 2009-03-13 13:02:00,730 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>> Server
>> > handler 8 on 60020, call openScanner([B@1eb57
>> >
>>
>
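The write path described at the top of the thread, batchUpdate with autoFlush=false, is client-side write buffering: edits accumulate locally and are shipped to the server in batches instead of one RPC per row. A minimal, generic sketch of that pattern (illustrative only; this is not the HBase 0.19 client API, and every name below is made up):

```python
# Generic client-side write buffering, the pattern behind autoFlush=false:
# edits accumulate in a local buffer and are sent in batches. Hypothetical
# sketch, not the HBase client API.
class BufferedWriter:
    def __init__(self, send, buffer_size=1000):
        self.send = send              # callable that ships one batch to the server
        self.buffer_size = buffer_size
        self.buffer = []

    def put(self, row, column, value):
        self.buffer.append((row, column, value))
        if len(self.buffer) >= self.buffer_size:
            self.flush()              # auto-flush once the buffer is full

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []

# Usage: 7 puts with a buffer of 3 yield batches of 3, 3, and 1
# (the last one sent by the explicit flush, as at end of a load job).
batches = []
w = BufferedWriter(batches.append, buffer_size=3)
for i in range(7):
    w.put(f"row{i}", "cdr:q1", b"value")
w.flush()
print([len(b) for b in batches])   # -> [3, 3, 1]
```

The trade-off is the same one this thread runs into: buffered edits cut RPC overhead during bulk loads, but anything still in the buffer (or in an unsynced write-ahead log) is at risk when a server drops out.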
