hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: NotReplicatedYetException when gracefully shutting down a region server.
Date Tue, 03 Jan 2012 22:06:03 GMT
Since you did the region mover thing then all the region should be off
from that region server, so data loss is highly unlikely.

My theory would be that a compaction was underway and got cancelled
when the region server was told to close the region, but the
DFSOutputStream thread hasn't been cleanup yet. Probably the file was
even already deleted and the data is still safe in the un-compacted
files.

Ways to confirm this:

 - Look in the region server log in the preceding lines what happens
to that region (named 0fd5a7b472fb092a4470e9505c3a421a) when it's
closing. Do you see a compaction being cancelled?
 - Look in the Namenode log for that file, is it being asked to delete
it or something before it sends a NotReplicatedYetException?

Hope this helps,

J-D

On Thu, Dec 29, 2011 at 3:11 PM, Alok Singh <alok@urbanairship.com> wrote:
> When attempting to gracefully shutdown a regionserver, I saw a couple
> of NotReplicatedYet exceptions in the logs (below). Can't find the
> file that is causing this exception in on the HDFS filesystem. Have we
> potentially lost the data, or is this exception benign?
>
>
> Alok
>
> hbase: 0.90.3
> hadoop: 0.20.2-cdh3u0
>
> Gracefull shutdown process:
> hbase(main):001:0> balance_switch false
>
> hbase org.jruby.Main current/bin/region_mover.rb unload <IP>
>
> After the region count for the regionserver is 0, we kill the
> regionserver process.
>
> ------------------Log-------------------------
> 2011-12-29 17:11:03,768 - INFO  [PRI IPC Server handler 6 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:03,787 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,803 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:03,911 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990):
> 7.1k, 5225/5276 (99%)
> 2011-12-29 17:11:03,944 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/.tmp/6825595110476022990
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965
> 2011-12-29 17:11:03,961 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_device/01c7894eea30fcd6713b159ca1645da5/device/1580164973134104965,
> entries=5276, sequenceid=1219931822, memsize=2.0m, filesize=384.4k
> 2011-12-29 17:11:03,964 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-0:HRegion@554] -
> Closed rich_push.user_device,ujoG9VuxSbKTLOQalFdJDA.c9QrCxywSJWvSfYhDGL1XA,1324711834779.01c7894eea30fcd6713b159ca1645da5.
> 2011-12-29 17:11:05,706 - INFO  [PRI IPC Server handler 7 on
> 7040:HRegionServer@2142] - Received close region:
> rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:11:05,716 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,735 - WARN
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:FsPermission@220] -
> dfs.umask configuration key is deprecated. Convert to dfs.umaskmode,
> using octal or symbolic umask specifications.
> 2011-12-29 17:11:05,808 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Writer@868]
> - Bloom added to HFile
> (hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097):
> 6.5k, 4727/4737 (100%)
> 2011-12-29 17:11:05,834 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@494] -
> Renaming flushed file at
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/.tmp/1207401926740467097
> to hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:StoreFile$Reader@1027]
> - Loaded col bloom filter metadata for
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492
> 2011-12-29 17:11:05,852 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:Store@504] - Added
> hdfs://richpush-master-0:7080/richpush/hbase/root/rich_push.user_udid/6f4cf99ff967f159fc68cd6c05001e9a/udid/8492436804961336492,
> entries=4737, sequenceid=1219931829, memsize=1.4m, filesize=246.5k
> 2011-12-29 17:11:05,856 - INFO
> [RS_CLOSE_REGION-10.129.1.230,7040,1323990853621-1:HRegion@554] -
> Closed rich_push.user_udid,7sDqU74XQN6x1LWFPsmomQ.4e77ba299bf7e812c5002186,1325081217300.6f4cf99ff967f159fc68cd6c05001e9a.
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@100] -
> Shutdown hook starting; hbase.shutdown.hook=true;
> fsShutdownHook=Thread[Thread-15,5,main]
> 2011-12-29 17:14:58,942 - INFO
> [Shutdownhook:regionserver7040:HRegionServer@1342] - STOPPED: Shutdown
> hook
> 2011-12-29 17:14:59,476 - INFO  [regionserver7040:HBaseServer@1234] -
> Stopping server on 7040
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 2 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 9 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 4 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 4 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 4 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 5 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 6 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 7 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 8 on 7040: exiting
> 2011-12-29 17:14:59,477 - INFO  [PRI IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 0 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 1 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 3 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 5 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 5 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 6 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 6 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 7 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 7 on 7040:
> exiting
> 2011-12-29 17:14:59,478 - INFO  [PRI IPC Server handler 9 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 9 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 0 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 0 on 7040: exiting
> 2011-12-29 17:14:59,480 - INFO  [IPC Server
> Responder:HBaseServer$Responder@649] - Stopping IPC Server Responder
> 2011-12-29 17:14:59,479 - INFO  [IPC Server listener on
> 7040:HBaseServer$Listener@450] - Stopping IPC Server listener on 7040
> 2011-12-29 17:14:59,477 - INFO  [regionserver7040:HRegionServer@636] -
> Stopping infoServer
> 2011-12-29 17:14:59,476 - INFO  [PRI IPC Server handler 8 on
> 7040:HBaseServer$Handler@1104] - PRI IPC Server handler 8 on 7040:
> exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 3 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 3 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 2 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 2 on 7040: exiting
> 2011-12-29 17:14:59,476 - INFO  [IPC Server handler 1 on
> 7040:HBaseServer$Handler@1104] - IPC Server handler 1 on 7040: exiting
> 2011-12-29 17:14:59,481 - INFO  [regionserver7040:Slf4jLog@67] -
> Stopped SelectChannelConnector@0.0.0.0:7041
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logRoller:LogRoller@114] - LogRoller exiting.
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.cacheFlusher:MemStoreFlusher@266] -
> regionserver7040.cacheFlusher exiting
> 2011-12-29 17:14:59,517 - INFO
> [regionserver7040.logSyncer:HLog$LogSyncer@966] -
> regionserver7040.logSyncer exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.majorCompactionChecker:Chore@79] -
> regionserver7040.majorCompactionChecker exiting
> 2011-12-29 17:14:59,518 - INFO
> [regionserver7040.compactor:CompactSplitThread@113] -
> regionserver7040.compactor exiting
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:HRegionServer@668] -
> stopping server at: 10.129.1.230,7040,1323990853621
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@124] -
> regionserver7040 closing leases
> 2011-12-29 17:14:59,891 - INFO  [regionserver7040:Leases@131] -
> regionserver7040 closed leases
> 2011-12-29 17:14:59,891 - INFO
> [regionserver7040:HConnectionManager$HConnectionImplementation@1067] -
> Closed zookeeper sessionid=0x4343f266cbc0012
> 2011-12-29 17:14:59,894 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x4343f266cbc0012 closed
> 2011-12-29 17:14:59,894 - INFO
> [main-EventThread:ClientCnxn$EventThread@520] - EventThread shut down
> 2011-12-29 17:14:59,897 - INFO  [regionserver7040:ZooKeeper@538] -
> Session: 0x1343f2669140013 closed
> 2011-12-29 17:14:59,898 - INFO  [regionserver7040:HRegionServer@686] -
> regionserver7040 exiting
> 2011-12-29 17:14:59,899 - INFO
> [regionserver7040-EventThread:ClientCnxn$EventThread@520] -
> EventThread shut down
> 2011-12-29 17:14:59,900 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@106] -
> Starting fs shutdown hook thread.
> 2011-12-29 17:14:59,900 - ERROR
> [Thread-15:DFSClient$LeaseChecker@1135] - Exception closing file
> /richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
> : org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException: Not
> replicated yet:/richpush/hbase/root/rich_push.user/0fd5a7b472fb092a4470e9505c3a421a/.tmp/6475368557525722423
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1455)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
>        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
>        at java.security.AccessController.doPrivileged(Native Method)
>        at javax.security.auth.Subject.doAs(Subject.java:396)
>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)
>
>        at org.apache.hadoop.ipc.Client.call(Client.java:1104)
>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
>        at $Proxy5.addBlock(Unknown Source)
>        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>        at $Proxy5.addBlock(Unknown Source)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
>        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)
> 2011-12-29 17:14:59,913 - INFO
> [Shutdownhook:regionserver7040:ShutdownHook$ShutdownHookThread@112] -
> Shutdown hook finished.
> ------------------------

Mime
View raw message