hbase-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HBASE-642) Splitting log in a hostile environment -- bad hdfs -- we drop write-ahead-log edits
Date Sat, 24 May 2008 21:39:56 GMT

     [ https://issues.apache.org/jira/browse/HBASE-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-642:
------------------------

    Summary: Splitting log in a hostile environment -- bad hdfs -- we drop write-ahead-log edits  (was: Splitting log in a hostile environment -- bad hdfs -- we drop logs)

> Splitting log in a hostile environment -- bad hdfs -- we drop write-ahead-log edits
> -----------------------------------------------------------------------------------
>
>                 Key: HBASE-642
>                 URL: https://issues.apache.org/jira/browse/HBASE-642
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.1.3, 0.2.0
>
>
> The master has noticed that the regionserver that was carrying the .META. region, among others, has died, and it goes to split its logs:
> {code}
> 2008-05-18 19:58:01,292 DEBUG org.apache.hadoop.hbase.HLog: Splitting 0 of 2: hdfs://domU-12-31-38-00-D4-21:9000/hbase/log_10.254.30.79_1210899434766_60020/hlog.dat.017
> 2008-05-18 19:58:01,408 DEBUG org.apache.hadoop.hbase.HLog: Creating new log file writer for path hdfs://domU-12-31-38-00-D4-21:9000/hbase/categories/1060231198/oldlogfile.log and region categories,2864153,1211005494348
> 2008-05-18 19:58:01,573 DEBUG org.apache.hadoop.hbase.HLog: Creating new log file writer for path hdfs://domU-12-31-38-00-D4-21:9000/hbase/categories/297165731/oldlogfile.log and region categories,5992242,1211005494349
> {code}
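> The split demultiplexes every edit in the dead server's log files into a per-region oldlogfile.log, which is what the writer-creation lines above show. A minimal sketch of that shape, with hypothetical LogEntry/LogReader/LogWriter types standing in for the real HLogKey/HLogEdit/SequenceFile machinery:
> {code}
> import java.io.IOException;
> import java.util.Map;
>
> class SplitSketch {
>   static class LogEntry { String region; byte[] edit; }           // stand-in for HLogKey/HLogEdit
>   interface LogReader { LogEntry next() throws IOException; }     // returns null at EOF
>   interface LogWriter { void append(LogEntry e) throws IOException; }
>   interface WriterFactory { LogWriter create(String region) throws IOException; }
>
>   // Route each edit from one dead-server log to its region's oldlogfile.log,
>   // creating a writer per region on first use.
>   static void splitOneLog(LogReader in, Map<String, LogWriter> writers,
>       WriterFactory factory) throws IOException {
>     for (LogEntry e = in.next(); e != null; e = in.next()) {
>       LogWriter w = writers.get(e.region);
>       if (w == null) {
>         w = factory.create(e.region);   // e.g. <regiondir>/oldlogfile.log
>         writers.put(e.region, w);
>       }
>       w.append(e);
>     }
>   }
> }
> {code}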
> The master can't write to hdfs for some reason, so it can't do the log split:
> {code}
> 2008-05-18 19:59:15,265 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 2008-05-18 19:59:15,266 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_7852777250062244002
> 2008-05-18 19:59:15,268 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.219.207:50010
> 2008-05-18 19:59:39,945 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.254.30.79:60020. Already tried 6 time(s).
> 2008-05-18 20:00:21,274 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.254.30.79:50010
> 2008-05-18 20:00:21,275 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_7007215478628265924
> 2008-05-18 20:00:21,277 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.219.207:50010
> 2008-05-18 20:00:40,955 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.254.30.79:60020. Already tried 7 time(s).
> 2008-05-18 20:01:31,178 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.254.30.79:50010
> 2008-05-18 20:01:31,178 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_2374125514769088471
> 2008-05-18 20:01:31,180 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.219.207:50010
> 2008-05-18 20:01:40,145 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.net.SocketTimeoutException: Read timed out
> 2008-05-18 20:01:40,145 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-621042589816139684
> 2008-05-18 20:01:40,148 INFO org.apache.hadoop.dfs.DFSClient: Waiting to find target node: 10.252.219.207:50010
> ..
> {code}
> Weirdly, the above is complaining that it can't connect to the datanode running on the same host as the master.
> Eventually the split fails with:
> {code}
> 2008-05-18 20:24:28,393 WARN org.apache.hadoop.hbase.HMaster: Processing pending operations: ProcessServerShutdown of 10.254.30.79:60020
> java.io.IOException: java.io.IOException: Could not complete write to file /hbase/categories/1060231198/oldlogfile.log by DFSClient_520078809
>     at org.apache.hadoop.dfs.NameNode.complete(NameNode.java:343)
>     at sun.reflect.GeneratedMethodAccessor44.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:901)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1116)
> 2008-05-18 20:24:28,394 DEBUG org.apache.hadoop.hbase.HMaster: Main processing loop: ProcessServerShutdown of 10.254.30.79:60020
> 2008-05-18 20:24:28,394 INFO org.apache.hadoop.hbase.HMaster: process shutdown of server 10.254.30.79:60020: logSplit: false, rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 1
> 2008-05-18 20:24:28,395 INFO org.apache.hadoop.hbase.HLog: splitting 1 log(s) in hdfs://domU-12-31-38-00-D4-21:9000/hbase/log_10.254.30.79_1210899434766_60020
> 2008-05-18 20:24:28,395 DEBUG org.apache.hadoop.hbase.HLog: Splitting 0 of 1: hdfs://domU-12-31-38-00-D4-21:9000/hbase/log_10.254.30.79_1210899434766_60020/hlog.dat.018
> 2008-05-18 20:24:28,399 WARN org.apache.hadoop.hbase.HLog: Old log file hdfs://domU-12-31-38-00-D4-21:9000/hbase/categories/297165731/oldlogfile.log already exists. Copying existing file to new file
> 2008-05-18 20:25:03,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: /10.254.30.79:60020. Already tried 9 time(s).
> 2008-05-18 20:25:28,401 WARN org.apache.hadoop.hbase.HLog: Exception processing hdfs://domU-12-31-38-00-D4-21:9000/hbase/log_10.254.30.79_1210899434766_60020/hlog.dat.018 -- continuing. Possible DATA LOSS!
> java.net.SocketTimeoutException: timed out waiting for rpc response
> ...
> {code}
> And we just move on to the next log -- we notice the earlier attempt at distributing the edits and try to pick them up (though in this case the file is likely empty) -- but the split of the new log also fails. The shape of the loop is sketched below.
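> The WARN above implies a catch-and-continue loop over the dead server's log files, roughly like this (splitOneLog() and the logging are placeholders, not the actual HLog code):
> {code}
> import java.io.IOException;
> import java.util.List;
>
> class SplitLoopSketch {
>   // Each file is split independently; an IOException is logged and swallowed,
>   // so the loop moves on and that file's edits are silently dropped.
>   void splitLogs(List<String> logfiles) {
>     for (String logfile : logfiles) {
>       try {
>         splitOneLog(logfile);
>       } catch (IOException e) {
>         System.err.println("Exception processing " + logfile
>             + " -- continuing. Possible DATA LOSS! (" + e + ")");
>       }
>     }
>   }
>
>   void splitOneLog(String logfile) throws IOException { /* elided */ }
> }
> {code}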
> Though the log says 'Possible DATA LOSS!', we keep going. The .META. and other regions are reassigned and deployed though they are likely missing edits.
> In this catastrophic case, I'd say the master should not move on; if it has to, it should go down rather than reassign regions and try to keep going.
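> A possible fail-fast shape, sketched against the hypothetical splitOneLog() above: let the IOException propagate so the ProcessServerShutdown operation is retried or the master goes down, instead of deploying regions that are missing edits.
> {code}
> import java.io.IOException;
> import java.util.List;
>
> class FailFastSketch {
>   // Any failure aborts the whole split; the caller can retry the shutdown
>   // processing or shut the master down rather than deploy incomplete regions.
>   void splitLogsOrFail(List<String> logfiles) throws IOException {
>     for (String logfile : logfiles) {
>       splitOneLog(logfile);
>     }
>   }
>
>   void splitOneLog(String logfile) throws IOException { /* elided */ }
> }
> {code}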

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

