hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7878) recoverFileLease does not check return value of recoverLease
Date Tue, 26 Feb 2013 20:58:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587536#comment-13587536 ]

Ted Yu commented on HBASE-7878:
-------------------------------

By directly injecting an IOE at the point where recoverLease is called, I was able to trigger
the append() call.
Here is a snippet from the test output:
{code}
2013-02-26 12:50:28,110 DEBUG [IPC Server handler 2 on 55601] namenode.FSEditLog$EditLogFileOutputStream(268): Preallocated 1048576 bytes at the end of the edit log (offset 4)
2013-02-26 12:50:28,113 INFO  [IPC Server handler 2 on 55601] namenode.FSNamesystem(2458): commitBlockSynchronization(newblock=blk_-4984910605561849777_1003, file=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362, newgenerationstamp=1003, newlength=2634, newtargets=[127.0.0.1:55695, 127.0.0.1:55698, 127.0.0.1:55701]) successful
2013-02-26 12:50:28,155 INFO  [Thread-227] util.FSHDFSUtils(70): Recovering file hdfs://localhost:55601/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362
2013-02-26 12:50:28,155 DEBUG [Thread-227] util.FSHDFSUtils(90): Failed fs.recoverLease invocation, java.io.IOException, trying fs.append instead
2013-02-26 12:50:28,158 INFO  [IPC Server handler 7 on 55601] namenode.FSNamesystem(169): ugi=tyu ip=/127.0.0.1 cmd=append  src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362 dst=null  perm=null
2013-02-26 12:50:28,159 DEBUG [Thread-227] hdfs.DFSClient$DFSOutputStream(3516): computePacketChunkSize: src=/user/tyu/hbase/TestHLog/hlogdir/hlog.1361911780362, chunkSize=442, chunksPerPacket=1, packetSize=467
2013-02-26 12:50:28,159 DEBUG [Thread-227] hdfs.DFSClient(189): Connecting to 127.0.0.1:55697
2013-02-26 12:50:28,162 INFO  [IPC Server handler 0 on 55697] datanode.DataNode(2130): Client calls recoverBlock(block=blk_-4984910605561849777_1003, targets=[127.0.0.1:55695, 127.0.0.1:55698, 127.0.0.1:55701])
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697] datanode.FSDataset(2143): Interrupting active writer threads for block blk_-4984910605561849777_1003
2013-02-26 12:50:28,163 DEBUG [IPC Server handler 0 on 55697] datanode.FSDataset(2159): getBlockMetaDataInfo successful block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700] datanode.FSDataset(2143): Interrupting active writer threads for block blk_-4984910605561849777_1003
2013-02-26 12:50:28,165 DEBUG [IPC Server handler 1 on 55700] datanode.FSDataset(2159): getBlockMetaDataInfo successful block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703] datanode.FSDataset(2143): Interrupting active writer threads for block blk_-4984910605561849777_1003
2013-02-26 12:50:28,166 DEBUG [IPC Server handler 1 on 55703] datanode.FSDataset(2159): getBlockMetaDataInfo successful block=blk_-4984910605561849777_1003 length 2634 genstamp 1003
{code}
TestHLog#testAppendClose passed using hadoop 1.0.

At the moment it is not clear to me how the IOE can be injected in the test without such a hack.
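
For illustration only, the fallback seen in the log (recoverLease fails with an IOException, then append is tried) could be exercised through a small test seam rather than by editing the call site. The sketch below is an assumption, not the FSHDFSUtils code: `LeaseOps`, `FailingLeaseOps`, `recover`, and the path string are hypothetical names, and whether such a seam fits the real class is an open question.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class AppendFallbackSketch {

    /** Hypothetical seam wrapping the two filesystem calls; not an HBase or HDFS interface. */
    interface LeaseOps {
        boolean recoverLease(String path) throws IOException;
        void appendAndClose(String path) throws IOException;
    }

    /**
     * Tries recoverLease first; if it throws IOException, falls back to append.
     * Returns a label for the path taken so a test can check it.
     */
    static String recover(LeaseOps ops, String path) throws IOException {
        try {
            // "pending" means the call succeeded but the file is not yet closed.
            return ops.recoverLease(path) ? "recoverLease" : "pending";
        } catch (IOException e) {
            ops.appendAndClose(path);
            return "append";
        }
    }

    /** Test fake: recoverLease always throws, and every call is recorded. */
    static final class FailingLeaseOps implements LeaseOps {
        final List<String> calls = new ArrayList<>();

        @Override
        public boolean recoverLease(String path) throws IOException {
            calls.add("recoverLease");
            throw new IOException("injected for test");
        }

        @Override
        public void appendAndClose(String path) {
            calls.add("append");
        }
    }

    public static void main(String[] args) throws IOException {
        FailingLeaseOps fake = new FailingLeaseOps();
        String taken = recover(fake, "/hypothetical/hlog");
        System.out.println(taken + " " + fake.calls); // prints "append [recoverLease, append]"
    }
}
```

With a seam like this, the test supplies the failing fake and the production path stays untouched; the cost is an extra indirection around the filesystem calls.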
                
> recoverFileLease does not check return value of recoverLease
> ------------------------------------------------------------
>
>                 Key: HBASE-7878
>                 URL: https://issues.apache.org/jira/browse/HBASE-7878
>             Project: HBase
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.95.0, 0.94.6
>            Reporter: Eric Newton
>            Assignee: Ted Yu
>            Priority: Critical
>             Fix For: 0.95.0, 0.94.6
>
>         Attachments: 7878-trunk-v2.txt, 7878-trunk-v3.txt, 7878-trunk-v4.txt
>
>
> I think this is a problem, so I'm opening a ticket so an HBase person takes a look.
> Apache Accumulo has moved its write-ahead log to HDFS. I modeled the lease recovery for
> Accumulo after HBase's lease recovery.  During testing, we experienced data loss.  I found
> it is necessary to wait until recoverLease returns true to know that the file has been truly
> closed.  In FSHDFSUtils, the return result of recoverLease is not checked. In the unit tests
> created to check lease recovery in HBASE-2645, the return result of recoverLease is always
> checked.
> I think FSHDFSUtils should be modified to check the return result, and wait until it
> returns true.
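
The behaviour requested in the description (check the return value of recoverLease and wait until it is true) might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the attached patch: `LeaseRecoverer` is a hypothetical stand-in for the HDFS client call `DistributedFileSystem#recoverLease(Path)`, which returns a boolean, and the attempt limit and pause are made-up parameters.

```java
import java.io.IOException;

final class LeaseRecoverySketch {

    /** Hypothetical stand-in for the boolean-returning HDFS recoverLease call. */
    interface LeaseRecoverer {
        boolean recoverLease(String path) throws IOException;
    }

    /**
     * Calls recoverLease until it returns true or maxAttempts is used up.
     * Returns true only if the file was reported closed.
     */
    static boolean waitForLeaseRecovery(LeaseRecoverer fs, String path,
                                        int maxAttempts, long pauseMillis)
            throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (fs.recoverLease(path)) {
                return true; // file is closed; safe to read the log
            }
            if (attempt < maxAttempts) {
                Thread.sleep(pauseMillis); // give the NameNode time to finish recovery
            }
        }
        return false; // caller must not assume the file is closed
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fake that reports the file closed on the third call.
        LeaseRecoverer fake = p -> ++calls[0] >= 3;
        boolean closed = waitForLeaseRecovery(fake, "/hypothetical/hlog", 5, 10L);
        System.out.println("closed=" + closed + " after " + calls[0] + " calls");
        // prints "closed=true after 3 calls"
    }
}
```

Bounding the attempts is a design choice of the sketch; the description itself asks only to wait until the call returns true.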

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
