hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "angerszhu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14437) Exception happened when rollEditLog expects empty EditsDoubleBuffer.bufCurrent but not
Date Fri, 19 Apr 2019 09:17:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16821795#comment-16821795
] 

angerszhu commented on HDFS-14437:
----------------------------------

[~hexiaoqiao] [~daryn] [~starphin] [~kihwal] [~arpitagarwal]

 

When I do Unit test , I add some log in my code and print thread id in log4j.

In TestEditLog#Transactions's run method, I add a call for #rollEditLog to simulate random
multithreading call of logSync().

 
{code:java}
@Override
public void run() {
 PermissionStatus p = namesystem.createFsOwnerPermissions(
 new FsPermission((short)0777));
 FSEditLog editLog = namesystem.getEditLog();

 for (int i = 0; i < numTransactions; i++) {
 INodeFile inode = new INodeFile(namesystem.allocateNewInodeId(), null,
 p, 0L, 0L, BlockInfo.EMPTY_ARRAY, replication, blockSize, (byte)0);
 inode.toUnderConstruction("", "");

 editLog.logOpenFile("/filename" + (startIndex + i), inode, false, false);
 editLog.logCloseFile("/filename" + (startIndex + i), inode);
 editLog.logSync();
 if(i % 100 ==0){
 try {
 editLog.rollEditLog();
 } catch (IOException e) {
 e.printStackTrace();
 }
 }
 }
}

{code}
 

and I add some Log of these there transaction id: 
 * txid  : Current max transaction id of all thread
 * myTransactionId : max transaction id of current thread
 * syncTxId : current transaction if of EditLog that has been synced to QJM

The Right #rollEditLog Event: see thread 
{code:java}
TransactionThread-267{code}
 
{code:java}
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Rolling edit logs
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Ending log segment
829
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment
was called : txid = 1286
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment
was called : myTransationId = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When endCurrentLogSegment
was called : synctxid = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, OpCode
OP_END_LOG_SEGMENT
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, should
not forceSync
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Start LogSync
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSyncAll, myTransactionId
1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called
: txid = 1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called
: myTransationId = 1287
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - When logSync was called
: synctxid = 830
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSync, start
flush
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - when logSync, finish
flush
2019-04-19 16:32:08,068 INFO namenode.FSEditLog TransactionThread-267 - Number of transactions:
459 Total time for transactions(ms): 9 Number of transactions batched in Syncs: 280 Number
of syncs: 4 SyncTimes(ms): 0 3
2019-04-19 16:32:08,068 INFO namenode.FileJournalManager TransactionThread-267 - Finalizing
edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000829
-> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000829-0000000000000001287
2019-04-19 16:32:08,069 INFO namenode.FileJournalManager TransactionThread-267 - Finalizing
edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000829
-> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000829-0000000000000001287
2019-04-19 16:32:08,069 INFO namenode.FSEditLog TransactionThread-267 - Finish Ending log
segment
20
{code}
we can see that the sequence of this three Id is Ok . 

 

 

The Error #rollEditLog Case: see 
{code:java}
TransactionThread-226 {code}
 
{code:java}
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: txid = 760
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: myTransationId = 19
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: synctxid = 23
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Rolling edit logs
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Ending log segment
3
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment
was called : txid = 760
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment
was called : myTransationId = 19
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - When endCurrentLogSegment
was called : synctxid = 23
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logEdit, OpCode
OP_END_LOG_SEGMENT
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - Start LogSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-226 - when logSyncAll, myTransactionId
761
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-228 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, should
not forceSync
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,978 INFO namenode.FSEditLog TransactionThread-261 - when logEdit, should
not forceSync
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-229 - when logEdit, should
not forceSync
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, OpCode
OP_ADD
2019-04-19 16:32:07,979 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, should
not forceSync

................
..................
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-260 - when logEdit, should
not forceSync
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-262 - when logEdit, should
not forceSync
2019-04-19 16:32:07,982 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-263 - when logEdit, should
not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-264 - when logEdit, should
not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-265 - when logEdit, should
not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when logEdit, OpCode
OP_CLOSE
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-266 - when logEdit, should
not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Rolling edit logs
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Ending log segment
3
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment
was called : txid = 827
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment
was called : myTransationId = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When endCurrentLogSegment
was called : synctxid = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, OpCode
OP_END_LOG_SEGMENT
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, should
not forceSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Start LogSync
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSyncAll, myTransactionId
828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: txid = 828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: myTransationId = 828
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: synctxid = 758
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSync, start
flush
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - when logSync, finish
flush
2019-04-19 16:32:07,983 INFO namenode.FSEditLog TransactionThread-225 - Number of transactions:
826 Total time for transactions(ms): 55 Number of transactions batched in Syncs: 1 Number
of syncs: 5 SyncTimes(ms): 1 2
2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 - Finalizing
edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000000003
-> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_0000000000000000003-0000000000000000828
2019-04-19 16:32:07,984 INFO namenode.FileJournalManager TransactionThread-225 - Finalizing
edits file /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_inprogress_0000000000000000003
-> /Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name2/current/edits_0000000000000000003-0000000000000000828
2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - Finish Ending log
segment
2019-04-19 16:32:07,984 INFO namenode.FSEditLog TransactionThread-225 - Starting log segment
at 829
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, OpCode
OP_START_LOG_SEGMENT
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - when logEdit, should
not forceSync
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: txid = 829
2019-04-19 16:32:07,985 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: myTransationId = 829
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - When logSync was called
: synctxid = 828
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when logSync, start
flush
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - when logSync, finish
flush
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-225 - Finish Rolling edit
logs
2019-04-19 16:32:07,986 INFO namenode.FSEditLog TransactionThread-267 - when logEdit, OpCode
OP_CLOSE
2019-04-
.................
................


2019-04-19 16:32:08,080 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment
was called : txid = 1328
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment
was called : myTransationId = 764
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - When endCurrentLogSegment
was called : synctxid = 1288
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, OpCode
OP_END_LOG_SEGMENT
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logEdit, should
not forceSync
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - Start LogSync
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-227 - when logSyncAll, myTransactionId
1329
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: txid = 1329
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: myTransationId = 761
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - When logSync was called
: synctxid = 1288
2019-04-19 16:32:08,081 INFO namenode.FSEditLog TransactionThread-226 - Number of transactions:
42 Total time for transactions(ms): 2 Number of transactions batched in Syncs: 41 Number of
syncs: 3 SyncTimes(ms): 1 1
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - The edits
buffer is 663 bytes long with 39 unflushed transactions. Below is the list of unflushed transactions:
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [0]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1291]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [1]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1292]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [2]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1293]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [3]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1294]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [4]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1295]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [5]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1296]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [6]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1297]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [7]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1298]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [8]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1299]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [9]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1300]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [10]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1301]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [11]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1302]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [12]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1303]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [13]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1304]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [14]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1305]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [15]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1306]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [16]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1307]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [17]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1308]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [18]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1309]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [19]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1310]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [20]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1311]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [21]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1312]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [22]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1313]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [23]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1314]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [24]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1315]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [25]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1316]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [26]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1317]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [27]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1318]
2019-04-19 16:32:08,084 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [28]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1319]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [29]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1320]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [30]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1321]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [31]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1322]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [32]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1323]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [33]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1324]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [34]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1325]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [35]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1326]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [36]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1327]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [37]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1328]
2019-04-19 16:32:08,085 WARN namenode.EditsDoubleBuffer TransactionThread-226 - Unflushed
op [38]: LogSegmentOp [opCode=OP_END_LOG_SEGMENT, txid=1329]
2019-04-19 16:32:08,085 ERROR namenode.FSEditLog TransactionThread-226 - Error: finalize log
segment 1288, 1329 failed for (journal JournalAndStream(mgr=FileJournalManager(root=/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1),
stream=EditLogFileOutputStream(/Users/angerszhu/Documents/project/gdc/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/name1/current/edits_inprogress_0000000000000001288)))
java.io.IOException: FSEditStream has 663 bytes still to be flushed and cannot be closed.
at org.apache.hadoop.hdfs.server.namenode.EditsDoubleBuffer.close(EditsDoubleBuffer.java:73)
at org.apache.hadoop.hdfs.server.namenode.EditLogFileOutputStream.close(EditLogFileOutputStream.java:151)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalAndStream.closeStream(JournalSet.java:115)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$4.apply(JournalSet.java:235)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.finalizeLogSegment(JournalSet.java:231)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.endCurrentLogSegment(FSEditLog.java:1259)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.rollEditLog(FSEditLog.java:1182)
at org.apache.hadoop.hdfs.server.namenode.TestEditLog$Transactions.run(TestEditLog.java:223)
at java.lang.Thread.run(Thread.java:748)
{code}
For the TransactionThread-226, we can see that it's interrupted, 

During the whole cycle of  #rollEditLog, it was been interrupted by other thread's evnent.

And this make the #syncTxId will be larger then the old thread that call #rollEditLog and
been interrupted.

Finally because: myTxid <= synctxid, it just returned here.

 
{code:java}
if (mytxid <= synctxid) {
  numTransactionsBatchedInSync++;
  if (metrics != null) {
    // Metrics is non-null only when used inside name node
    metrics.incrTransactionsBatchedInSync();
  }
  return;
}

{code}
But when this time. bufCurrent will still have data not been flushed.

Then cause that error.

 

 

But I can't say why it's interrupted since there are many way's will call logSync and I don't
know more detail of the whole HA and QJM. 

 

 

> Exception happened when   rollEditLog expects empty EditsDoubleBuffer.bufCurrent  but
not
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14437
>                 URL: https://issues.apache.org/jira/browse/HDFS-14437
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode, qjm
>            Reporter: angerszhu
>            Priority: Major
>
> For the problem mentioned in https://issues.apache.org/jira/browse/HDFS-10943 , I have
sort the process of write and flush EditLog and some important function, I found the in the
class  FSEditLog class, the close() function will call such process like below:
>  
> {code:java}
> waitForSyncToFinish();
> endCurrentLogSegment(true);{code}
> since we have gain the object lock in the function close(), so when  waitForSyncToFish()
method return, it mean all logSync job has done and all data in bufReady has been flushed
out, and since current thread has the lock of this object, when call endCurrentLogSegment(),
no other thread will gain the lock so they can't write new editlog into currentBuf.
> But when we don't call waitForSyncToFish() before endCurrentLogSegment(), there may be
some autoScheduled logSync()'s flush process is doing, since this process don't need
> synchronization since it has mention in the comment of logSync() method :
>  
> {code:java}
> /**
>  * Sync all modifications done by this thread.
>  *
>  * The internal concurrency design of this class is as follows:
>  *   - Log items are written synchronized into an in-memory buffer,
>  *     and each assigned a transaction ID.
>  *   - When a thread (client) would like to sync all of its edits, logSync()
>  *     uses a ThreadLocal transaction ID to determine what edit number must
>  *     be synced to.
>  *   - The isSyncRunning volatile boolean tracks whether a sync is currently
>  *     under progress.
>  *
>  * The data is double-buffered within each edit log implementation so that
>  * in-memory writing can occur in parallel with the on-disk writing.
>  *
>  * Each sync occurs in three steps:
>  *   1. synchronized, it swaps the double buffer and sets the isSyncRunning
>  *      flag.
>  *   2. unsynchronized, it flushes the data to storage
>  *   3. synchronized, it resets the flag and notifies anyone waiting on the
>  *      sync.
>  *
>  * The lack of synchronization on step 2 allows other threads to continue
>  * to write into the memory buffer while the sync is in progress.
>  * Because this step is unsynchronized, actions that need to avoid
>  * concurrency with sync() should be synchronized and also call
>  * waitForSyncToFinish() before assuming they are running alone.
>  */
> public void logSync() {
>   long syncStart = 0;
>   // Fetch the transactionId of this thread. 
>   long mytxid = myTransactionId.get().txid;
>   
>   boolean sync = false;
>   try {
>     EditLogOutputStream logStream = null;
>     synchronized (this) {
>       try {
>         printStatistics(false);
>         // if somebody is already syncing, then wait
>         while (mytxid > synctxid && isSyncRunning) {
>           try {
>             wait(1000);
>           } catch (InterruptedException ie) {
>           }
>         }
>         //
>         // If this transaction was already flushed, then nothing to do
>         //
>         if (mytxid <= synctxid) {
>           numTransactionsBatchedInSync++;
>           if (metrics != null) {
>             // Metrics is non-null only when used inside name node
>             metrics.incrTransactionsBatchedInSync();
>           }
>           return;
>         }
>    
>         // now, this thread will do the sync
>         syncStart = txid;
>         isSyncRunning = true;
>         sync = true;
>         // swap buffers
>         try {
>           if (journalSet.isEmpty()) {
>             throw new IOException("No journals available to flush");
>           }
>           editLogStream.setReadyToFlush();
>         } catch (IOException e) {
>           final String msg =
>               "Could not sync enough journals to persistent storage " +
>               "due to " + e.getMessage() + ". " +
>               "Unsynced transactions: " + (txid - synctxid);
>           LOG.fatal(msg, new Exception());
>           synchronized(journalSetLock) {
>             IOUtils.cleanup(LOG, journalSet);
>           }
>           terminate(1, msg);
>         }
>       } finally {
>         // Prevent RuntimeException from blocking other log edit write 
>         doneWithAutoSyncScheduling();
>       }
>       //editLogStream may become null,
>       //so store a local variable for flush.
>       logStream = editLogStream;
>     }
>     
>     // do the sync
>     long start = now();
>     try {
>       if (logStream != null) {
>         logStream.flush();
>       }
>     } catch (IOException ex) {
>       synchronized (this) {
>         final String msg =
>             "Could not sync enough journals to persistent storage. "
>             + "Unsynced transactions: " + (txid - synctxid);
>         LOG.fatal(msg, new Exception());
>         synchronized(journalSetLock) {
>           IOUtils.cleanup(LOG, journalSet);
>         }
>         terminate(1, msg);
>       }
>     }
>     long elapsed = now() - start;
>     if (metrics != null) { // Metrics non-null only when used inside name node
>       metrics.addSync(elapsed);
>     }
>     
>   } finally {
>     // Prevent RuntimeException from blocking other log edit sync 
>     synchronized (this) {
>       if (sync) {
>         synctxid = syncStart;
>         for (JournalManager jm : journalSet.getJournalManagers()) {
>           /**
>            * {@link FileJournalManager#lastReadableTxId} is only meaningful
>            * for file-based journals. Therefore the interface is not added to
>            * other types of {@link JournalManager}.
>            */
>           if (jm instanceof FileJournalManager) {
>             ((FileJournalManager)jm).setLastReadableTxId(syncStart);
>           }
>         }
>         isSyncRunning = false;
>       }
>       this.notifyAll();
>    }
>   }
> }
> {code}
> So  when not call waitForSyncFinish() before endCurrentLogSegment() will came to a situation
that when it can't guarantee that when call endCurrentLogSegment() method, there is no flush
job was doing. Then in the endCurrentLogSegment() method process , bufReady may haven't been
flushed out totally, then it swap with the bufCurrent, finally when call EditLogOutputStream's
close() function, there is still un-flushed bytes in bufCurrent then cause the error in 
> https://issues.apache.org/jira/browse/HDFS-10943
>  
> so maybe we should add waitForSyncFinish() before endCurrentLogSegment() method in rollEditLog()
method in FSEditLog class ?
> {code:java}
> synchronized long rollEditLog() throws IOException {
>  LOG.info("Rolling edit logs");
>  waitForSyncToFinish();
>  endCurrentLogSegment(true);
>  
>  long nextTxId = getLastWrittenTxId() + 1;
>  startLogSegment(nextTxId, true);
>  
>  assert curSegmentTxId == nextTxId;
>  return nextTxId;
> }{code}
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message