hadoop-hdfs-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10942) Incorrect handling of flushing edit logs to JN
Date Fri, 30 Sep 2016 23:24:20 GMT
Yongjun Zhang created HDFS-10942:

             Summary: Incorrect handling of flushing edit logs to JN
                 Key: HDFS-10942
                 URL: https://issues.apache.org/jira/browse/HDFS-10942
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: s, hdfs
            Reporter: Yongjun Zhang

We use EditsDoubleBuffer to handle edit logs:
 * A double-buffer for edits. New edits are written into the first buffer
 * while the second is available to be flushed. Each time the double-buffer
 * is flushed, the two internal buffers are swapped. This allows edits
 * to progress concurrently to flushes without allocating new buffers each
 * time.

With the following code, that flush the ready buffer, it copy the ready buffer to a local
copy, then flush.


QuarumOutputStream (buf in the code below is an instance of EditsDoubleBuffer):

  protected void flushAndSync(boolean durable) throws IOException {
    int numReadyBytes = buf.countReadyBytes();
    if (numReadyBytes > 0) {
      int numReadyTxns = buf.countReadyTxns();
      long firstTxToFlush = buf.getFirstReadyTxId();

      assert numReadyTxns > 0;

      // Copy from our double-buffer into a new byte array. This is for
      // two reasons:
      // 1) The IPC code has no way of specifying to send only a slice of
      //    a larger array.
      // 2) because the calls to the underlying nodes are asynchronous, we
      //    need a defensive copy to avoid accidentally mutating the buffer
      //    before it is sent.
      DataOutputBuffer bufToSend = new DataOutputBuffer(numReadyBytes);
      assert bufToSend.getLength() == numReadyBytes;
      byte[] data = bufToSend.getData();
      assert data.length == bufToSend.getLength();

 The above call doesn't seem to prevent the orginal copy of the buffer inside buf from being
swapped by  the following method. 


 public void setReadyToFlush() {
    assert isFlushed() : "previous data not flushed yet";
    TxnBuffer tmp = bufReady;
    bufReady = bufCurrent;
    bufCurrent = tmp;


Though we have some runtime assertion in the code, the assertion is not enabled in production,
so the expected condition in the assert may be false at runtime. This would possibly cause
a mess.  When any condition is not as expected by the assertion, it seems a real exception
should be thrown instead.

So two issues in summary:
- How we synchronize between the flush and the swap of the two buffers
- Whether we should throw real exception instead of the assert that's disabled at runtime

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org

View raw message