hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11938) Fix ByteBuffer version encode/decode API of raw erasure coder
Date Thu, 14 May 2015 02:12:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543076#comment-14543076
] 

Yi Liu commented on HADOOP-11938:
---------------------------------

Looks more better than original.

Some lines are longer than 80 chars.

*In AbstractRawErasureCoder.java*

{code}
+    for (int i = pos; i < buffer.remaining(); ++i) {
+      buffer.put(i, (byte) 0);
      }
{code}
it should be {{buffer.limit()}} instead of remaining
And we can just use {{buffer.put((byte)0)}}

{code}
@return the buffer itself, with ZERO bytes written, remaining the original
+   * position
{code}
"remaining the original position",  maybe "the position and limit is not changed after the
call" is more clear.

use {{HadoopIllegalArgumentException}} instead of {{IllegalArgumentException}}

*in XORRawDecoder.java and XORRawEncoder.java*
{code}
inputs[i].position() + inputs[0].remaining()
{code}
Just need use inputs\[i\].limit()

*in RSRawDecoder.java and RSRawEncoder.java*
{code}
+    int dataLen = inputs[0].remaining();
{code}
is it necessary?  I think we don't need to pass {{dataLen}} to {{RSUtil.GF.solveVandermondeSystem}}

*in GaloisField.java*
{code}
public void solveVandermondeSystem(int[] x, ByteBuffer[] y,
                                      int len, int dataLen) {
{code}
As in previous comment, {{dataLen}} is unnecessary, so "idx1 < p.position() + dataLen"
can be "idx1 < p.limit()"


{code}
  public void substitute(ByteBuffer[] p, ByteBuffer q, int x) {
-    int y = 1;
+    int y = 1, iIdx, oIdx;
+    int len = p[0].remaining();
     for (int i = 0; i < p.length; i++) {
       ByteBuffer pi = p[i];
-      int len = pi.remaining();
-      for (int j = 0; j < len; j++) {
-        int pij = pi.get(j) & 0x000000FF;
-        q.put(j, (byte) (q.get(j) ^ mulTable[pij][y]));
+      for (iIdx = pi.position(), oIdx = q.position();
+           iIdx < pi.position() + len; iIdx++, oIdx++) {
+        int pij = pi.get(iIdx) & 0x000000FF;
+        q.put(oIdx, (byte) (q.get(oIdx) ^ mulTable[pij][y]));
       }
       y = mulTable[x][y];
     }
{code}
{{len}} is unnecessary.

Same for
{code}
public void remainder(ByteBuffer[] dividend, int len, int[] divisor) {
{code}

*in TestCoderBase.java*

{code}
+  private byte[] zeroChunkBytes;
..............

   protected void eraseDataFromChunk(ECChunk chunk) {
     ByteBuffer chunkBuffer = chunk.getBuffer();
-    // erase the data
-    chunkBuffer.position(0);
-    for (int i = 0; i < chunkSize; i++) {
-      chunkBuffer.put((byte) 0);
-    }
+    // erase the data at the position, and restore the buffer ready for reading
+    // chunkSize bytes but all ZERO.
+    int pos = chunkBuffer.position();
+    chunkBuffer.flip();
+    chunkBuffer.position(pos);
+    chunkBuffer.limit(pos + chunkSize);
+    chunkBuffer.put(zeroChunkBytes);
     chunkBuffer.flip();
+    chunkBuffer.position(pos);
+    chunkBuffer.limit(pos + chunkSize);
{code}

{code}
-  protected static ECChunk cloneChunkWithData(ECChunk chunk) {
+  protected ECChunk cloneChunkWithData(ECChunk chunk) {
     ByteBuffer srcBuffer = chunk.getBuffer();
-    ByteBuffer destBuffer;
+    ByteBuffer destBuffer = allocateOutputChunkBuffer();
  
-    byte[] bytesArr = new byte[srcBuffer.remaining()];
+    byte[] bytesArr = new byte[chunkSize];
     srcBuffer.mark();
     srcBuffer.get(bytesArr);
     srcBuffer.reset();
  
-    if (srcBuffer.hasArray()) {
-      destBuffer = ByteBuffer.wrap(bytesArr);
-    } else {
-      destBuffer = ByteBuffer.allocateDirect(srcBuffer.remaining());
-      destBuffer.put(bytesArr);
-      destBuffer.flip();
-    }
+    int pos = destBuffer.position();
+    destBuffer.put(bytesArr);
+    destBuffer.flip();
+    destBuffer.position(pos);
{code}

{{destBuffer}} is still assumed to be chunkSize
Furthermore, some unnecessary flip


{code}
+  /**
+   * Convert an array of this chunks to an array of byte array.
+   * Note the chunk buffers are not affected.
+   * @param chunks
+   * @return an array of byte array
+   */
+  protected byte[][] toArrays(ECChunk[] chunks) {
+    byte[][] bytesArr = new byte[chunks.length][];
+
+    ByteBuffer buffer;
+    for (int i = 0; i < chunks.length; i++) {
+      buffer = chunks[i].getBuffer();
+      if (buffer.hasArray() && buffer.position() == 0 &&
+          buffer.remaining() == chunkSize) {
+        bytesArr[i] = buffer.array();
+      } else {
+        bytesArr[i] = new byte[buffer.remaining()];
+        // Avoid affecting the original one
+        buffer.mark();
+        buffer.get(bytesArr[i]);
+        buffer.reset();
+      }
+    }
+
+    return bytesArr;
+  }
{code}
We already have this method, use {{ECChunk.toBuffers}}  ?  Convert to bytebuffer is enough?
If not, should we have this method in code not only in test?


*In TestRawCoderBase.java*
You should add more description about your tests, for example, the negative test is for what
and how you test,  you also need find a good name for it.
{code}
protected void testCodingNegative(boolean usingDirectBuffer) {
+    this.usingDirectBuffer = usingDirectBuffer;
+    prepareCoders();
+
+    boolean isOK1;
+    try {
+      performTestCoding(baseChunkSize, true, false);
+      isOK1 = false;
+    } catch (Exception e) {
+      isOK1 = true;
+    }
+
+    boolean isOK2;
+    try {
+      performTestCoding(baseChunkSize, false, true);
+      isOK2 = false;
+    } catch (Exception e) {
+      isOK2 = true;
+    }
+
+    Assert.assertTrue("Negative tests passed", isOK1 && isOK2);
+  }
{code}
It should something like following, you can refer to other tests.
{code}
try {
  performTestCoding(baseChunkSize, true, false);
  Assert.fail("")  <-  your description here
} catch (Exception e) {
  // expected        <-  you can also choose to compare the exception msg
}
{code}


> Fix ByteBuffer version encode/decode API of raw erasure coder
> -------------------------------------------------------------
>
>                 Key: HADOOP-11938
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11938
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: io
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-11938-HDFS-7285-v1.patch, HADOOP-11938-HDFS-7285-v2.patch,
HADOOP-11938-HDFS-7285-workaround.patch
>
>
> While investigating a test failure in {{TestRecoverStripedFile}}, one issue in raw erasrue
coder, caused by an optimization in below codes. It assumes the  heap buffer backed by the
bytes array available for reading or writing always starts with zero and takes the whole space.
> {code}
>   protected static byte[][] toArrays(ByteBuffer[] buffers) {
>     byte[][] bytesArr = new byte[buffers.length][];
>     ByteBuffer buffer;
>     for (int i = 0; i < buffers.length; i++) {
>       buffer = buffers[i];
>       if (buffer == null) {
>         bytesArr[i] = null;
>         continue;
>       }
>       if (buffer.hasArray()) {
>         bytesArr[i] = buffer.array();
>       } else {
>         throw new IllegalArgumentException("Invalid ByteBuffer passed, " +
>             "expecting heap buffer");
>       }
>     }
>     return bytesArr;
>   }
> {code} 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message