hadoop-hdfs-issues mailing list archives

From László Bence Nagy (JIRA) <j...@apache.org>
Subject [jira] [Created] (HDFS-11542) Fix RawErasureCoderBenchmark decoding operation
Date Fri, 17 Mar 2017 14:44:41 GMT
László Bence Nagy created HDFS-11542:
----------------------------------------

             Summary: Fix RawErasureCoderBenchmark decoding operation
                 Key: HDFS-11542
                 URL: https://issues.apache.org/jira/browse/HDFS-11542
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: erasure-coding
    Affects Versions: 3.0.0-alpha2
            Reporter: László Bence Nagy
            Priority: Minor


There are some issues with the decode operation in *RawErasureCoderBenchmark.java*.
The decode method is called like this: *decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*.


With an RS 6+3 configuration, a correct call would look like this:
*decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], [ -, -, - ])*.
Indexes 1, 4 and 7 are listed in the *ERASED_INDEXES* array, so the values at those indexes
in the *decodeInputs* array are set to NULL; all other data and parity packets are present
in the array. The *outputs* array has length 3 and is where the d1, d4 and p1 packets should
be reconstructed. This would be the right solution.
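
As an illustration of the input layout described above, here is a minimal sketch that builds such a *decodeInputs* array from the full set of units. It assumes plain byte arrays with data units first and parity units after them; the class and method names (DecodeInputsSketch, buildDecodeInputs) are hypothetical and are not taken from the benchmark.

```java
import java.util.Arrays;

class DecodeInputsSketch {
    /**
     * Returns a shallow copy of allUnits (data units followed by parity units)
     * in which every position listed in erasedIndexes is set to null.
     * The caller's array is left untouched.
     */
    static byte[][] buildDecodeInputs(byte[][] allUnits, int[] erasedIndexes) {
        byte[][] inputs = Arrays.copyOf(allUnits, allUnits.length);
        for (int index : erasedIndexes) {
            inputs[index] = null;
        }
        return inputs;
    }

    public static void main(String[] args) {
        int numDataUnits = 6;   // RS 6+3, as in the example above
        int numParityUnits = 3;
        int cellSize = 4;       // arbitrary small size, for illustration only

        byte[][] allUnits = new byte[numDataUnits + numParityUnits][cellSize];
        int[] erasedIndexes = {1, 4, 7};

        byte[][] decodeInputs = buildDecodeInputs(allUnits, erasedIndexes);
        // One output buffer per erased unit (d1, d4 and p1 here).
        byte[][] outputs = new byte[erasedIndexes.length][cellSize];

        for (int i = 0; i < decodeInputs.length; i++) {
            System.out.println(i + ": " + (decodeInputs[i] == null ? "NULL" : "present"));
        }
        System.out.println("outputs length: " + outputs.length);
    }
}
```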

Currently, this example would be called like this:
*decode([ d0, d1, d2, d3, d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*.
So there are two main problems with the *decodeInputs* array. First, the packets at the
indexes listed in the *ERASED_INDEXES* array are not set to NULL. Second, the array does
not contain any parity packets for decoding.

The first problem is easy to solve: the values at the erased indexes need to be set to NULL.
The second is a little harder, because the benchmark currently runs multiple rounds of encode
operations one after another and, similarly, multiple decode operations one by one. Encode
and decode should instead be called in pairs, so that the parity packets produced by each
encode can be passed to the following decode in the *decodeInputs* array. (Of course, their
performance should still be measured separately.)
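
The pairing described above could look roughly like the following sketch. To stay self-contained it uses a single XOR parity unit as a stand-in coder, which is NOT Reed-Solomon and tolerates only one erasure; the class name PairedBenchmarkSketch and its methods are hypothetical. The point is only the loop shape: each round encodes, feeds the fresh parity into the decode inputs, and accumulates the two timings separately.

```java
import java.util.Arrays;
import java.util.Random;

class PairedBenchmarkSketch {
    // Stand-in coder (assumption): one parity unit that is the XOR of all data units.
    static byte[] encode(byte[][] data) {
        byte[] parity = new byte[data[0].length];
        for (byte[] unit : data) {
            for (int i = 0; i < parity.length; i++) {
                parity[i] ^= unit[i];
            }
        }
        return parity;
    }

    // Rebuilds the single null unit by XOR-ing all surviving units (data and parity).
    static byte[] decode(byte[][] inputs, int cellSize) {
        byte[] output = new byte[cellSize];
        for (byte[] unit : inputs) {
            if (unit == null) {
                continue;
            }
            for (int i = 0; i < cellSize; i++) {
                output[i] ^= unit[i];
            }
        }
        return output;
    }

    /**
     * Runs encode/decode pairs and returns {totalEncodeNanos, totalDecodeNanos}.
     * erasedIndex must refer to a data unit (0 <= erasedIndex < numData).
     */
    static long[] run(int rounds, int numData, int cellSize, int erasedIndex) {
        Random random = new Random(0);
        long encodeNanos = 0;
        long decodeNanos = 0;
        for (int r = 0; r < rounds; r++) {
            byte[][] data = new byte[numData][cellSize];
            for (byte[] unit : data) {
                random.nextBytes(unit);
            }

            long encodeStart = System.nanoTime();
            byte[] parity = encode(data);
            encodeNanos += System.nanoTime() - encodeStart;

            // Decode inputs: data units, then the parity just encoded, erased unit nulled.
            byte[][] decodeInputs = Arrays.copyOf(data, numData + 1);
            decodeInputs[numData] = parity;
            decodeInputs[erasedIndex] = null;

            long decodeStart = System.nanoTime();
            byte[] recovered = decode(decodeInputs, cellSize);
            decodeNanos += System.nanoTime() - decodeStart;

            if (!Arrays.equals(recovered, data[erasedIndex])) {
                throw new IllegalStateException("decode did not reconstruct the erased unit");
            }
        }
        return new long[] {encodeNanos, decodeNanos};
    }

    public static void main(String[] args) {
        long[] totals = run(100, 6, 64 * 1024, 1);
        System.out.println("encode ns: " + totals[0] + ", decode ns: " + totals[1]);
    }
}
```

Checking the recovered unit against the original inside the loop is a design choice of this sketch: it makes a benchmark that silently decodes garbage fail loudly, while staying outside the timed sections.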

There is one more problem in this file. Currently it works with RS 6+3 and the *ERASED_INDEXES*
array is fixed to *[ 6, 7, 8 ]*, so it is the three parity packets that have to be reconstructed.
This means that no real decode performance is measured, because no data packet has to be
reconstructed (even if the decode works properly). In effect, only new parity packets have to
be encoded. The exact implementation depends on the underlying erasure coding plugin,
but the point is that data packets should also be erased to measure real decode performance.
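
A benchmark could guard against the parity-only case with a small check like the sketch below. It assumes the layout used in the examples above (data units at indexes 0 to numDataUnits - 1, parity units after them); the class and method names are hypothetical.

```java
class ErasedIndexCheck {
    /** Counts how many of the erased indexes refer to data units rather than parity units. */
    static int countErasedDataUnits(int[] erasedIndexes, int numDataUnits) {
        int count = 0;
        for (int index : erasedIndexes) {
            if (index < numDataUnits) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int numDataUnits = 6; // RS 6+3
        // Parity-only erasure: no data unit has to be reconstructed.
        System.out.println(countErasedDataUnits(new int[] {6, 7, 8}, numDataUnits)); // prints 0
        // Mixed erasure: d1 and d4 have to be reconstructed.
        System.out.println(countErasedDataUnits(new int[] {1, 4, 7}, numDataUnits)); // prints 2
    }
}
```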

In addition, more RS configurations (not just 6+3) could be measured so that they can be
compared with each other.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


