hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11193) [SPS]: Erasure coded files should be considered for satisfying storage policy
Date Tue, 13 Dec 2016 03:26:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744015#comment-15744015 ]

Rakesh R commented on HDFS-11193:
---------------------------------

Thanks [~umamaheswararao] for the useful review comments. The following are the changes in the
new patch; kindly take another look at the latest patch.

* I've fixed comments 1 and 2.
* While testing, I found an issue in the {{when there is no target node with the required storage
type}} logic. For example, say a block has locations A(disk), B(disk), C(disk); assume only A,
B and C are live nodes, and only A and C have the ARCHIVE storage type. Now assume the user
changes the storage policy to {{COLD}}. SPS internally prepares the src-target pairing as
{{src=> (A, B, C) and target=> (A, C)}}. It skips B because B has no archive media, which is
an indication that SPS should retry to satisfy all of the block's locations. On the other side,
the coordinator pairs the src-target nodes for the actual physical movement as
{{movetask=> (A, A), (B, C)}}. Ideally it should do (C, C) instead of (B, C), but it mistakenly
chooses B as the source. I think an implicit "retry needed" assumption invites confusion and
coding mistakes like this. In this patch, I've added an explicit {{retryNeeded}} flag to make
it more readable. SPS now prepares only the matching pairs, avoiding dummy source slots, like
{{src=> (A, C) and target=> (A, C)}}, and sets retryNeeded=true to convey that this trackId has
only partial block movements.
* Added one more test for an EC striped block.
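The matching-pair preparation described above could be sketched roughly as follows. This is a minimal illustration with invented names ({{PairingSketch}}, {{MoveTask}}), not the actual SPS code: only sources co-located with the required storage type get a (local) move task, and any skipped source sets {{retryNeeded}}.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the pairing rule: pair each source replica only
// when a matching target storage exists on the same node; otherwise skip
// it and mark the trackId for retry instead of inventing a dummy pair.
class PairingSketch {
    record MoveTask(String src, String target) {}

    static boolean retryNeeded;

    static List<MoveTask> pair(List<String> sources,
                               Set<String> nodesWithRequiredStorage) {
        retryNeeded = false;
        List<MoveTask> tasks = new ArrayList<>();
        for (String src : sources) {
            if (nodesWithRequiredStorage.contains(src)) {
                // Local movement: the node hosting the replica also has
                // the required storage type (e.g. disk -> archive).
                tasks.add(new MoveTask(src, src));
            } else {
                // No target for this replica now; convey partial movement
                // so the trackId is re-queued later.
                retryNeeded = true;
            }
        }
        return tasks;
    }
}
```

With the scenario above, {{pair([A, B, C], {A, C})}} yields only (A, A) and (C, C), with retryNeeded set for B.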

bq. One another idea in my mind is that, how about just including blockIndexes in the case
of Striped?
Thanks for this idea. Here is my analysis of this approach. As we know, the NN presently
passes simple {{Block}} objects to the coordinator datanode for movement. In order to do
the internal block construction at the DN side, the DN would need the complex {{BlockInfoStriped}}
object plus the blockIndices array. I think passing a list of simple objects is better than
passing the complex object: it keeps all the computation complexity on the SPS side and makes
the coordinator logic more readable. I'd prefer to keep the internal block construction logic
at the NN side. Does this make sense to you?
{code}
+            // construct internal block
+            long blockId = blockInfo.getBlockId() + si.getBlockIndex();
+            long numBytes = StripedBlockUtil.getInternalBlockLength(
+                sBlockInfo.getNumBytes(), sBlockInfo.getCellSize(),
+                sBlockInfo.getDataBlockNum(), si.getBlockIndex());
+            Block blk = new Block(blockId, numBytes,
+                blockInfo.getGenerationStamp());
{code}
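For reference, the length arithmetic behind that snippet can be written standalone. This is a simplified sketch of what I understand {{StripedBlockUtil.getInternalBlockLength}} computes (shown only to illustrate that the NN has everything it needs, namely numBytes, cellSize, dataBlockNum and the block index, to build the internal {{Block}} without DN-side help):

```java
// Simplified sketch of the internal-block-length arithmetic for a striped
// block group. Full stripes contribute one cell per internal block; the
// last, possibly partial, stripe contributes a per-index "last cell".
class StripedLengthSketch {
    static long internalBlockLength(long dataSize, int cellSize,
                                    int numDataBlocks, int idxInGroup) {
        final int stripeSize = cellSize * numDataBlocks; // data bytes per full stripe
        final int lastStripeDataLen = (int) (dataSize % stripeSize);
        if (lastStripeDataLen == 0) {
            // Block group ends on a stripe boundary: equal share per block.
            return dataSize / numDataBlocks;
        }
        final long numStripes = (dataSize - 1) / stripeSize + 1;
        return (numStripes - 1) * cellSize
            + lastCellSize(lastStripeDataLen, cellSize, numDataBlocks, idxInGroup);
    }

    private static int lastCellSize(int size, int cellSize,
                                    int numDataBlocks, int i) {
        if (i < numDataBlocks) {
            // Data block: bytes of the last stripe remaining at this index.
            size -= i * cellSize;
            if (size < 0) size = 0;
        }
        // Parity blocks (i >= numDataBlocks) match the first data block's
        // last cell, so they are as long as the longest data block.
        return Math.min(size, cellSize);
    }
}
```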

> [SPS]: Erasure coded files should be considered for satisfying storage policy
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-11193
>                 URL: https://issues.apache.org/jira/browse/HDFS-11193
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>         Attachments: HDFS-11193-HDFS-10285-00.patch, HDFS-11193-HDFS-10285-01.patch,
HDFS-11193-HDFS-10285-02.patch
>
>
> Erasure coded striped files support the storage policies {{HOT, COLD, ALLSSD}}. An {{HdfsAdmin#satisfyStoragePolicy}}
API call on a directory should consider all immediate files under that directory and check
whether the files really match the namespace storage policy. All mismatched striped blocks
should be chosen for block movement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

