hadoop-hdfs-issues mailing list archives

From "Kai Zheng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9879) Erasure Coding : schedule striped blocks to be cached on DataNodes
Date Thu, 03 Mar 2016 16:09:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15178017#comment-15178017 ]

Kai Zheng commented on HDFS-9879:
---------------------------------

The problem is that the HDFS cache was originally designed for replicated blocks, so caching
striped blocks on a DataNode may not make much sense; we should think about in what cases such
cached blocks would actually be useful. If caching a striped block still sounds worthwhile, we
probably need to ensure that all the other data blocks in the group are cached as well, which
is rather complicated and needs to be justified by the right use cases. I would suggest the
simple fix in the mentioned code in {{CacheReplicationMonitor}}: add a condition to skip this
logic for striped blocks, or avoid entering it in the first place for striped files.
[~andrew.wang], I'm not very sure about this, would you help clarify or correct? Thanks!
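The suggested guard could look roughly like the sketch below. This is only an illustration of the skip condition, not the actual HDFS code: the {{BlockInfo}} stub and the {{filterCacheCandidates}} helper are hypothetical stand-ins for the monitor's internals, and only the {{isStriped()}} check itself reflects the suggestion.

```java
import java.util.ArrayList;
import java.util.List;

public class StripedCacheSkipSketch {
    // Hypothetical stand-in for the real BlockInfo in
    // org.apache.hadoop.hdfs.server.blockmanagement.
    static class BlockInfo {
        final long blockId;
        final boolean striped;
        BlockInfo(long blockId, boolean striped) {
            this.blockId = blockId;
            this.striped = striped;
        }
        boolean isStriped() { return striped; }
        long getBlockId() { return blockId; }
    }

    // Returns the blocks the monitor would still consider for caching,
    // after skipping striped blocks as suggested in the comment above.
    static List<BlockInfo> filterCacheCandidates(List<BlockInfo> blocks) {
        List<BlockInfo> candidates = new ArrayList<>();
        for (BlockInfo blockInfo : blocks) {
            if (blockInfo.isStriped()) {
                // Suggested condition: caching one internal block of a striped
                // group is not useful unless the whole group is cached, so skip.
                continue;
            }
            candidates.add(blockInfo);
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = new ArrayList<>();
        blocks.add(new BlockInfo(1L, false)); // replicated block: kept
        blocks.add(new BlockInfo(2L, true));  // striped block: skipped
        List<BlockInfo> kept = filterCacheCandidates(blocks);
        assert kept.size() == 1 && kept.get(0).getBlockId() == 1L;
        System.out.println("kept=" + kept.size());
    }
}
```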

> Erasure Coding : schedule striped blocks to be cached on DataNodes
> ------------------------------------------------------------------
>
>                 Key: HDFS-9879
>                 URL: https://issues.apache.org/jira/browse/HDFS-9879
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Rakesh R
>            Assignee: Rakesh R
>
> This jira is to discuss and implement the caching of striped block objects on the appropriate
> datanode.
> Presently it checks the block group size and schedules the blockGroupId to the datanode;
> this needs to be refined by checking {{StripedBlockUtil.getInternalBlockLength()}} and
> scheduling the proper blockId to the datanode.
> {code}
> CacheReplicationMonitor.java
>       if (pendingCapacity < blockInfo.getNumBytes()) {
>         LOG.trace("Block {}: DataNode {} is not a valid possibility " +
>             "because the block has size {}, but the DataNode only has {} " +
>             "bytes of cache remaining ({} pending bytes, {} already cached.)",
>             blockInfo.getBlockId(), datanode.getDatanodeUuid(),
>             blockInfo.getNumBytes(), pendingCapacity, pendingBytes,
>             datanode.getCacheRemaining());
>         outOfCapacity++;
>         continue;
>       }
>     for (DatanodeDescriptor datanode : chosen) {
>       LOG.trace("Block {}: added to PENDING_CACHED on DataNode {}",
>           blockInfo.getBlockId(), datanode.getDatanodeUuid());
>       pendingCached.add(datanode);
>       boolean added = datanode.getPendingCached().add(cachedBlock);
>       assert added;
>     }
> {code}
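The striping math behind the {{getInternalBlockLength()}} refinement mentioned in the description can be sketched as follows. This is a simplified re-implementation for illustration, assuming the usual layout where data cells are written round-robin across the data blocks and each parity block is as long as the first data block; names and structure here are not copied from HDFS.

```java
public class InternalBlockLengthSketch {
    // Length of internal block idxInGroup, for a block group holding
    // groupDataSize bytes of data, striped in cells of cellSize bytes
    // across numDataBlocks data blocks.
    static long internalBlockLength(long groupDataSize, int cellSize,
                                    int numDataBlocks, int idxInGroup) {
        long stripeSize = (long) cellSize * numDataBlocks;
        long fullStripes = groupDataSize / stripeSize;
        long remainder = groupDataSize % stripeSize; // data in the last, partial stripe
        long base = fullStripes * cellSize;          // one cell per full stripe
        // Parity blocks (idx >= numDataBlocks) are as long as the first data block.
        int i = idxInGroup < numDataBlocks ? idxInGroup : 0;
        long extra = Math.min(Math.max(remainder - (long) i * cellSize, 0), cellSize);
        return base + extra;
    }

    public static void main(String[] args) {
        int cell = 64 * 1024, k = 6; // e.g. RS-6-3 with 64 KB cells
        // One cell of data: only the first data block (and parity) is non-empty.
        assert internalBlockLength(cell, cell, k, 0) == cell;
        assert internalBlockLength(cell, cell, k, 1) == 0;
        assert internalBlockLength(cell, cell, k, 6) == cell; // parity matches block 0
        // One full stripe: every data block holds exactly one cell.
        assert internalBlockLength(6L * cell, cell, k, 3) == cell;
        System.out.println("ok");
    }
}
```

This shows why scheduling by block group size alone is wrong: the internal blocks of one group can have different lengths, so the cache capacity check must use the per-block length.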



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
