hadoop-hdfs-issues mailing list archives

From "Rui Mo (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14740) HDFS read cache persistence support
Date Fri, 13 Sep 2019 08:30:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929046#comment-16929046 ]

Rui Mo commented on HDFS-14740:

Thanks [~rakeshr] for reviewing the patch and the valuable comments.

In [^HDFS-14740.004.patch]:
{quote}1. Please remove the duplicate checks in the #restoreCache() method, as you are already doing the
checks inside #createBlockPoolDir().{quote}
The duplicate checks have been removed.
{quote}2. {{pmemVolume/BlockPoolId/BlockPoolId-BlockId}}. {{BlockPoolId}} is duplicated.{quote}
The cache file is now named by its BlockId alone, for simplicity.
{quote}3. Can you explore using a hierarchical way of storing blocks, similar
to the existing datanode data.dir? This is to avoid an ever-growing number of blocks under one single
blockPoolId. Assume cache capacity in TBs and a large set of data blocks in cache under a blockPool.
Please refer to {{DatanodeUtil.idToBlockDir(finalizedDir, b.getBlockId());}}{quote}
We now use a hierarchical layout for cache storage, following the implementation in DatanodeUtil,
to avoid storing a large number of blocks under a single BlockPoolId.
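As a rough illustration of the hierarchical layout discussed in item 3, the sketch below maps a block ID to a two-level subdirectory under a block pool directory, in the spirit of {{DatanodeUtil.idToBlockDir}}. The class name, the {{subdir}} prefix, the 5-bit mask per level, and the exact {{pmemVolume/BlockPoolId/subdirX/subdirY/BlockId}} path shape are assumptions made for this example and may not match the patch.

```java
import java.io.File;

// Hypothetical sketch, not the code from HDFS-14740.004.patch.
public class CacheBlockDirSketch {
    // Assumption: 32 x 32 fan-out, i.e. 5 bits of the block ID per directory level.
    private static final int LEVEL_MASK = 0x1F;
    // Assumption: subdirectory naming prefix.
    private static final String SUBDIR_PREFIX = "subdir";

    // Derive a two-level directory from bits 16..20 and 8..12 of the block ID,
    // so blocks spread across many small directories instead of one large one.
    public static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & LEVEL_MASK);
        int d2 = (int) ((blockId >> 8) & LEVEL_MASK);
        return new File(new File(root, SUBDIR_PREFIX + d1), SUBDIR_PREFIX + d2);
    }

    // Build the cache file path; the file itself is named by the block ID only.
    public static File cacheFile(File pmemVolume, String blockPoolId, long blockId) {
        File blockPoolDir = new File(pmemVolume, blockPoolId);
        return new File(idToBlockDir(blockPoolDir, blockId), Long.toString(blockId));
    }

    public static void main(String[] args) {
        // 197121 == 0x030201, so the levels resolve to subdir3 and subdir2.
        File f = cacheFile(new File("pmem0"), "BP-1", 197121L);
        System.out.println(f.getPath());
    }
}
```

Because the directory is derived purely from the block ID, the same computation can locate a cached block again after a restart without keeping a separate index.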
{quote}4. {{restoreCache()}} - How about moving the specific parsing/restore logic to the respective
MappableBlockLoaders: PmemMappableBlockLoader#restoreCache() and NativePmemMappableBlockLoader#restoreCache().{quote}
We have refactored this part of the implementation. restoreCache() remains in PmemVolumeManager
to restore some variables, but it calls the specific parsing/restore logic in the respective
MappableBlockLoaders.
{quote}5. {{dfs.datanode.cache.persistence.enabled}} - by default this can
be true, as this will allow users to get the maximum capabilities of the pmem device. Overall, the feature
is disabled: the default value of "dfs.datanode.cache.pmem.dirs" is empty, so the cache will be DRAM
based. So, once the user enables pmem, they can utilize the potential of this device, and there is no
compatibility concern.{quote}
{{dfs.datanode.cache.persistence.enabled}} is true by default now.
The user can enable pmem by configuring "dfs.datanode.cache.pmem.dirs".
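To make the interaction between the two settings in item 5 concrete, here is a minimal sketch of how they might combine. It uses {{java.util.Properties}} as a stand-in for the Hadoop configuration object; the class and method names are hypothetical, and the rule that persistence only takes effect once pmem directories are configured is our reading of the discussion, not a quote from the patch.

```java
import java.util.Properties;

// Hypothetical sketch; java.util.Properties stands in for Hadoop's Configuration.
public class PmemCacheConfigSketch {
    static final String PERSISTENCE_KEY = "dfs.datanode.cache.persistence.enabled";
    static final String PMEM_DIRS_KEY = "dfs.datanode.cache.pmem.dirs";

    // pmem cache is considered enabled only when at least one directory is configured;
    // with the default empty value the cache stays DRAM based.
    public static boolean pmemEnabled(Properties conf) {
        return !conf.getProperty(PMEM_DIRS_KEY, "").trim().isEmpty();
    }

    // Assumption: cache restore on restart needs both pmem and the persistence flag,
    // and the persistence flag defaults to true.
    public static boolean shouldRestoreCache(Properties conf) {
        boolean persistence =
            Boolean.parseBoolean(conf.getProperty(PERSISTENCE_KEY, "true"));
        return pmemEnabled(conf) && persistence;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(shouldRestoreCache(conf)); // defaults only: DRAM cache
        conf.setProperty(PMEM_DIRS_KEY, "/mnt/pmem0"); // example path, not from the source
        System.out.println(shouldRestoreCache(conf));
    }
}
```

Under this reading, a default deployment is unaffected by the new default, and a user who opts in to pmem gets persistence without setting a second property.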


> HDFS read cache persistence support
> -----------------------------------
>                 Key: HDFS-14740
>                 URL: https://issues.apache.org/jira/browse/HDFS-14740
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Feilong He
>            Assignee: Rui Mo
>            Priority: Major
>         Attachments: HDFS-14740.000.patch, HDFS-14740.001.patch, HDFS-14740.002.patch, HDFS-14740.003.patch, HDFS-14740.004.patch
> In HDFS-13762, persistent memory (PM) was enabled in HDFS centralized cache management.
> Even though PM can persist cache data, to simplify the initial implementation, the previous
> cache data is cleaned up when the DataNode restarts. Here, we propose to improve the
> HDFS PM cache by taking advantage of PM's data persistence, i.e., recovering
> the cache status when the DataNode restarts, so that users are spared the cache warm-up time.

