hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching
Date Fri, 13 Sep 2013 22:14:52 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Andrew Wang updated HDFS-5053:

    Attachment: hdfs-5053-2.patch

Thanks for the reviews, Colin and Todd. Lots of good comments.


* Restored the Fallible new IOE, agree it makes sense
* Renamed the config option to "dfs.namenode.caching.enabled". I think one option makes sense
here since the big toggle is the creation of the {{cachedBlocksMap}}, which we'll need for
both LRU and path-based caching.
* I added the default case since it'll help catch when new block command types are added (e.g.
what I did previously and had to debug here).
* Caching factor sounds like follow-on work :) This will mean linking {{BlockCollection}}
to the {{PCE}} like you suggested, so I think we can do both in the same follow-on JIRA.


* Added/tweaked javadocs and annotations as requested.
* Re: BlockManager synchronization, AFAICT it just uses the namesystem lock, which is what
we do here as well.
* Agree that UnderReplicatedBlocks is not a good fit, replaced with a LightWeightHashSet and
synchronized it.
* Refactored InvalidateBlocks to have UncacheBlocks and InvalidateStoredBlocks subclasses
with only the relevant methods. Hopefully looks cleaner now.
* I agree that having a BlockInfo in the wrong map is scary. I guess I could just subclass
a {{CachedBlockInfo}} and do a type check? If it's any solace, we only add cached blocks during
cache report processing, and it's always a new BlockInfo or something coming out of the cachedBlocksMap.
* Re: safemode, I think it's okay to be sloppy and just drop cache reports received in startup
safe mode. Left the TODO in since we might want to do this eventually.
* Re: standby misreplicated blocks, I think the sloppiest we can do is just ignore these mystery
blocks when we see them (the current behavior). I don't think we can be much sloppier since
there are assumptions that a cached block is also present in the blocksMap and has a related
* Re: uncaching a block not present in the blocksMap, this really shouldn't happen, removed
uncaching it for clarity.
* Re: extracting base class, I did this with {{ReportProcessor}}. This has most of the common
block report/cache report computation, with subclass hooks for processing the actions of the
* Re: INodeFile memory usage, I'd like to punt this one out since the subclassing process
sounds painful. As Colin mentioned, the repl factor is eventually going to be in the {{PathCacheEntry}},
so we need to store a reference to the INode's PCEs.
> NameNode should invoke DataNode APIs to coordinate caching
> ----------------------------------------------------------
>                 Key: HDFS-5053
>                 URL: https://issues.apache.org/jira/browse/HDFS-5053
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5053-1.patch, hdfs-5053-2.patch
> The NameNode should invoke the DataNode APIs to coordinate caching.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message