hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-5053) NameNode should invoke DataNode APIs to coordinate caching
Date Fri, 13 Sep 2013 22:14:52 GMT

     [ https://issues.apache.org/jira/browse/HDFS-5053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Andrew Wang updated HDFS-5053:
------------------------------

    Attachment: hdfs-5053-2.patch

Thanks for the reviews, Colin and Todd. Lots of good comments.

Colin:

* Restored the Fallible new IOE, agree it makes sense
* Renamed the config option to "dfs.namenode.caching.enabled". I think one option makes sense
here since the big toggle is the creation of the {{cachedBlocksMap}}, which we'll need for
both LRU and path-based caching.
* I added the default case since it'll help catch when new block command types are added (e.g.
what I did previously and had to debug here).
* Caching factor sounds like follow-on work :) This will mean linking {{BlockCollection}}
to the {{PCE}} like you suggested, so I think we can do both in the same follow-on JIRA.

Todd:

* Added/tweaked javadocs and annotations as requested.
* Re: BlockManager synchronization, AFAICT it just uses the namesystem lock, which is what
we do here as well.
* Agree that UnderReplicatedBlocks is not a good fit, replaced with a LightWeightHashSet and
synchronized it.
* Refactored InvalidateBlocks to have UncacheBlocks and InvalidateStoredBlocks subclasses
with only the relevant methods. Hopefully looks cleaner now.
* I agree that having a BlockInfo in the wrong map is scary. I guess I could just subclass
a {{CachedBlockInfo}} and do a type check? If it's any solace, we only add cached blocks during
cache report processing, and it's always a new BlockInfo or something coming out of the cachedBlocksMap.
* Re: safemode, I think it's okay to be sloppy and just drop cache reports received in startup
safe mode. Left the TODO in since we might want to do this eventually.
* Re: standby misreplicated blocks, I think the sloppiest we can do is just ignore these mystery
blocks when we see them (the current behavior). I don't think we can be much sloppier since
there are assumptions that a cached block is also present in the blocksMap and has a related
BlockCollection.
* Re: uncaching a block not present in the blocksMap, this really shouldn't happen, removed
uncaching it for clarity.
* Re: extracting base class, I did this with {{ReportProcessor}}. This has most of the common
block report/cache report computation, with subclass hooks for processing the actions of the
report.
* Re: INodeFile memory usage, I'd like to punt this one out since the subclassing process
sounds painful. As Colin mentioned, the repl factor is eventually going to be in the {{PathCacheEntry}},
so we need to store a reference to the INode's PCEs.
                
> NameNode should invoke DataNode APIs to coordinate caching
> ----------------------------------------------------------
>
>                 Key: HDFS-5053
>                 URL: https://issues.apache.org/jira/browse/HDFS-5053
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Andrew Wang
>         Attachments: hdfs-5053-1.patch, hdfs-5053-2.patch
>
>
> The NameNode should invoke the DataNode APIs to coordinate caching.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message