hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5052) Add cacheRequest/uncacheRequest support to DFSAdmin and NameNode
Date Wed, 21 Aug 2013 06:13:52 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13745814#comment-13745814
] 

Andrew Wang commented on HDFS-5052:
-----------------------------------

Looks substantially different (and slimmer), thanks for putting in the legwork.

Easy comments first:

Whitespace:
{code}
  // lines are more than 80 chars
  public List<Fallible<PathCacheEntry>> addPathCacheDirectives(List<PathCacheDirective>
paths)
    ...
  public RemoteIterator<PathCacheEntry> listPathCacheEntries(long startId, String pool,
...
  // this fits on one line
  public synchronized List<Fallible<Long>>
        removeEntries(List<Long> entryIds) {
...
  // extra space
        return replies ;
{code}

Javadoc:
- Javadoc in {{ClientProtocol}} looks like it needs updating, since it references {{elementsList}}
and IDs, which seem to no longer be used.
- Nifty idea with the {{Fallible}}, it's kind of like a {{Future}}! I'd just add a note to
the class Javadoc explaining that it's useful as a return value for batch APIs for granular
error reporting.

General:
- Rather than rolling your own {{hashCode}} and {{equals}} functions, you could use {{HashCodeBuilder}}
and {{EqualsBuilder}} from Apache Commons.
- It looks like you want to defer adding this to an actual client API (e.g. DistributedFileSystem
or DFSShell). I guess this is okay, but we'll need to address end-to-end test coverage for
{{CacheManager}}.

CacheManager:
- {{entriesByDirective}} doesn't use any of the {{TreeMap}} specialness, consider using a
{{HashMap}}.
- In {{addDirective}}, do we need to do {{entriesById.ceilingEntry}} over just a {{get}}?
Could save the check that the key matches right afterwards.
- In {{listPathCacheEntries}}, it seems a bit wasteful to allocate an ArrayList with capacity
{{maxReplies}} for the {{cur == null}} case.
- In {{listPathCacheEntries}}, could we use {{TreeMap#tailMap}} instead of reaching around
with {{ceilingEntry}} and {{higherEntry}}? Also, it might be more efficient to use an iterator
rather than repeatedly calling {{higherEntry}}.
- In {{listPathCacheEntries}}, it looks up {{ceilingEntry(startId)}}, which returns the next
element greater or equal in the TreeMap. I assume that this is the last id the client has
read, in which case we'll return the {{startId}} entry again (duplicate listing).

FSNamesystem
- These methods are going to need permission checks. This can be a follow-on, but at the least,
it means we need to store the requesting user in the {{PathCacheDirective}}. I'm not entirely
sure how this will work though, since I don't think Hive does user impersonation in the HiveServer2/Sentry
world.
- We probably also want a method for listing {{PathCacheEntries}} by user, not just pool.
Let's punt this out too though.

PathCacheDirective
- Regarding {{cloneNormalize}}, we need to qualify the paths on the client side anyway to
pick up the working directory, so we could just require that DFSClient always give us an absolute
path. This is pretty normal.
- It feels weird that the checks are split across {{CacheManager#addDirective}} and {{cloneNormalize}}.
It looks like {{CacheManager}} is the only consumer of {{cloneNormalize}}, so can we just
do all the checks in {{CacheManager}}?

To recap, potential follow-on JIRAs to file:
- User APIs with end-to-end tests
- Persist CacheManager state in the edit log
- Permissions on caching directives and cache pools
- Listing of PathCacheEntries by user, or pool and user
                
> Add cacheRequest/uncacheRequest support to DFSAdmin and NameNode
> ----------------------------------------------------------------
>
>                 Key: HDFS-5052
>                 URL: https://issues.apache.org/jira/browse/HDFS-5052
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5052.003.patch, HDFS-5052.004.patch, HDFS-5052-caching.002.patch
>
>
> Add cacheRequest/uncacheRequest/listCacheRequest support to DFSAdmin and NameNode.  Maintain
a list of active CachingRequests on the NameNode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message