hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6634) inotify in HDFS
Date Mon, 04 Aug 2014 18:34:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085051#comment-14085051

Andrew Wang commented on HDFS-6634:

Hey James, thanks for posting this. I took a quick look, have some surface level comments:

- Need license headers on new files
- Class javadoc for new classes would be nice
- InterfaceAudience and InterfaceStability annotations on new classes
- Could we pack each SubEvent as a nested class in Event? Then we can javadoc just Event.
Could also squeeze EventList in there if you're feeling it.
- Javadoc has things like @link for referring to other classes and methods, @return and @param
too. Might clarify things, especially for user-facing classes
- It'd be good to have an EventsList PB definition so it matches up with the Java classes.

- Need a public class where we can get the new EventStream. My advice is putting it in DistributedFileSystem
and then exposing to users in HdfsAdmin.
- Slapping a big fat javadoc on the new user method in HdfsAdmin would be good, since it'll
need to explain its intended purpose and how to properly use the API.
- If you included an "event code" or "event type" enum for each event, you could switch over
that instead of instaceof in PBHelper. It'd be kind of like how FSEditLogOpCodes work.

- I think it'd be better if we had an API that did a get-with-timeout like {{Future#get}},
seems like what an app would want
- I also don't follow why we need resync. If we get an IOException in next(), what can the
app do besides resync and retry?
- At a high-level, it's be good to think how people will be using this API and dealing with
failures. This is a good thing to experiment with in the test cases, forcing certain error
conditions and seeing what the user code has to do to handle them.

- We very likely need to do getEditsFromTxid with the FSNamesystem read lock held. However,
we also don't want to do any kind of RPC with the FSN lock held.
- That while loop condition looks hairy, could you split it up and add comments?
- What if a client is way behind when this is called? It looks like it ends up getting all
the edits logs, which could be GBs. This API should be batched to only send some configurable
# of events.

- Can we do a switch/case on the op's opcode? It's an enum. Faster/better than doing instanceof.

> inotify in HDFS
> ---------------
>                 Key: HDFS-6634
>                 URL: https://issues.apache.org/jira/browse/HDFS-6634
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client, namenode, qjm
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6634.2.patch, HDFS-6634.patch, inotify-design.2.pdf, inotify-design.pdf,
inotify-intro.2.pdf, inotify-intro.pdf
> Design a mechanism for applications like search engines to access the HDFS edit stream.

This message was sent by Atlassian JIRA

View raw message