hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
Date Thu, 02 Aug 2012 22:49:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427692#comment-13427692 ]

Aaron T. Myers commented on HDFS-3680:

bq. Yes, a faulty custom logger will affect the functionality of the NN. The question is
what is the level of risk to the NN, and is it acceptable?

I'll reiterate my previous point: this is not arbitrary users writing and installing custom
logger implementations into the NN. This will likely be a handful of people writing these,
and operators will have to consciously install them into the NN. The people who are involved
in writing a custom logger should be aware of the inherent risks of doing so and should write
defensive code. This is not the place in the Hadoop code where we should be holding the hand
of the users.
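To illustrate what "defensive code" means here, a custom logger author might wrap all backend work in a catch-all so a backend failure never propagates into the NN. The interface name and method signature below are assumptions for this sketch, not necessarily those in the attached patch:

```java
// Hypothetical plug-in interface; the actual interface proposed in the
// patch may differ in name and signature.
interface CustomAuditLogger {
    void logAuditEvent(boolean succeeded, String user, String cmd, String src);
}

// A defensively written custom logger: any failure in the backing store
// (database, remote daemon, etc.) is caught and counted rather than
// allowed to escape into FSNamesystem.
class DefensiveDbAuditLogger implements CustomAuditLogger {
    private long dropped = 0;

    @Override
    public void logAuditEvent(boolean succeeded, String user, String cmd, String src) {
        try {
            writeToBackingStore(succeeded, user, cmd, src);
        } catch (Throwable t) {
            // Never let a backend failure crash the NN's audit path.
            dropped++;
        }
    }

    // Placeholder for e.g. a JDBC insert; throws here to simulate a
    // faulty backend.
    protected void writeToBackingStore(boolean ok, String user, String cmd, String src) {
        throw new RuntimeException("backend down");
    }

    public long getDroppedCount() { return dropped; }
}
```

Whether dropped events should instead trigger an NN shutdown is exactly the policy question discussed above; a defensive logger at least makes that an explicit choice rather than an accident.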

I think this point is further supported by your observation that people seem to agree that
shutting down the NN is the right answer if an event fails to get logged by a custom logger.
If that's what we intend to have happen in the event of custom audit log failure, then the
worst case scenario of a custom audit logger seg faulting or calling System.exit really isn't
that bad. The NN already has to handle an ungraceful shutdown and maintain data integrity,
so the marginal increase of danger should be low.

bq. If it's in a daemon, the NN will notice the RPC server crashed and may initiate a clean

This is one potential implementation of a custom audit logger that would potentially reduce
the risk, but we should not force this design on the writers of custom loggers.

bq. I'd assume it [latency of RPC call] can't be that bad since most of hadoop uses it.

I agree with Marcelo that the latency of doing an RPC per audit log is likely unacceptably
high. Just because the latency of the Hadoop Server/Client implementations is acceptable for
FS/MR job operations doesn't mean it's sufficient for audit logging. My guess would be that
it's not acceptable for anything but the smallest use cases. At the very least, to make the
performance anywhere near acceptable we couldn't actually do an RPC per log event; instead we
would have to buffer and group the events into fewer calls, effectively a group commit on the
log events. That sort of complexity should be left up to the custom logger author, if using
a separate daemon is actually required.
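The buffering/group-commit idea could be sketched roughly as below. The class names, batch size, and the stand-in for the RPC call are all illustrative assumptions, not part of any attached patch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Buffers audit events and ships them to a remote logging daemon in
// batches, amortizing per-RPC latency across many log events.
class BatchingAuditSender {
    private final BlockingQueue<String> buffer;
    private final int batchSize;
    private int rpcCalls = 0;    // number of simulated RPCs issued
    private int eventsSent = 0;  // total events shipped

    BatchingAuditSender(int capacity, int batchSize) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.batchSize = batchSize;
    }

    // Called on the audit path; cheap and never blocks on the network.
    boolean offer(String event) {
        return buffer.offer(event);
    }

    // Called from a background flusher thread: drain up to batchSize
    // events and ship them in a single call.
    void flushOnce() {
        List<String> batch = new ArrayList<>(batchSize);
        buffer.drainTo(batch, batchSize);
        if (!batch.isEmpty()) {
            sendBatchOverRpc(batch);
        }
    }

    // Stand-in for the real RPC to a logging daemon.
    private void sendBatchOverRpc(List<String> batch) {
        rpcCalls++;
        eventsSent += batch.size();
    }

    int getRpcCalls() { return rpcCalls; }
    int getEventsSent() { return eventsSent; }
}
```

Even this simple version raises the questions a real implementation would have to answer (what to do when the buffer is full, how often to flush, whether to block the audit path), which is why that complexity belongs with the custom logger author rather than in FSNamesystem.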
> Allows customized audit logging in HDFS FSNamesystem
> ----------------------------------------------------
>                 Key: HDFS-3680
>                 URL: https://issues.apache.org/jira/browse/HDFS-3680
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 2.0.0-alpha
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>            Priority: Minor
>         Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch,
hdfs-3680-v4.patch, hdfs-3680-v5.patch
> Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit
logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's
say a database), because it would require the code to implement a log appender (and thus know
what logging system is actually being used underneath the façade), and parse the textual
log message generated by FSNamesystem.
> I'm attaching a patch that introduces a cleaner interface for this use case.
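The "cleaner interface" approach described in the issue can be sketched as follows: loggers receive structured events directly instead of parsing formatted log lines. All names and signatures here are illustrative; see the attached patches for the actual proposal:

```java
import java.util.Arrays;
import java.util.List;

// Illustrative plug-in interface: implementations get structured audit
// events rather than text scraped from the NN's audit log.
interface PluggableAuditLogger {
    void logAuditEvent(boolean succeeded, String user, String cmd, String src, String dst);
}

// Minimal logger that just counts events, standing in for one that
// writes to a database or to the regular NN audit log.
class CountingAuditLogger implements PluggableAuditLogger {
    int events = 0;

    @Override
    public void logAuditEvent(boolean ok, String user, String cmd, String src, String dst) {
        events++;
    }
}

// Simplified stand-in for FSNamesystem's dispatch: every configured
// logger receives each event, so no log-appender or line parsing is
// needed to capture audit data.
class AuditDispatcher {
    private final List<? extends PluggableAuditLogger> loggers;

    AuditDispatcher(List<? extends PluggableAuditLogger> loggers) {
        this.loggers = loggers;
    }

    void logAuditEvent(boolean ok, String user, String cmd, String src, String dst) {
        for (PluggableAuditLogger l : loggers) {
            l.logAuditEvent(ok, user, cmd, src, dst);
        }
    }
}
```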

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

