Date: Tue, 3 Sep 2013 01:56:51 +0000 (UTC)
From: "Andrew Wang (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-4680) Audit logging of delegation tokens for MR tracing

     [ https://issues.apache.org/jira/browse/HDFS-4680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-4680:
------------------------------
    Attachment: hdfs-4680-4.patch

Hey folks, here's a revised patch which avoids MD5 overhead when the conf option is disabled. Pushing down the conf option was somewhat ugly; ideally this change would only touch HDFS' DTSM, but some of the {{DelegationTokenInformation}} add hooks are in ADTSM.

bq. The instanceof for the default audit logger seems like it can/should be avoided...
I wanted to avoid modifying AuditLogger, since it's a public interface and there can be external implementations. I think changing it would cause compatibility issues, but suggestions welcome.

bq. it would be ideal if the connection knew the trackingId...
I'm not sure of the best way to do this, but again, suggestions welcome.

> Audit logging of delegation tokens for MR tracing
> -------------------------------------------------
>
>                 Key: HDFS-4680
>                 URL: https://issues.apache.org/jira/browse/HDFS-4680
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, security
>    Affects Versions: 2.0.3-alpha
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: hdfs-4680-1.patch, hdfs-4680-2.patch, hdfs-4680-3.patch, hdfs-4680-4.patch
>
>
> HDFS audit logging tracks HDFS operations made by different users, e.g. creation and deletion of files. This is useful for after-the-fact root cause analysis and security. However, logging merely the username is insufficient for many use cases. For instance, it is common for a single user to run multiple MapReduce jobs (I believe this is the case with Hive). In this scenario, given a particular audit log entry, it is difficult to trace it back to the MR job or task that generated that entry.
> I see a number of potential options for implementing this.
> 1. Make an optional "client name" field part of the NN RPC format. We already pass a {{clientName}} as a parameter in many RPC calls, so this would essentially make it standardized. MR tasks could then set this field to the job and task ID.
> 2. This could be generalized to a set of optional key-value *tags* in the NN RPC format, which would then be audit logged. This has standalone benefits outside of just verifying MR task IDs.
> 3. Neither of the above two options actually securely verifies that MR clients are who they claim they are. Doing this securely requires the JobTracker to sign MR task attempts, and the NN to verify this signature.
> However, this is substantially more work, and could be built on after idea #2.
>
> Thoughts welcomed.
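
For readers following the thread, the idea in the comment above (derive an MD5-based tracking ID from the delegation token, but skip the hashing entirely when the conf option is off) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in hdfs-4680-4.patch: the class {{TrackingIdSketch}}, its methods, the boolean flag standing in for the conf option, and the audit line layout are all hypothetical. Only {{java.security.MessageDigest}} is a real API here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: none of these names come from the actual patch.
class TrackingIdSketch {
    // Stand-in for the conf option mentioned in the comment (assumed boolean).
    private final boolean trackingEnabled;

    TrackingIdSketch(boolean trackingEnabled) {
        this.trackingEnabled = trackingEnabled;
    }

    /**
     * Returns the lowercase hex MD5 of the token identifier bytes,
     * or null when tracking is disabled, so no digest is ever computed.
     */
    String trackingIdFor(byte[] tokenIdentifierBytes) {
        if (!trackingEnabled) {
            return null; // avoid the MD5 cost entirely
        }
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(tokenIdentifierBytes);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Builds an illustrative audit line; the field layout is an assumption. */
    String auditLine(String user, String cmd, byte[] tokenIdentifierBytes) {
        StringBuilder sb = new StringBuilder();
        sb.append("ugi=").append(user).append("\tcmd=").append(cmd);
        String id = trackingIdFor(tokenIdentifierBytes);
        if (id != null) {
            sb.append("\ttrackingId=").append(id);
        }
        return sb.toString();
    }
}
```

With the flag off, {{auditLine}} emits only the user and command fields, so existing log consumers would see no change. With it on, every entry produced under the same token carries the same ID, which is what would let an entry be traced back to a job.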