hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-9184) Logging HDFS operation's caller context into audit logs
Date Thu, 08 Oct 2015 19:10:28 GMT

     [ https://issues.apache.org/jira/browse/HDFS-9184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HDFS-9184:
    Attachment: HDFS-9184.001.patch

Thanks all for the input.

Until we have a perfect solution, we consider this approach a feasible option for a badly
needed goal. In terms of security, it is admittedly flawed. The caller context does carry a
signature field, which may be useful for offline analysis and validation.

The v1 patch aims to address the compatibility concern. We don't think there is a "significant
compatibility" issue here. Specifically,
* We won't record the caller context unless its config key is explicitly turned on by users
* NO existing API is changed to implement this feature
* The current layout of the audit log is not changed, as there will only be an *optional*
key-value pair at the end of the line.
Just for the record: it would be good for the audit log itself to have a well-defined structure
and format in the future.
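
To make the compatibility argument concrete, here is a minimal sketch (in Python, not the patch itself) of an audit line that gains an optional trailing key-value pair only when the feature is switched on. The function name, the {{enabled}} flag standing in for the config key, the {{callerContext}} key and the field names are all assumptions made for illustration.

```python
def format_audit_line(base_fields, caller_context=None, enabled=False):
    """Build a tab-separated audit line (illustrative layout, not the real one).

    The caller context is appended as one trailing key=value pair, and only
    when the feature flag is on, so the existing fields keep their positions.
    """
    parts = [f"{key}={value}" for key, value in base_fields]
    if enabled and caller_context:
        # Hypothetical key name; it always goes last.
        parts.append(f"callerContext={caller_context}")
    return "\t".join(parts)


# Illustrative fields only.
base = [("ugi", "alice"), ("cmd", "delete"), ("src", "/tmp/x")]
old_line = format_audit_line(base)
new_line = format_audit_line(base, "hive_query_42", enabled=True)
```

Because the new line starts with the old line, a parser that reads the existing fields by position should keep working under these assumptions.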

This patch does not adopt {{htrace}}, because {{htrace}} is a totally different approach: for
this use case it depends on 100% sampling across many spans. If performance is really a
concern, I don't expect {{htrace}} to do better.

> Logging HDFS operation's caller context into audit logs
> -------------------------------------------------------
>                 Key: HDFS-9184
>                 URL: https://issues.apache.org/jira/browse/HDFS-9184
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-9184.000.patch, HDFS-9184.001.patch
> For a given HDFS operation (e.g. delete file), it's very helpful to track which upper
level job issues it. The upper level callers may be specific Oozie tasks, MR jobs, and Hive
queries. One scenario: when the namenode (NN) is abused/spammed, the operator may want to
know immediately which MR job is to blame so that she can kill it. To this end, the caller
context contains at least the application-dependent "tracking id".
> There are several existing techniques that may be related to this problem.
> 1. Currently the HDFS audit log tracks the users of the operation, which is obviously
not enough. It's common that the same user issues multiple jobs at the same time. Even for
a single top level task, tracking back to a specific caller in a chain of operations of the
whole workflow (e.g. Oozie -> Hive -> Yarn) is hard, if not impossible.
> 2. HDFS integrated {{htrace}} support for providing tracing information across multiple
layers. The span is created in many places interconnected like a tree structure which relies
on offline analysis across RPC boundary. For this use case, {{htrace}} has to be enabled at
100% sampling rate which introduces significant overhead. Moreover, passing additional information
(via annotations) other than span id from root of the tree to leaf is a significant additional
> 3. In [HDFS-4680 | https://issues.apache.org/jira/browse/HDFS-4680], there is some related
discussion on this topic. The final patch implemented the tracking id as a part of the delegation
token. This protects the tracking information from being changed or impersonated. However,
Kerberos-authenticated connections and insecure connections don't have tokens. [HADOOP-8779]
proposes to use tokens in all the scenarios, but that might mean changes to several upstream
projects and is a major change in their security implementation.
> We propose another approach to address this problem. We also treat the HDFS audit log as
a good place for after-the-fact root cause analysis. We propose to put the caller id (e.g.
Hive query id) in threadlocals. Specifically, on the client side the threadlocal object is passed
to the NN as an (optional) part of the RPC header, while on the server side the NN retrieves it
from the header and puts it into the {{Handler}}'s threadlocals. Finally, in {{FSNamesystem}}, the
HDFS audit logger will record the caller context for each operation. In this way, the existing
code is not affected.
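
The flow described above can be sketched roughly as follows. This is a Python stand-in, not the HDFS code: {{threading.local}} plays the role of the Java threadlocal, a dict plays the role of the RPC header, and every function and field name here is hypothetical.

```python
import threading

# Per-thread slot, analogous to a Java ThreadLocal.
_caller_context = threading.local()


def set_caller_context(ctx):
    _caller_context.value = ctx


def get_caller_context():
    return getattr(_caller_context, "value", None)


def build_rpc_header(call_id):
    """Client side: copy the thread-local context into an optional header field."""
    header = {"call_id": call_id}
    ctx = get_caller_context()
    if ctx is not None:
        header["caller_context"] = ctx  # omitted entirely when unset
    return header


def handle_rpc(header, op, audit_log):
    """Server side: restore the context on the handler thread, then audit."""
    set_caller_context(header.get("caller_context"))
    try:
        # Stand-in for the audit logger reading the thread-local context.
        audit_log.append((op, get_caller_context()))
    finally:
        set_caller_context(None)  # handler threads are reused across calls


def client_call(ctx, call_id, op, audit_log):
    """Simulate one call: set the context, build the header, run a handler thread."""
    set_caller_context(ctx)
    header = build_rpc_header(call_id)
    handler = threading.Thread(target=handle_rpc, args=(header, op, audit_log))
    handler.start()
    handler.join()
```

A client that never sets a context sends a header without the extra field, and the audit entry simply records no caller context, which is what keeps the field optional in this sketch.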
> It is still challenging to keep a "lying" client from abusing the caller context. Our proposal
is to add a {{signature}} field to the caller context. The client may choose to provide its signature
along with the caller id. The operator may need to validate the signature at the time of offline
analysis. The NN is not responsible for validating the signature online.
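
As one possible illustration of offline validation: the description does not say how the signature is computed, so the sketch below simply assumes an HMAC-SHA256 over the caller id with a key shared between the client and the operator. The scheme, the dict layout and all function names are assumptions, not part of the proposal.

```python
import hashlib
import hmac


def sign(caller_id, key):
    """Assumed scheme: hex HMAC-SHA256 of the caller id under a shared key."""
    return hmac.new(key, caller_id.encode("utf-8"), hashlib.sha256).hexdigest()


def make_caller_context(caller_id, key=None):
    """Client side: the signature is optional, as in the proposal."""
    ctx = {"id": caller_id}
    if key is not None:
        ctx["signature"] = sign(caller_id, key)
    return ctx


def validate_offline(ctx, key):
    """Operator side, run during offline analysis; the NN never calls this."""
    signature = ctx.get("signature")
    if signature is None:
        return False  # unsigned contexts cannot be vouched for
    return hmac.compare_digest(signature, sign(ctx["id"], key))
```

Under this assumption the NN only stores and logs the two fields; the operator recomputes the HMAC later and treats unsigned or mismatching entries as unverified.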

This message was sent by Atlassian JIRA
