From: "Xianqing Yu" <yuxian199@gmail.com>
To: common-dev@hadoop.apache.org
Cc: "Peng Ning"
Subject: Make Hadoop run more securely in Public Cloud environment
Date: Thu, 13 Sep 2012 14:18:45 -0400

Hi Hadoop community,

I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (touching most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve better security.

My major goal is to make Hadoop run more securely in the Cloud, especially in a public Cloud environment. To achieve that, I have redesigned the current security mechanism to provide the following properties:

1. Bring byte-level access control to Hadoop HDFS.

My work is based on Hadoop 0.20.204. In that version, HDFS access control works at user or block granularity: the HDFS Delegation Token only checks whether a file can be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed. I make Hadoop capable of byte-granularity access control, so that each accessing party (a user or a task process) can only access the bytes it actually needs.
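To make the byte-granularity idea concrete, here is a minimal sketch (only an illustration, not my actual patch; the class and method names are made up for the example) of a block token that carries a permitted byte range which a DataNode could check before serving a read:

// Illustrative sketch only -- not the real implementation. All names here
// (ByteRangeBlockToken, permits, ...) are invented for this example.
import java.io.Serializable;

public class ByteRangeBlockToken implements Serializable {
    private static final long serialVersionUID = 1L;

    private final long blockId;     // the block the holder may touch
    private final long rangeStart;  // first permitted byte offset (inclusive)
    private final long rangeEnd;    // end of the permitted range (exclusive)
    private final String owner;     // the principal the token was issued to

    public ByteRangeBlockToken(long blockId, long rangeStart, long rangeEnd, String owner) {
        this.blockId = blockId;
        this.rangeStart = rangeStart;
        this.rangeEnd = rangeEnd;
        this.owner = owner;
    }

    // A DataNode would evaluate something like this before serving a read,
    // instead of only checking that the block as a whole is accessible.
    public boolean permits(long requestedBlockId, long offset, long length, String requester) {
        return requestedBlockId == blockId
                && owner.equals(requester)
                && length >= 0
                && offset >= rangeStart
                && offset + length <= rangeEnd;
    }
}

The real design of course has to integrate with how the NameNode issues and verifies tokens; the sketch only shows the extra range check.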
2. I assume that in a public Cloud environment only the NameNode, secondary NameNode, and JobTracker can be trusted. A large number of DataNodes and TaskTrackers may be compromised, because some of them may run in less secure environments. So I redesigned the security mechanism to minimize the damage an attacker can do.

a. Redesign the Block Access Token to solve the widely-shared-key problem of HDFS. In the original Block Access Token design, all of HDFS (the NameNode and every DataNode) shares one master key to generate Block Access Tokens; if one DataNode is compromised, the attacker can obtain the key and generate any Block Access Token he or she wants.

b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTrackers and MapReduce task processes on HDFS.

In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any MapReduce file on HDFS, so they have the same privileges as the JobTracker to read or write tokens, copy job files, and so on. However, if one of them is compromised, everything critical in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. I solve this problem by having the JobTracker decide which TaskTracker can access which file in the MapReduce directory on HDFS.

For a task process, once it gets an HDFS Delegation Token it can currently access everything belonging to its job or user on HDFS. Under my design, it can only access the bytes it needs from HDFS.

There are some other security improvements as well; for example, a TaskTracker cannot learn information such as the block ID from the Block Token (because I encrypt it), and HDFS can optionally set up a secure channel to send data.

With those features, Hadoop can run much more securely in an uncertain environment such as a public Cloud. I have already started testing my prototype. I would like to know whether the community is interested in this work. Is it worth contributing to production Hadoop?

I have created a JIRA issue for the discussion: https://issues.apache.org/jira/browse/HADOOP-8803#comment-13455025

Thanks,
Xianqing