hadoop-common-dev mailing list archives

From "Kingshuk Chatterjee" <mindstatic1...@gmail.com>
Subject RE: Make Hadoop run more securely in Public Cloud environment
Date Thu, 13 Sep 2012 19:16:42 GMT
Absolutely. If such byte-level security is built into the product and data access is
isolated, which also means breaches can be contained, then it becomes easier for us to sell
the idea to hospital CIOs and CSOs. Let’s hear what our folks here have to say about
it too.


-----Original Message-----
From: Xianqing Yu [mailto:yuxian199@gmail.com] 
Sent: Thursday, September 13, 2012 12:04 PM
To: Kingshuk Chatterjee; common-dev@hadoop.apache.org
Cc: Peng Ning
Subject: Re: Make Hadoop run more securely in Public Cloud environment

Hi Kingshuk,

Thank you for your interest.

I think you give a very nice example. If healthcare companies push their data to a public cloud,
byte-level access control can minimize the data each party (e.g. a task process) can get.
So even if one task process or TaskTracker is hacked, the information loss is limited.

Another feature is also very helpful in this scenario. Currently the NameNode and all DataNodes
share the same key to generate Block Access Tokens. If a hacker gets the key by compromising
any one HDFS machine, he or she can potentially read everything in HDFS, and the impact
is huge. So I redesigned this to make sure that, if a hacker succeeds in attacking one machine, he
or she can only get what is on that machine, not what is on the others in the cluster.
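
As a rough illustration only (the details of the actual redesign are not given in this message), one way to avoid a single shared key is for the NameNode to derive a separate key for each DataNode from a master secret that never leaves the NameNode. The function names, the HMAC-SHA256 derivation, and the token layout below are all assumptions made for this sketch, not Hadoop APIs:

```python
import hashlib
import hmac


def datanode_key(master_key: bytes, datanode_id: str) -> bytes:
    """Hypothetical: NameNode derives a per-DataNode key from its master secret."""
    return hmac.new(master_key, b"dn-key|" + datanode_id.encode(), hashlib.sha256).digest()


def sign_block_token(dn_key: bytes, block_id: int) -> str:
    """Hypothetical: MAC over the block ID, using the key of the target DataNode."""
    return hmac.new(dn_key, str(block_id).encode(), hashlib.sha256).hexdigest()


def verify_block_token(dn_key: bytes, block_id: int, mac: str) -> bool:
    """A DataNode checks a token using only its own key."""
    return hmac.compare_digest(sign_block_token(dn_key, block_id), mac)


# The NameNode holds the master secret; each DataNode receives only its own key.
master = b"namenode-only-secret"
key_dn1 = datanode_key(master, "dn1")
key_dn2 = datanode_key(master, "dn2")

token_for_dn1 = sign_block_token(key_dn1, 42)
```

Under these assumptions, a key stolen from dn1 neither validates nor forges tokens for dn2, because dn2 verifies with a different key and the master secret cannot be recovered from a derived key.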

A secure (encrypted) channel for transferring data can be another security bonus.



-----Original Message-----
From: Kingshuk Chatterjee
Sent: Thursday, September 13, 2012 2:23 PM
To: 'Peng Ning'
Cc: yuxian199@gmail.com
Subject: RE: Make Hadoop run more securely in Public Cloud environment

Hi Xianqing -

I am a systems architect and a consultant for the healthcare industry, and the first impression
I get from your email below is that byte-level security can be a very helpful feature
in securing patients' health information (PHI), and in giving healthcare service providers
the assurance to push their data to the public cloud.

I will be happy to contribute in any way; let me know.


Kingshuk Chatterjee
Director, Technology Consulting
5155 Rosecrans Ave, Suite 250               http://www.calance.com
Hawthorne, CA 90250                         +1-(412 606 8582)

-----Original Message-----
From: Xianqing Yu [mailto:yuxian199@gmail.com]
Sent: Thursday, September 13, 2012 11:19 AM
To: common-dev@hadoop.apache.org
Cc: Peng Ning
Subject: Make Hadoop run more securely in Public Cloud environment

Hi Hadoop community,

I am a Ph.D. student at North Carolina State University. I am modifying Hadoop's code (including
most parts of Hadoop, e.g. JobTracker, TaskTracker, NameNode, DataNode) to achieve
better security.

My major goal is to make Hadoop run more securely in the Cloud environment, especially
the public Cloud environment. To achieve that, I redesigned the current security
mechanism and achieved the following:

1. Bring byte-level access control to Hadoop HDFS. As of 0.20.204, HDFS access control
is at user or block granularity: e.g. the HDFS Delegation Token only checks whether a file can
be accessed by a certain user, and the Block Token only proves which block or blocks can be accessed.
I make Hadoop do byte-granularity access control, so that each accessing party, user or task process,
can only access the bytes she or he minimally needs.

2. I assume that in the public Cloud environment, only the NameNode, secondary NameNode, and JobTracker
can be trusted. A large number of DataNodes and TaskTrackers may be compromised because some
of them may be running in less secure environments. So I redesigned the security mechanism
to minimize the damage a hacker can do.

a. Redesign the Block Access Token to solve the widely shared key problem of HDFS. In the original
Block Access Token design, all of HDFS (NameNode and DataNodes) shares one master key to
generate Block Access Tokens. If one DataNode is compromised by a hacker, the hacker can get
the key and generate any Block Access Token he or she wants.

b. Redesign the HDFS Delegation Token to do fine-grained access control for TaskTracker and
MapReduce Task processes on HDFS.

In Hadoop 0.20.204, all TaskTrackers can use their Kerberos credentials to access any
files for MapReduce on HDFS. So they have the same privileges as the JobTracker to read or write
tokens, copy job files, etc. However, if one of them is compromised, every critical thing
in the MapReduce directory (job file, Delegation Token) is exposed to the attacker. I solve the problem
by having the JobTracker decide which TaskTracker can access which file in the MapReduce directory
on HDFS.

As for the Task process, once it gets an HDFS Delegation Token, it can access everything belonging
to this job or user on HDFS. With my design, it can only access the bytes it needs from HDFS.
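
To make the byte-granularity idea concrete, here is a minimal sketch, assuming the token carries a signed byte range that the serving side checks on every read. The field names, the `|`-joined message format, and the HMAC-SHA256 signature are assumptions for illustration; they are not the format used in the prototype or in Hadoop:

```python
import hashlib
import hmac


def _message(user: str, block_id: int, start: int, length: int) -> bytes:
    # Hypothetical canonical encoding of the fields covered by the signature.
    return f"{user}|{block_id}|{start}|{length}".encode()


def issue_token(key: bytes, user: str, block_id: int, start: int, length: int) -> dict:
    """Issuer side: sign exactly the byte range being granted."""
    mac = hmac.new(key, _message(user, block_id, start, length), hashlib.sha256).hexdigest()
    return {"user": user, "block_id": block_id, "start": start, "length": length, "mac": mac}


def may_read(key: bytes, token: dict, block_id: int, offset: int, n: int) -> bool:
    """Serving side: allow a read only if it lies fully inside the signed range."""
    expected = hmac.new(
        key,
        _message(token["user"], token["block_id"], token["start"], token["length"]),
        hashlib.sha256,
    ).hexdigest()
    if not hmac.compare_digest(expected, token["mac"]):
        return False  # token fields were altered or signed with another key
    if block_id != token["block_id"] or n <= 0:
        return False
    return token["start"] <= offset and offset + n <= token["start"] + token["length"]


key = b"example-signing-key"
token = issue_token(key, "alice", block_id=7, start=100, length=50)
```

In this sketch a task holding `token` can read bytes 100 to 149 of block 7 and nothing else; widening the range in the token invalidates the signature.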

There are some other security improvements: for example, the TaskTracker cannot learn information
such as the blockID from the Block Token (because it is encrypted in my design), and HDFS can
optionally set up a secure channel to send data.

With these features, Hadoop can run much more securely in an uncertain environment such as the public
Cloud. I have already started to test my prototype. I would like to know whether the community is
interested in my work. Is it valuable work to contribute to production Hadoop?

I created a JIRA for the discussion.


