incubator-general mailing list archives

From Don Bosco Durai <bdu...@hortonworks.com>
Subject Re: [PROPOSAL] Apache Argus Proposal
Date Thu, 17 Jul 2014 06:34:08 GMT
> How do you define the 'Hadoop complex eco-system'? If that definition
Agreed, complex is a relative term. I used the term because more than 20 products now use
Hadoop and the list is growing. There are 10 products listed on http://hadoop.apache.org/.
Then there are other projects like Accumulo, Impala, Storm, Kafka, Falcon, Pig, Flume, Sqoop,
Oozie, etc. which use HDFS or support/enable other products within the Hadoop ecosystem. If we
dig deeper, each component might have multiple processes (Name Node, Data Node, Job Tracker,
Storm Nimbus Server, HBase Master Servers, HBase Region Servers, HA, etc). With YARN, users
can now run their own applications in the cluster, which is a great feature, but it is very scary
from a security point of view, because users can now write custom applications and run
them within a secure data center.

I don’t feel one technology, one company, one small group or one approach can solve
this problem. It has to be addressed by the community working together. This would also
require a lot of support from each dependent project and a lot of coordination. And there
would be multiple security solutions available for the end users to pick from.

> includes projects such as HBase, we have significant security controls, so
The mature projects have started beefing up their security features. In recent releases, HBase
added cell-based access control and encryption, HDFS added advanced ACLs and is now working on
file-level encryption, and Hive added ATZ-NG but has no encryption yet. The newer ones like Solr,
Storm, and Falcon have very basic security controls. On the good news side, most components have
started supporting Kerberos and SSL. But encryption at rest is still a challenge. In most cases it
is all or none, except probably HBase and Accumulo. Access control and auditing are also not
that mature among the newer projects. The goal here is not to reinvent or impose on each
project, but to reuse the existing security technologies consistently across projects and
at the same time extend them where applicable.

> or the combination of Hive+Sentry would agree with that statement either.
Personally, Hive is my ideal role model for all Hadoop projects to follow. Out of the box,
it has built-in access control, but it also provides APIs to plug in your own authorization
model. Now security projects like Argus can extend it to support attribute-based access control,
cell-based access control, tagging, multi-tenancy, auditing, etc. Users, based on their security
requirements or appetite, might decide to go with the default or choose one of the other security
providers. Similar requirements might exist for HBase, but expecting all Hadoop components
to keep up with each other is counterproductive, while a dedicated security provider (project)
might do a more extensive and uniform job. Users might also pick multiple security providers
within their cluster to address specific security concerns.
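
To make the pluggable model described above concrete, here is a minimal sketch in Python. It is
not Hive's actual API: every class and method name below (AccessRequest, DefaultAuthorizer,
AttributeAuthorizer, QueryEngine) is hypothetical and only illustrates the idea of a component
that ships a default authorizer but accepts an external provider behind the same interface.

```python
from dataclasses import dataclass, field


@dataclass
class AccessRequest:
    """One access attempt; all field names are illustrative assumptions."""
    user: str
    action: str      # e.g. "select"
    resource: str    # e.g. "sales.orders"
    attributes: dict = field(default_factory=dict)


class DefaultAuthorizer:
    """Stands in for a component's built-in model: explicit grants only."""

    def __init__(self, grants):
        self._grants = set(grants)

    def is_allowed(self, request):
        return (request.user, request.action, request.resource) in self._grants


class AttributeAuthorizer:
    """Hypothetical external provider: attribute-based checks plus auditing."""

    def __init__(self, required_attributes):
        self._required = dict(required_attributes)
        self.audit_log = []

    def is_allowed(self, request):
        allowed = all(
            request.attributes.get(key) == value
            for key, value in self._required.items()
        )
        # Record every decision, whether it was allowed or denied.
        self.audit_log.append(
            (request.user, request.action, request.resource, allowed)
        )
        return allowed


class QueryEngine:
    """Stands in for a component that delegates to whichever authorizer is plugged in."""

    def __init__(self, authorizer):
        self._authorizer = authorizer

    def run(self, request):
        if not self._authorizer.is_allowed(request):
            raise PermissionError(
                f"{request.user} may not {request.action} {request.resource}"
            )
        return f"{request.action} on {request.resource} executed for {request.user}"


alice = AccessRequest("alice", "select", "sales.orders", {"dept": "finance"})
bob = AccessRequest("bob", "select", "sales.orders", {"dept": "hr"})

default_engine = QueryEngine(DefaultAuthorizer({("alice", "select", "sales.orders")}))
provider = AttributeAuthorizer({"dept": "finance"})
plugged_engine = QueryEngine(provider)
```

The engine code is the same in both cases; only the authorizer passed to it differs, which is
the property that lets a user stay with the default or pick another provider.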

Since we are on the topic of complexity, one of the reasons Hadoop is popular is
its openness. Hive might sit on top of anything, e.g. HDFS, HBase+HDFS, flat files, etc.
While you can run SQL queries via Hive, you can also write a Pig or MR job to access the
underlying HDFS files directly. This is a powerful feature, which gives users the ability to
run sophisticated analytical jobs or use enterprise-grade BI tools. But it also allows users
to circumvent Hive’s native security. For Hive or any native component, cross-component
security is out of scope (and should be). This problem can be solved by security providers
like Argus, which can enforce adequate security consistently across component or project boundaries.
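
The bypass problem above can be sketched in a few lines of Python. This is only an illustration
under stated assumptions, not a description of how Argus is implemented: CentralPolicyStore, the
dataset name "sales", the table name and the file path are all hypothetical. The idea shown is
that the table and its underlying files map to one logical dataset, so a grant (or its absence)
applies no matter which component the user goes through.

```python
class CentralPolicyStore:
    """Hypothetical shared policy store consulted by every component."""

    def __init__(self):
        self._dataset_of = {}   # (component, resource) -> logical dataset
        self._grants = set()    # (user, dataset) pairs that are allowed

    def register(self, dataset, component, resource):
        # Declare that this component-level resource belongs to the dataset.
        self._dataset_of[(component, resource)] = dataset

    def grant(self, user, dataset):
        self._grants.add((user, dataset))

    def is_allowed(self, user, component, resource):
        dataset = self._dataset_of.get((component, resource))
        if dataset is None:
            # Assumption for this sketch: unregistered resources are denied.
            return False
        return (user, dataset) in self._grants


store = CentralPolicyStore()
# The same data is reachable as a Hive table and as a file in HDFS.
store.register("sales", "hive", "sales_table")
store.register("sales", "hdfs", "/warehouse/sales")
store.grant("alice", "sales")

# Bob has no grant, so both the SQL path and the direct file path are refused.
bob_via_hive = store.is_allowed("bob", "hive", "sales_table")
bob_via_hdfs = store.is_allowed("bob", "hdfs", "/warehouse/sales")
alice_via_hdfs = store.is_allowed("alice", "hdfs", "/warehouse/sales")
```

With per-component policies only, the second check would be answered by a different system than
the first, which is where the two can drift apart.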


Happy to discuss more on this topic.

Thanks

Bosco


On Jul 16, 2014, at 7:38 PM, Andrew Purtell <apurtell@apache.org> wrote:

> This statement might not be quite right:
> 
>> Even within Hadoop complex eco-system, each components have limited or no
> security controls.
> 
> How do you define the 'Hadoop complex eco-system'? If that definition
> includes projects such as HBase, we have significant security controls, so
> that wouldn't be a correct statement. Not sure those working on Accumulo,
> or the combination of Hive+Sentry would agree with that statement either.
> 
> It's not necessary to survey the Hadoop ecosystem before incubating of
> course, or even after, but it sounds like that might be a good idea.
> 
> 
> 
> On Wed, Jul 16, 2014 at 5:06 PM, Don Bosco Durai <bdurai@hortonworks.com>
> wrote:
> 
>> Hi JB
>> 
>> We will be centralizing the administration and auditing for Knox. And we
>> will be also standardizing the authentication for web applications for all
>> components within Hadoop ecosystem, for which we might consider Shiro. I
>> would like to understand more about Syncope and see how production ready it
>> is...
>> 
>> The principle is to leverage existing security solutions where applicable.
>> Even within Hadoop complex eco-system, each components have limited or no
>> security controls. Instead of re-inventing everything, we will extend the
>> core component security capabilities and add where needed. So the security
>> is uniform, plug able and scalable.
>> 
>> Providing a layered security along with central administration and
>> auditing capabilities will enhance the security, usability, enterprise
>> integration, compliance, etc. which will lead to more adoption of Apache
>> Hadoop and projects working within its eco system.
>> 
>> Regards
>> 
>> Bosco
>> 
>>
>> On Jul 16, 2014, at 12:12 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
>> wrote:
>> 
>>> Hi,
>>> 
>>> it looks interesting.
>>> 
>>> Do you have an idea about the interactions with other projects (Knox,
>> Shiro, Syncope, whatever) ?
>>> 
>>> Regards
>>> JB
>>> 
>>> --
>>> Jean-Baptiste Onofré
>>> jbonofre@apache.org
>>> http://blog.nanthrax.net
>>> Talend - http://www.talend.com
>>> 
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>> For additional commands, e-mail: general-help@incubator.apache.org
>>> 
>> 
>> 
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>> 
>> 
> 
> 
> -- 
> Best regards,
> 
>   - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

