hadoop-common-user mailing list archives

From "Jason Rutherglen" <jason.rutherg...@gmail.com>
Subject Re: Question about Hadoop 's Feature(s)
Date Tue, 30 Sep 2008 00:32:32 GMT
I implemented an RMI protocol using Hadoop IPC and added basic
HMAC signing.  I believe it is faster than public-key cryptography
because it uses a shared secret key and does not require the key
provisioning that a PKI would.  Perhaps it would be a baseline way to
sign the data.
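The HMAC approach described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual Hadoop IPC code; the key, payload, and function names are hypothetical, and it assumes both ends already share a secret provisioned out of band:

```python
import hashlib
import hmac

# Hypothetical shared secret, provisioned out of band to both endpoints.
SECRET_KEY = b"shared-secret-provisioned-out-of-band"

def sign(payload: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the serialized request payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()

def verify(payload: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign(payload), tag)

# The sender attaches the tag to the request; the receiver verifies it.
request = b"mkdir /user/alice"
tag = sign(request)
assert verify(request, tag)                      # untampered request accepted
assert not verify(b"mkdir /user/mallory", tag)   # tampered request rejected
```

The speed claim comes from symmetric primitives: computing one HMAC per request is far cheaper than a public-key signature, at the cost of having to distribute the shared secret securely.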

On Thu, Sep 25, 2008 at 7:47 AM, Steve Loughran <stevel@apache.org> wrote:
> Owen O'Malley wrote:
>> On Sep 24, 2008, at 1:50 AM, Trinh Tuan Cuong wrote:
>>> We are developing a project and intend to use Hadoop to handle the
>>> processing of vast amounts of data. But to convince our customers of the
>>> use of Hadoop in our project, we must show them the advantages (and maybe
>>> the disadvantages) of deploying the project with Hadoop compared to the
>>> Oracle Database platform.
>> The primary advantage of Hadoop is scalability. On an equivalent hardware
>> budget, Hadoop can handle much much larger databases. We had a process that
>> was run once a week on Oracle that is now run once an hour on Hadoop.
>> Additionally, Hadoop scales out much much farther. We can store petabytes of
>> data in a single Hadoop cluster and have jobs that read and generate 100's
>> of terabytes.
> That said, what a database gives you -on the right hardware- is very fast
> responses, especially if the indices are set up right and the data
> denormalised when appropriate. There is also really good integration with
> tools and application servers, with things like Java EE designed to make
> running code against a database easy.
> Not using Oracle means you don't have to work with an Oracle DBA, which, in
> my experience, can only be a good thing. DBAs and developers never seem to
> see eye-to-eye.
>>  Hadoop only has very primitive security at the moment, although I expect
>> that to change in the next 6 months.
> Right now you need to trust everyone else on the network where you run
> hadoop to not be malicious; the filesystem and job tracker interfaces are
> insecure. The forthcoming 0.19 release will ask who you are, but the far end
> trusts you to be who you say you are. In that respect, it's as secure as NFS
> over UDP.
> To secure Hadoop you'd probably need to:
>  - sign every IPC request, with a CPU time cost at both ends;
>  - require some form of authentication for the HTTP-exported parts of the
> system, such as digest authentication, or else issue lots of HTTPS private
> keys and use those instead, giving everyone a key management problem as
> well as extra communications overhead.
> What would be easier is to lock down remote access to the filesystem and
> job submission, so that only authenticated users could upload jobs and
> data. The cluster would continue to trust everything else on its network,
> but the system wouldn't trust people to submit work unless they could
> prove who they were.
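The submission-gating idea above could look something like this. A hedged sketch only, not an actual Hadoop interface: the per-user keys, the `submit_job` function, and the payload format are all hypothetical, and it reuses the shared-secret HMAC approach mentioned earlier in the thread:

```python
import hashlib
import hmac

# Hypothetical per-user secrets, issued when each account is provisioned.
USER_KEYS = {"alice": b"alice-secret", "bob": b"bob-secret"}

def submit_job(user: str, job_payload: bytes, tag: bytes) -> bool:
    """Accept a job only if the tag proves the caller holds the user's key."""
    key = USER_KEYS.get(user)
    if key is None:
        return False  # unknown user: reject outright
    expected = hmac.new(key, job_payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

payload = b"wordcount input=/data output=/out"
good_tag = hmac.new(USER_KEYS["alice"], payload, hashlib.sha256).digest()
assert submit_job("alice", payload, good_tag)
assert not submit_job("bob", payload, good_tag)  # tag made with another key
assert not submit_job("eve", payload, good_tag)  # unprovisioned user
```

This only authenticates the front door; as the paragraph notes, everything else on the cluster's network is still trusted.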
