hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Using Hadoop in non-typical large scale user-driven environment
Date Wed, 02 Dec 2009 22:28:14 GMT

How are you planning on dealing with the data integrity/security aspects?
Running code on untrusted machines means that the data going in and out of a
node will need to be vetted.

On 12/2/09 1:50 PM, "Maciej Trebacz" <maciej.trebacz@gmail.com> wrote:

> First of all, I'd like to say hi to all the people on the list.
> I ran across the Hadoop and Cloudera projects recently and was
> immediately intrigued, because I'm in the middle of writing a
> project for a degree at my school that will use large-scale
> distributed computing. Hadoop seems like a perfect tool, but I have
> some questions to make sure it is the right tool for my needs.
> The project I'm building assumes one master node that distributes
> data and several (in theory, hundreds, thousands or more) slave
> nodes. Up to this point, this is exactly what Hadoop is for. But now
> comes the tricky part. I want the slaves to be computers that people
> use every day. Think SETI@Home: a user installs the Hadoop client
> and ideally forgets about it, and their computer helps with the
> computations. Also, users will not want to give up much of their
> hard drive for the computation data.
> The problem with this model, as far as I understand it, is that
> users will often shut down their computers (for whatever reason),
> once a day or even more often. Will that be a big problem for the
> Hadoop server to handle? I am afraid that most of the processing
> power and bandwidth will be spent managing traffic in the network,
> making the cluster ineffective.
> I would appreciate any opinions on this.
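For context on the churn question above: a departing datanode only risks data loss if every replica of a block disappears before the namenode notices the node is gone and re-replicates its blocks, so the replication factor is the first knob to consider in a high-churn deployment. A minimal hdfs-site.xml sketch — the property names match the 0.20-era releases current at the time of this thread, and the values are illustrative assumptions, not tuned recommendations:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default is 3; raising it buys tolerance for more simultaneous
       node departures, at the cost of disk space and write bandwidth
       (a real concern when volunteers donate limited disk). -->
  <property>
    <name>dfs.replication</name>
    <value>5</value>
  </property>
  <!-- Seconds between datanode heartbeats to the namenode; the
       namenode declares a silent node dead only after several missed
       recheck intervals, then schedules re-replication. -->
  <property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
  </property>
</configuration>
```

Note the trade-off this exposes: with frequent shutdowns the namenode spends bandwidth continually re-replicating blocks from departed nodes, which is exactly the overhead the original poster worries about.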
