hadoop-common-user mailing list archives

From Maciej Trebacz <maciej.treb...@gmail.com>
Subject Using Hadoop in non-typical large scale user-driven environment
Date Wed, 02 Dec 2009 21:50:24 GMT
First of all, I'd like to say hi to everyone on the list.

I ran across the Hadoop and Cloudera projects recently and was
immediately intrigued by them, because I'm in the middle of writing a
project for my degree at school that will use large-scale distributed
computing. Hadoop seems like a perfect tool for me, but I have some
questions to make sure it is the right tool for my needs.

The project I'm building assumes that there is one master node that
distributes data and several (in theory hundreds, thousands or more)
slave nodes. So far, this is exactly what Hadoop is for. But now comes
the tricky part: I want the slaves to be computers that people use
every day. Think SETI@Home. A user installs the Hadoop client and,
ideally, forgets about it, and his computer helps with the
computations. Also, users will not want to give up much of their hard
drive space for computation data.

The problem with this model, as far as I understand, is that users
will often shut down their computers (for whatever reason), once a day
or even more often. Will that be a big problem for the Hadoop server to
handle? I am afraid that most of the processing power and bandwidth will
be used for controlling the traffic in the network and it will not be
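
To put a rough number on the compute side of this concern, here is a minimal back-of-envelope sketch in Python. The function name `churn_overhead` and the model behind it are hypothetical assumptions for illustration, not how Hadoop actually schedules or accounts for work: it assumes each shutdown interrupts exactly one running task at a uniformly random point (so on average half of that task's work is lost and must be re-run elsewhere) and that partial work cannot be resumed.

```python
def churn_overhead(shutdowns_per_day: float, task_minutes: float,
                   uptime_hours_per_day: float) -> float:
    """Rough fraction of a volunteer node's compute time wasted by shutdowns.

    Hypothetical model (an assumption, not Hadoop's real behavior):
    - each shutdown interrupts one running task at a uniformly random
      point, so on average half of that task's work is lost;
    - no checkpointing, so the lost work must be redone on another node.
    """
    if uptime_hours_per_day <= 0:
        raise ValueError("uptime_hours_per_day must be positive")
    # Expected minutes of work thrown away per day on this node.
    lost_minutes = shutdowns_per_day * task_minutes / 2.0
    # Normalize by the minutes the machine is actually available.
    return lost_minutes / (uptime_hours_per_day * 60.0)


if __name__ == "__main__":
    # Two shutdowns a day, 10-minute tasks, machine on for 8 hours.
    print(f"{churn_overhead(2, 10, 8):.2%} of compute time wasted")
```

Under these assumptions the wasted fraction grows linearly with task length, which suggests keeping individual tasks short on volunteer machines. The sketch deliberately ignores the time the master needs to notice a node is gone and any network traffic caused by data stored on nodes that disappear (HDFS re-replicates blocks when a DataNode is lost), so it covers only part of the question.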

I would appreciate any opinions on this.

Best regards,
Maciej "mav" Trębacz from Poland.
