hadoop-common-user mailing list archives

From "Habermaas, William" <William.Haberm...@fatwire.com>
Subject RE: Using Hadoop in non-typical large scale user-driven environment
Date Wed, 02 Dec 2009 22:08:52 GMT
Hadoop isn't going to like losing its datanodes when people shut down their computers.
More importantly, when the datanodes are running, your users will be impacted by data replication.
Unlike SETI@home, Hadoop doesn't know when the user's screensaver is running, so it will start doing
things when it feels like it.
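
To put a rough number on the replication concern, here is a back-of-envelope sketch in Python. It is not a model of HDFS internals. It assumes that a datanode offline for longer than the dead-node timeout (about 10.5 minutes with stock settings, as far as I know) has all of its blocks copied again to other nodes, and it ignores the cleanup of surplus replicas when the machine comes back. The function name and the example figures are hypothetical.

```python
def daily_rereplication_gb(nodes, gb_per_node, shutdowns_per_day,
                           offline_minutes, dead_timeout_minutes=10.5):
    """Rough estimate of GB copied per day to restore lost replicas.

    Assumption: each shutdown that outlasts the dead-node timeout causes
    everything stored on that node to be re-replicated elsewhere.
    """
    if offline_minutes <= dead_timeout_minutes:
        # The node returns before it is declared dead: no re-replication.
        return 0.0
    return float(nodes * gb_per_node * shutdowns_per_day)


# Hypothetical cluster: 1000 desktops, 10 GB each, powered off overnight.
overnight = daily_rereplication_gb(1000, 10, 1, offline_minutes=480)
# Same cluster, but machines only reboot briefly.
reboot = daily_rereplication_gb(1000, 10, 1, offline_minutes=5)
print(overnight, reboot)
```

Under these assumptions, a nightly shutdown means the cluster re-copies roughly its entire stored data set every day, which is the traffic your users would notice.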

Can someone else comment on whether HOD (Hadoop On Demand) would fit this scenario?
Bill   

-----Original Message-----
From: Maciej Trebacz [mailto:maciej.trebacz@gmail.com] 
Sent: Wednesday, December 02, 2009 4:50 PM
To: common-user@hadoop.apache.org
Subject: Using Hadoop in non-typical large scale user-driven environment

First of all, I'd like to say hi to all people on the list.

I ran across the Hadoop and Cloudera projects recently, and I was
immediately intrigued by them, because I'm in the middle of writing a
project that will use large scale distributed computing for a degree
at my school. Hadoop seems like a perfect tool for me to use, but I have
some questions to make sure it is the right tool for my needs.

The project I'm making assumes that there is one master node which
distributes data and several (in theory, hundreds, thousands or
more) slave nodes. Up to this point, this is exactly what Hadoop is
for. But here is the tricky part. I want the slaves to be computers
that people use every day. Think SETI@Home. So the user installs a
Hadoop client and, ideally, forgets about it, while his computer
helps to do the computations. Also, the user will not want to give up
much of his hard drive for the computation data.

The problem with this model, as far as I understand, is that users
will often shut down their computers (for whatever reason), once a day
or even more. Will that be a big problem for the Hadoop server to handle?
I mean, I am afraid that most of the processing power and bandwidth will
be used for controlling the traffic in the network, and the system will
not be effective.

I will appreciate any opinions on this.

-- 
Best regards,
Maciej "mav" Trębacz from Poland.