hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Separate data-nodes from worker-nodes
Date Fri, 14 Mar 2008 19:11:37 GMT
Andrey Pankov wrote:
> It's a little bit expensive to keep a big cluster running for a long
> period, especially if you use EC2. So, as a possible solution, we can
> start additional nodes and include them in the cluster before running a job,
> and then, after it finishes, kill the unused nodes.

As Ted has indicated, that should work.  It won't be as fast as if you 
keep the entire cluster running the whole time, but it will be much cheaper.

An alternative is to store your persistent data in S3.  Then you can 
shut down your cluster altogether when you're not computing.  Your 
startup time each day will be slower, since reading from S3 is slower 
than reading from HDFS, so this may or may not be practical for you.
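
To judge whether the shut-down-and-reload approach pays off, a rough cost comparison can help. The sketch below is only illustrative: the function names, the node count, the hours, and the per-node hourly rate are all hypothetical, and it ignores S3 storage and transfer charges entirely.

```python
# Rough sketch: compare the daily cost of an always-on cluster with one
# that is started each day, loads its data from S3, computes, and shuts down.
# All figures are hypothetical; S3 storage/transfer charges are NOT modeled.

def cluster_cost_cents(nodes, hours, cents_per_node_hour):
    """Cost in cents of running `nodes` machines for `hours` hours."""
    return nodes * hours * cents_per_node_hour


def compare_costs(nodes, compute_hours, load_hours, cents_per_node_hour):
    """Return (always_on, on_demand) daily costs in cents.

    always_on: the cluster runs 24 hours a day.
    on_demand: the cluster runs only while loading data and computing.
    """
    always_on = cluster_cost_cents(nodes, 24, cents_per_node_hour)
    on_demand = cluster_cost_cents(
        nodes, compute_hours + load_hours, cents_per_node_hour
    )
    return always_on, on_demand


if __name__ == "__main__":
    # Hypothetical example: 10 nodes, 4 hours of computing,
    # 1 hour spent loading from S3, 10 cents per node-hour.
    always_on, on_demand = compare_costs(10, 4, 1, 10)
    print("always on: %d cents/day" % always_on)
    print("on demand: %d cents/day" % on_demand)
```

The longer the daily load from S3 takes relative to the computing time, the smaller the saving, which is why this may or may not be practical for a given workload.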

