hadoop-hdfs-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Running scale tests
Date Wed, 31 Aug 2011 15:31:25 GMT

That really depends on how accurate you want your tests to be vs. how long you want to spend
running them.  Often automated OS benchmarks will reinstall the OS before running tests to
try to ensure it is in a known state.  File system benchmarks usually will unmount the file
system and then remount it to ensure that caches are empty.  If you want to be very accurate,
then reformat HDFS and reconfigure everything.  However, even in ideal situations it is difficult
to get consistent performance numbers out of a multinode cluster.  I would suggest you bring
up the cluster with the maximum number of nodes you want to test with, then shut down some
of the data nodes and task trackers on the machines you don't want (just as if the box had died).
Then wait for the data on them to finish being re-replicated.  The results should be fairly close
to what you would expect from a cluster of that size.
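
The "wait for replication" step can be scripted. Below is a minimal Python sketch, not a tested procedure. It assumes that `hadoop dfsadmin -report` prints a line of the form "Under replicated blocks: N"; check the output of your Hadoop version before relying on it. The function names, the polling interval, and the poll limit are all hypothetical choices made for illustration.

```python
import re
import subprocess
import time

# Assumption: the report contains a line such as "Under replicated blocks: 12".
_UNDER_REPLICATED = re.compile(r"Under replicated blocks:\s*(\d+)")


def under_replicated_blocks(report_text):
    """Return the under-replicated block count found in the report, or None."""
    match = _UNDER_REPLICATED.search(report_text)
    return int(match.group(1)) if match else None


def fetch_report():
    """Run `hadoop dfsadmin -report` and return its standard output as text."""
    result = subprocess.run(
        ["hadoop", "dfsadmin", "-report"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    )
    return result.stdout


def wait_for_replication(get_report=fetch_report, poll_seconds=30, max_polls=120):
    """Poll until the report shows zero under-replicated blocks.

    Returns True once the count reaches zero, False if max_polls is exhausted.
    """
    for attempt in range(max_polls):
        if under_replicated_blocks(get_report()) == 0:
            return True
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    return False
```

Calling `wait_for_replication()` before starting a test run blocks until the count reaches zero or the poll limit is hit. Note that the name node takes some time to declare a stopped data node dead, so a zero count immediately after shutting nodes down may be premature; waiting a while before the first poll is the safer choice.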

You can also do it programmatically.  You can blacklist machines using the admin interface
on the task tracker and name node, but I have not done it before.
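
One way to approach this is through exclude files, which are also the closest Hadoop equivalent to an MPI "machines file". The sketch below is a hedged illustration, not a verified recipe. It assumes the name node is configured with `dfs.hosts.exclude` and the JobTracker with `mapred.hosts.exclude`, both pointing at a file of hostnames, and that `hadoop dfsadmin -refreshNodes` (and `hadoop mradmin -refreshNodes`, where your version provides it) is run after the file changes. The helper names and any hostnames you pass in are hypothetical.

```python
def split_hosts(all_hosts, keep):
    """Split hosts into (active, excluded), keeping the first `keep` hosts active."""
    if not 0 <= keep <= len(all_hosts):
        raise ValueError("keep must be between 0 and the number of hosts")
    return list(all_hosts[:keep]), list(all_hosts[keep:])


def write_exclude_file(path, excluded_hosts):
    """Write one excluded hostname per line to the exclude file at `path`."""
    with open(path, "w") as handle:
        for host in excluded_hosts:
            handle.write(host + "\n")
```

For a test on 2 of 4 nodes, `split_hosts(hosts, 2)` gives the hosts to exclude, and `write_exclude_file` writes them to the path your exclude properties point at. Excluding a data node this way decommissions it, so its blocks are copied elsewhere before it is retired, which is gentler than killing the process.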

--Bobby Evans

On 8/29/11 12:07 AM, "Jeremy Villalobos" <jeremyvillalobos@gmail.com> wrote:


The following questions are from a system administrator's point of view.

How do I run scale tests using different numbers of nodes?  Do I have to shut down and restart
Hadoop to do this?
What about DFS, do I have to reformat when reducing the number of nodes?

Is there a "machines file", as in MPI, where I can specify just the number of nodes to
be used for a test?

