hadoop-hdfs-user mailing list archives

From Bart Vandewoestyne <Bart.Vandewoest...@telenet.be>
Subject TestDFSIO and hadoop config options
Date Tue, 07 Oct 2014 07:27:14 GMT
Hello list,

I would like to run TestDFSIO benchmarks under different configuration 
settings.  One thing I would like to see, for example, is how the block 
replication factor (dfs.replication) influences the TestDFSIO results.

I'm using the following version of Hadoop and CDH:

bart@sandy-quad-1:~$ hadoop version
Hadoop 2.3.0-cdh5.1.2
Subversion git://github.sf.cloudera.com/CDH/cdh.git -r 8e266e052e423af592871e2dfe09d54c03f6a0e8
Compiled by jenkins on 2014-08-26T01:36Z
Compiled with protoc 2.5.0
From source with checksum ec11b8ec19ca2bf3e7cb1bbe4ee182
This command was run using /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop/hadoop-common-2.3.0-cdh5.1.2.jar

My main problem is how I can easily change the replication factor for 
each run of TestDFSIO.  I see two options:

1) Change the dfs.replication configuration value in my Cloudera 
Manager, restart my cluster, and re-run TestDFSIO.

2) Somehow pass a different dfs.replication value on the TestDFSIO 
command line.  On 
http://grokbase.com/t/cloudera/cdh-user/131zfsvves/testdfsio-slow-with-replication-1 
I see that people run the TestDFSIO benchmark with the '-D 
dfs.replication=1' option.  Is this the better way to go?

Method 1 seems cumbersome.  Method 2 does not give any errors on my 
cluster, but how can I check that TestDFSIO was indeed run with the 
replication factor I specified with the -D option?
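
To make the question concrete, here is a rough sketch of one possible check, 
not a confirmed answer.  It assumes that the files written by TestDFSIO end 
up under /benchmarks/TestDFSIO/io_data and that the tests jar lives at the 
path below; both are assumptions and may differ per installation.  The 
helper name replication_from_ls is made up for illustration.  The idea is 
that the second column of an 'hdfs dfs -ls' listing shows the replication 
factor of each file.

```shell
#!/bin/sh
# Sketch only: JAR and DATA_DIR are assumed locations, adjust them to your cluster.
JAR=${JAR:-/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar}
DATA_DIR=${DATA_DIR:-/benchmarks/TestDFSIO/io_data}

# Hypothetical helper: reads `hdfs dfs -ls` output on stdin and prints
# "<replication> <path>" for every regular file (permission string starts with "-").
# Directories show "-" in the replication column and are skipped.
replication_from_ls() {
    awk '$1 ~ /^-/ { print $2, $NF }'
}

# Only touch the cluster when the Hadoop command-line tools are available.
if command -v hadoop >/dev/null 2>&1 && command -v hdfs >/dev/null 2>&1; then
    # Generic options such as -D go before the benchmark's own options.
    hadoop jar "$JAR" TestDFSIO -D dfs.replication=1 -write -nrFiles 4 -fileSize 128
    # List the files that were written and show their replication factor.
    hdfs dfs -ls "$DATA_DIR" | replication_from_ls
fi
```

If every listed file shows 1 in the first column after a run with 
'-D dfs.replication=1', the option presumably took effect.  For a single 
file, 'hdfs dfs -stat %r <path>' prints just the replication factor.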

Kind regards,
Bart
