hadoop-mapreduce-user mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: FW: NNbench and MRBench
Date Sun, 08 May 2011 06:14:25 GMT
On 5/8/2011 12:46 AM, stanley.shi@emc.com wrote:
> Thanks Marcos.
> This post by Michael Noll does provide some information about how to run these benchmarks,
> but there's not much information about how to evaluate the results.
> Do you know of any resources on analyzing the results?
>
> Thanks very much :)
>
> Regards,
> Stanley
>
> -----Original Message-----
> From: Marcos Ortiz [mailto:mlortiz@uci.cu]
> Sent: May 8, 2011 11:09
> To: mapreduce-user@hadoop.apache.org
> Cc: Shi, Stanley
> Subject: Re: FW: NNbench and MRBench
>
> On 5/7/2011 10:33 PM, stanley.shi@emc.com wrote:
>> Thanks, Marcos,
>>
>> Through these links, I still can't find anything about NNbench and MRBench.
>>
>> -----Original Message-----
>> From: Marcos Ortiz [mailto:mlortiz@uci.cu]
>> Sent: May 8, 2011 10:23
>> To: mapreduce-user@hadoop.apache.org
>> Cc: Shi, Stanley
>> Subject: Re: FW: NNbench and MRBench
>>
>> On 5/7/2011 8:53 PM, stanley.shi@emc.com wrote:
>>
>>> Hi guys,
>>>
>>> I have a cluster of 16 machines running Hadoop. Now I want to run some
>>> benchmarks on this cluster with "nnbench" and "mrbench".
>>> I'm new to Hadoop and have no one to ask. I don't know what results I
>>> should expect.
>>> Now for mrbench, I get an average time of 22 seconds for a one-map job.
>>> Is this bad? What should the expected results be?
>>>
>>> For nnbench, what are the expected results? Below is my result.
>>> ================
>>>                               Date & time: 2011-05-05 20:40:25,459
>>>
>>>                            Test Operation: rename
>>>                                Start time: 2011-05-05 20:40:03,820
>>>                               Maps to run: 1
>>>                            Reduces to run: 1
>>>                        Block Size (bytes): 1
>>>                            Bytes to write: 0
>>>                        Bytes per checksum: 1
>>>                           Number of files: 10000
>>>                        Replication factor: 1
>>>                Successful file operations: 10000
>>>
>>>            # maps that missed the barrier: 0
>>>                              # exceptions: 0
>>>
>>>                               TPS: Rename: 1763
>>>                Avg Exec time (ms): Rename: 0.5672
>>>                      Avg Lat (ms): Rename: 0.4844
>>> null
>>>
>>>                     RAW DATA: AL Total #1: 4844
>>>                     RAW DATA: AL Total #2: 0
>>>                  RAW DATA: TPS Total (ms): 5672
>>>           RAW DATA: Longest Map Time (ms): 5672.0
>>>                       RAW DATA: Late maps: 0
>>>                 RAW DATA: # of exceptions: 0
>>> =============================
>>> One more question: when I set the number of maps higher, I get all-zero results:
>>> =============================
>>>                            Test Operation: create_write
>>>                                Start time: 2011-05-03 23:22:39,239
>>>                               Maps to run: 160
>>>                            Reduces to run: 160
>>>                        Block Size (bytes): 1
>>>                            Bytes to write: 0
>>>                        Bytes per checksum: 1
>>>                           Number of files: 1
>>>                        Replication factor: 1
>>>                Successful file operations: 0
>>>
>>>            # maps that missed the barrier: 0
>>>                              # exceptions: 0
>>>
>>>                   TPS: Create/Write/Close: 0
>>>    Avg exec time (ms): Create/Write/Close: 0.0
>>>                Avg Lat (ms): Create/Write: NaN
>>>                       Avg Lat (ms): Close: NaN
>>>
>>>                     RAW DATA: AL Total #1: 0
>>>                     RAW DATA: AL Total #2: 0
>>>                  RAW DATA: TPS Total (ms): 0
>>>           RAW DATA: Longest Map Time (ms): 0.0
>>>                       RAW DATA: Late maps: 0
>>>                 RAW DATA: # of exceptions: 0
>>> =====================
>>>
>>> Can anyone point me to some documents?
>>> I really appreciate your help :)
>>>
>>> Thanks,
>>> stanley
>>>
>> You can use these resources:
>> http://www.michael-noll.com/blog/2011/04/09/benchmarking-and-stress-testing-an-hadoop-cluster-with-terasort-testdfsio-nnbench-mrbench/
>> http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/
>> http://wiki.apache.org/hadoop/HardwareBenchmarks
>> http://www.quora.com/Apache-Hadoop/Are-there-any-good-Hadoop-benchmark-problems
>>
>> Regards
>>
>>
> Well, Michael Noll's post says this:
>
> NameNode benchmark (nnbench)
> =======================
> NNBench (see src/test/org/apache/hadoop/hdfs/NNBench.java) is useful for
> load testing the NameNode hardware and configuration. It generates a lot
> of HDFS-related requests with normally very small "payloads" for the
> sole purpose of putting a high HDFS management stress on the NameNode.
> The benchmark can simulate requests for creating, reading, renaming and
> deleting files on HDFS.
>
> I like to run this test simultaneously from several machines -- e.g.
> from a set of DataNode boxes -- in order to hit the NameNode from
> multiple locations at the same time.
>
> The syntax of NNBench is as follows:
>
> NameNode Benchmark 0.4
> Usage: nnbench <options>
> Options:
>           -operation <Available operations are create_write open_read
> rename delete. This option is mandatory>
>            * NOTE: The open_read, rename and delete operations assume
> that the files they operate on are already available. The create_write
> operation must be run before running the other operations.
>           -maps <number of maps. default is 1. This is not mandatory>
>           -reduces <number of reduces. default is 1. This is not mandatory>
>           -startTime <time to start, given in seconds from the epoch.
> Make sure this is far enough into the future, so all maps (operations)
> will start at the same time. default is launch time + 2 mins. This is
> not mandatory>
>           -blockSize <Block size in bytes. default is 1. This is not
> mandatory>
>           -bytesToWrite <Bytes to write. default is 0. This is not mandatory>
>           -bytesPerChecksum <Bytes per checksum for the files. default is
> 1. This is not mandatory>
>           -numberOfFiles <number of files to create. default is 1. This
> is not mandatory>
>           -replicationFactorPerFile <Replication factor for the files.
> default is 1. This is not mandatory>
>           -baseDir <base DFS path. default is /becnhmarks/NNBench. This
> is not mandatory>
>           -readFileAfterOpen <true or false. if true, it reads the file
> and reports the average time to read. This is valid with the open_read
> operation. default is false. This is not mandatory>
>           -help: Display the help statement
>
> The following command will run a NameNode benchmark that creates 1000
> files using 12 maps and 6 reducers. It uses a custom output directory
> based on the machine's short hostname. This is a simple trick to ensure
> that one box does not accidentally write into the same output directory
> of another box running NNBench at the same time.
>
> $ hadoop jar hadoop-*-test.jar nnbench -operation create_write \
>       -maps 12 -reduces 6 -blockSize 1 -bytesToWrite 0 -numberOfFiles 1000 \
>       -replicationFactorPerFile 3 -readFileAfterOpen true \
>       -baseDir /benchmarks/NNBench-`hostname -s`
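>
> As the NOTE in the usage above says, create_write must run before the
> other operations. For example, to benchmark reads against files you just
> created, you can run the two operations back to back with the same
> -maps, -numberOfFiles and -baseDir (a sketch, reusing the parameters above):
>
> $ hadoop jar hadoop-*-test.jar nnbench -operation create_write \
>       -maps 12 -reduces 6 -numberOfFiles 1000 \
>       -baseDir /benchmarks/NNBench-`hostname -s`
> $ hadoop jar hadoop-*-test.jar nnbench -operation open_read \
>       -maps 12 -reduces 6 -numberOfFiles 1000 -readFileAfterOpen true \
>       -baseDir /benchmarks/NNBench-`hostname -s`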
>
> Note that by default the benchmark waits 2 minutes before it actually
> starts!
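>
> If you want all boxes to start together when running from several
> machines, you can pass an explicit -startTime instead of relying on the
> 2-minute default. A minimal sketch, assuming passwordless ssh to three
> hypothetical DataNodes dn1..dn3 and a date command that supports %s:
>
> # a common start time, 3 minutes from now, in seconds since the epoch
> $ START=$(( $(date +%s) + 180 ))
> # launch the same benchmark on every box, each with its own baseDir;
> # \$(hostname -s) is escaped so it expands on the remote host
> $ for h in dn1 dn2 dn3; do
>       ssh $h "hadoop jar hadoop-*-test.jar nnbench -operation create_write \
>           -maps 12 -reduces 6 -numberOfFiles 1000 -startTime $START \
>           -baseDir /benchmarks/NNBench-\$(hostname -s)" &
>   done; wait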
>
> MapReduce benchmark (mrbench)
> =======================
>
> MRBench (see src/test/org/apache/hadoop/mapred/MRBench.java) loops a
> small job a number of times. As such it is a very complementary
> benchmark to the "large-scale" TeraSort benchmark suite because MRBench
> checks whether small job runs are responsive and running efficiently on
> your cluster. It puts its focus on the MapReduce layer as its impact on
> the HDFS layer is very limited.
>
> This test should be run from a single box (see caveat below). The
> command syntax can be displayed via mrbench --help:
>
> MRBenchmark.0.0.2
> Usage: mrbench [-baseDir <base DFS path>]
>             [-jar <local path to job jar file>]
>             [-numRuns <number of times to run the job>]
>             [-maps <number of maps for each run>]
>             [-reduces <number of reduces for each run>]
>             [-inputLines <number of input lines to generate>]
>             [-inputType <type of input to generate: one of ascending,
> descending, random>]
>             [-verbose]
>
>       Important note: In Hadoop 0.20.2, setting the -baseDir parameter
> has no effect. This means that multiple parallel MRBench runs (e.g.
> started from different boxes) might interfere with each other. This is a
> known bug (MAPREDUCE-2398). I have submitted a patch but it has not been
> integrated yet.
>
> In Hadoop 0.20.2, the parameters default to:
>
> -baseDir: /benchmarks/MRBench  [*** see my note above ***]
> -numRuns: 1
> -maps: 2
> -reduces: 1
> -inputLines: 1
> -inputType: ascending
>
> The command to run a loop of 50 small test jobs is:
>
> $ hadoop jar hadoop-*-test.jar mrbench -numRuns 50
>
> Example output of the above command:
>
> DataLines       Maps    Reduces AvgTime (milliseconds)
> 1               2       1       31414
>
> This means that the average finish time of executed jobs was 31 seconds.
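>
> If you want to stress the MapReduce layer a bit harder than the
> defaults do, the flags shown in the usage can be combined, for example
> (a sketch; tune the numbers to your cluster size):
>
> $ hadoop jar hadoop-*-test.jar mrbench -numRuns 50 -maps 4 -reduces 2 \
>       -inputLines 100 -inputType random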
>
> Can you check these?
> http://www.slideshare.net/ydn/ahis2011-platform-hadoop-simulation-and-performance
> http://issues.apache.org/jira/browse/HADOOP-5867
>
> Did you search the current API documentation?
>
> Regards
>
Ok, I understand.
Let me try to help you, even though I'm a newbie in the Hadoop ecosystem
myself. Tom White, in his answer to this topic on the O'Reilly Answers
site, gives an introduction to this:

The following command writes 10 files of 1,000 MB each:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write -nrFiles 10 \
      -fileSize 1000

At the end of the run, the results are written to the console and also 
recorded in a local file (which is appended to, so you can rerun the 
benchmark and not lose old results):

% cat TestDFSIO_results.log

----- TestDFSIO ----- : write
            Date & time: Sun Apr 12 07:14:09 EDT 2009
        Number of files: 10
Total MBytes processed: 10000
      Throughput mb/sec: 7.796340865378244
Average IO rate mb/sec: 7.8862199783325195
  IO rate std deviation: 0.9101254683525547
     Test exec time sec: 163.387

The files are written under the /benchmarks/TestDFSIO directory by
default (this can be changed by setting the test.build.data system
property), in a directory called io_data.
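
For example, to point a run at a different base directory, you could set
that system property on the client JVM through HADOOP_OPTS (a sketch; I'm
assuming here that your Hadoop version reads test.build.data in the
client JVM, as the 0.20.x TestDFSIO source does):

% HADOOP_OPTS="-Dtest.build.data=/benchmarks/MyTestDFSIO" \
      hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -write \
      -nrFiles 10 -fileSize 1000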

To run a read benchmark, use the -read argument. Note that these files
must already exist (having been written by TestDFSIO -write):

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -read -nrFiles 10 \
      -fileSize 1000

Here are the results for a real run:

----- TestDFSIO ----- : read
            Date & time: Sun Apr 12 07:24:28 EDT 2009
        Number of files: 10
Total MBytes processed: 10000
      Throughput mb/sec: 80.25553361904304
Average IO rate mb/sec: 98.6801528930664
  IO rate std deviation: 36.63507598174921
     Test exec time sec: 47.624

When you’ve finished benchmarking, you can delete all the generated
files from HDFS using the -clean argument:

% hadoop jar $HADOOP_INSTALL/hadoop-*-test.jar TestDFSIO -clean


You can see that all results are written to the TestDFSIO_results.log file.
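
Since the log is appended to, a quick way to compare runs over time is
to pull out just the interesting lines, for example:

% egrep "Date|Throughput|exec time" TestDFSIO_results.log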

So, you can begin to experiment with this.
You can continue reading in Chapter 9 of Hadoop: The Definitive Guide,
2nd Edition, in the section "Benchmarking a Hadoop Cluster".

In it, Tom gives several pieces of advice for benchmarking a Hadoop cluster:

- Use a cluster that is not being used by others
- One of the primary tests you should run is an intensive I/O benchmark,
to prove the cluster before it goes into production (see the TeraSort
sketch below)
- Write benchmarks with Gridmix (check
http://developer.yahoo.net/blogs/hadoop/2010/04/gridmix3_emulating_production.html)
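
For the intensive I/O benchmark mentioned above, the TeraSort suite is a
common choice. A minimal sketch (the sizes are only an example; teragen
takes the number of 100-byte rows to generate, so 10000000 rows is
roughly 1 GB):

% hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar teragen \
      10000000 /benchmarks/tera-in
% hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar terasort \
      /benchmarks/tera-in /benchmarks/tera-out
% hadoop jar $HADOOP_INSTALL/hadoop-*-examples.jar teravalidate \
      /benchmarks/tera-out /benchmarks/tera-report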


Well, I hope this information helps you. Remember, I've worked with
Hadoop for only one year, so you can ask other colleagues for advice too.

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz

