hadoop-hdfs-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From a..@apache.org
Subject svn commit: r1440245 [2/2] - in /hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src: main/docs/src/documentation/content/xdocs/ site/apt/
Date Wed, 30 Jan 2013 01:52:15 GMT
Added: hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm?rev=1440245&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm (added)
+++ hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/SLGUserGuide.apt.vm Wed
Jan 30 01:52:14 2013
@@ -0,0 +1,195 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Synthetic Load Generator Guide
+  ---
+  ---
+  ${maven.build.timestamp}
+
+Synthetic Load Generator Guide
+
+%{toc|section=1|fromDepth=0}
+
+* Overview
+
+   The synthetic load generator (SLG) is a tool for testing NameNode
+   behavior under different client loads. The user can generate different
+   mixes of read, write, and list requests by specifying the probabilities
+   of read and write. The user controls the intensity of the load by
+   adjusting parameters for the number of worker threads and the delay
+   between operations. While load generators are running, the user can
+   profile and monitor the running of the NameNode. When a load generator
+   exits, it prints some NameNode statistics like the average execution
+   time of each kind of operation and the NameNode throughput.
+
+* Synopsis
+
+   The synopsis of the command is:
+
+----
+    java LoadGenerator [options]
+----
+
+   Options include:
+
+     * <<<-readProbability>>> <read probability>
+
+       The probability of the read operation; default is 0.3333.
+
+     * <<<-writeProbability>>> <write probability>
+
+       The probability of the write operations; default is 0.3333.
+
+     * <<<-root>>> <test space root>
+
+       The root of the test space; default is /testLoadSpace.
+
+     * <<<-maxDelayBetweenOps>>> <maxDelayBetweenOpsInMillis>
+
+       The maximum delay between two consecutive operations in a thread;
+       default is 0 indicating no delay.
+
+     * <<<-numOfThreads>>> <numOfThreads>
+
+       The number of threads to spawn; default is 200.
+
+     * <<<-elapsedTime>>> <elapsedTimeInSecs>
+
+       The number of seconds that the program will run; A value of zero
+       indicates that the program runs forever. The default value is 0.
+
+     * <<<-startTime>>> <startTimeInMillis>
+
+       The time that all worker threads start to run. By default it is 10
+       seconds after the main program starts running.This creates a
+       barrier if more than one load generator is running.
+
+     * <<<-seed>>> <seed>
+
+       The random generator seed for repeating requests to NameNode when
+       running with a single thread; default is the current time.
+
+   After command line argument parsing, the load generator traverses the
+   test space and builds a table of all directories and another table of
+   all files in the test space. It then waits until the start time to
+   spawn the number of worker threads as specified by the user. Each
+   thread sends a stream of requests to NameNode. At each iteration, it
+   first decides if it is going to read a file, create a file, or list a
+   directory following the read and write probabilities specified by the
+   user. The listing probability is equal to 1-read probability-write
+   probability. When reading, it randomly picks a file in the test space
+   and reads the entire file. When writing, it randomly picks a directory
+   in the test space and creates a file there.
+
+   To avoid two threads with the same load generator or from two different
+   load generators creating the same file, the file name consists of the
+   current machine's host name and the thread id. The length of the file
+   follows Gaussian distribution with an average size of 2 blocks and the
+   standard deviation of 1. The new file is filled with byte 'a'. To avoid
+   the test space growing indefinitely, the file is deleted immediately
+   after the file creation completes. While listing, it randomly picks a
+   directory in the test space and lists its content.
+
+   After an operation completes, the thread pauses for a random amount of
+   time in the range of [0, maxDelayBetweenOps] if the specified maximum
+   delay is not zero. All threads are stopped when the specified elapsed
+   time is passed. Before exiting, the program prints the average
+   execution for each kind of NameNode operations, and the number of
+   requests served by the NameNode per second.
+
+* Test Space Population
+
+   The user needs to populate a test space before running a load
+   generator. The structure generator generates a random test space
+   structure and the data generator creates the files and directories of
+   the test space in Hadoop distributed file system.
+
+** Structure Generator
+
+   This tool generates a random namespace structure with the following
+   constraints:
+
+    [[1]] The number of subdirectories that a directory can have is a random
+       number in [minWidth, maxWidth].
+
+    [[2]] The maximum depth of each subdirectory is a random number
+       [2*maxDepth/3, maxDepth].
+
+    [[3]] Files are randomly placed in leaf directories. The size of each
+       file follows Gaussian distribution with an average size of 1 block
+       and a standard deviation of 1.
+
+   The generated namespace structure is described by two files in the
+   output directory. Each line of the first file contains the full name of
+   a leaf directory. Each line of the second file contains the full name
+   of a file and its size, separated by a blank.
+
+   The synopsis of the command is:
+
+----
+    java StructureGenerator [options]
+----
+
+   Options include:
+
+     * <<<-maxDepth>>> <maxDepth>
+
+       Maximum depth of the directory tree; default is 5.
+
+     * <<<-minWidth>>> <minWidth>
+
+       Minimum number of subdirectories per directories; default is 1.
+
+     * <<<-maxWidth>>> <maxWidth>
+
+       Maximum number of subdirectories per directories; default is 5.
+
+     * <<<-numOfFiles>>> <#OfFiles>
+
+       The total number of files in the test space; default is 10.
+
+     * <<<-avgFileSize>>> <avgFileSizeInBlocks>
+
+       Average size of blocks; default is 1.
+
+     * <<<-outDir>>> <outDir>
+
+       Output directory; default is the current directory.
+
+     * <<<-seed>>> <seed>
+
+       Random number generator seed; default is the current time.
+
+** Data Generator
+
+   This tool reads the directory structure and file structure from the
+   input directory and creates the namespace in Hadoop distributed file
+   system. All files are filled with byte 'a'.
+
+   The synopsis of the command is:
+
+----
+    java DataGenerator [options]
+----
+
+   Options include:
+
+     * <<<-inDir>>> <inDir>
+
+       Input directory name where directory/file structures are stored;
+       default is the current directory.
+
+     * <<<-root>>> <test space root>
+
+       The name of the root directory which the new namespace is going to
+       be placed under; default is "/testLoadSpace".



Mime
View raw message