hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4142) Synthetic Load Generator for NameNode testing -- Next Generation
Date Wed, 10 Sep 2008 01:21:44 GMT
Synthetic Load Generator for NameNode testing -- Next Generation
----------------------------------------------------------------

                 Key: HADOOP-4142
                 URL: https://issues.apache.org/jira/browse/HADOOP-4142
             Project: Hadoop Core
          Issue Type: Test
          Components: dfs
            Reporter: Robert Chansler


A review of the Synthetic Load Generator identified several candidates for improvement. None
are so urgent as to stand in the way of the present facility, but all are of sufficient interest
to merit inclusion in this note.

* The SLG should model the proper number of connections to the NameNode. This might be accomplished
by using different user identities for each connection.

* There may be more appropriate statistical distributions than the ones that were chosen. Should
requests have Poisson statistics? Is a log-normal distribution more appropriate for file sizes,
with a special case for zero-length files?

* "Intensity" might be a more convenient expression of rate parameters (events/second).

* Does a static initial name space topology bias results?

* Does the pairing of create with delete bias results?

* Should there be a correlation between listing a directory and reading its files?

* To the extent possible, tests should be repeatable. This can't be accomplished in an absolute
sense, but at least the behavior of the SLG should be as reproducible as possible. A reasonable
default for simulation applications is that each sequence of events be generated independently,
and that the seeds for each generator come from another reproducible sequence.

* Many of the questions of event distribution could be evaded if the SLG could replay a sequence
of events from a live log. Perhaps a log should be published that represents one standard
Hadoop of load.
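
To make the distribution and repeatability points above concrete, here is a minimal sketch of how
they could fit together. It is not taken from the SLG: the class name SlgSketch, its method names,
and all parameter values are hypothetical, and it assumes the questions above are answered with
Poisson arrivals (exponential gaps set by an intensity in events/second) and log-normal file sizes
with a separate probability of a zero-length file. Each event stream gets its own generator, and
the seeds for those generators come from one reproducible master sequence.

```java
import java.util.Random;

// Hypothetical sketch; none of these names exist in the Synthetic Load Generator.
public class SlgSketch {

    /** Derives one seed per event stream from a single master seed, so a run can be repeated. */
    static long[] deriveSeeds(long masterSeed, int streams) {
        Random seeder = new Random(masterSeed);
        long[] seeds = new long[streams];
        for (int i = 0; i < streams; i++) {
            seeds[i] = seeder.nextLong();
        }
        return seeds;
    }

    /** Gap in seconds to the next event of a Poisson process with the given intensity (events/second). */
    static double nextInterArrivalSeconds(Random rng, double intensity) {
        // nextDouble() is in [0, 1), so 1 - nextDouble() is in (0, 1] and the log is finite.
        return -Math.log(1.0 - rng.nextDouble()) / intensity;
    }

    /** File size in bytes: zero with probability zeroProb, otherwise log-normal with parameters mu and sigma. */
    static long nextFileSize(Random rng, double zeroProb, double mu, double sigma) {
        if (rng.nextDouble() < zeroProb) {
            return 0L;
        }
        return Math.round(Math.exp(mu + sigma * rng.nextGaussian()));
    }

    public static void main(String[] args) {
        // Assumed values, chosen only for illustration.
        long[] seeds = deriveSeeds(42L, 2);
        Random arrivals = new Random(seeds[0]); // independent stream for event times
        Random sizes = new Random(seeds[1]);    // independent stream for file sizes

        double clock = 0.0;
        for (int i = 0; i < 5; i++) {
            clock += nextInterArrivalSeconds(arrivals, 10.0);
            long size = nextFileSize(sizes, 0.1, 10.0, 2.0);
            System.out.printf("t=%.4f s, size=%d bytes%n", clock, size);
        }
    }
}
```

Running it twice with the same master seed produces the same sequence of events, and changing one
stream's parameters does not disturb the random numbers drawn by the other stream.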

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

