hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-4142) Synthetic Load Generator for NameNode testing -- Next Generation
Date Wed, 10 Sep 2008 16:39:44 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4142:
------------------------------------

    Description: 
A review of the Synthetic Load Generator identified several candidates for improvement. None
are so urgent as to stand in the way of the present facility, but all are of sufficient interest
to merit inclusion in this note.


  was:
A review of the Synthetic Load Generator identified several candidates for improvement. None
are so urgent as to stand in the way of the present facility, but all are of sufficient interest
to merit inclusion in this note.

* The SLG should model the proper number of connections to the NameNode. This might be accomplished
by using different user identities for each connection.

* There may be more appropriate statistical distributions that could have been chosen. Should
requests have Poisson statistics? Is a log-normal distribution more appropriate for file sizes,
with a special case for zero-length files?

* "Intensity" might be a more convenient expression of rate parameters (events/second).

* Does a static initial name space topology bias results?

* Does the pairing of create with delete bias results?

* Should there be a correlation between listing a directory and reading its files?

* To the extent possible, tests should be repeatable. This can't be accomplished in an absolute
sense, but at least the behavior of the SLG should be as reproducible as possible. A reasonable
default for simulation applications is that each sequence of events be generated independently,
and that the seeds for each generator come from another reproducible sequence.

* Many of the questions of event distribution could be evaded if the SLG could replay a sequence
of events from a live log. Maybe a log should be published that represents one standard
Hadoop of load.
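
As an illustration only, the distribution and repeatability points above might be sketched as
follows. This is not SLG code: the function names, the master seed, the 10% zero-length
probability, and the log-normal parameters are all hypothetical choices made for the example.
It assumes Poisson request arrivals at a given intensity (events/second), log-normal file
sizes with a zero-length special case, and one generator per event stream whose seed is drawn
from a separate seeded sequence.

```python
import math
import random


def make_generators(master_seed, names):
    """Give each event stream its own generator, seeded from one reproducible sequence."""
    seeder = random.Random(master_seed)
    return {name: random.Random(seeder.getrandbits(64)) for name in names}


def arrival_times(rng, intensity, count):
    """Poisson arrivals: exponential gaps with mean 1/intensity seconds."""
    now = 0.0
    times = []
    for _ in range(count):
        now += rng.expovariate(intensity)
        times.append(now)
    return times


def file_size(rng, p_zero, mu, sigma):
    """Return 0 with probability p_zero, otherwise a log-normal size in bytes."""
    if rng.random() < p_zero:
        return 0
    return max(1, int(rng.lognormvariate(mu, sigma)))


def workload(master_seed, intensity=50.0, count=5):
    """Build a list of (arrival time, file size) events for one run."""
    gens = make_generators(master_seed, ["arrivals", "sizes"])
    times = arrival_times(gens["arrivals"], intensity, count)
    # Hypothetical parameters: median size 64 KiB, 10% zero-length files.
    sizes = [file_size(gens["sizes"], 0.1, math.log(64 * 1024), 1.5)
             for _ in range(count)]
    return list(zip(times, sizes))
```

Calling workload() twice with the same master seed yields the same event list, and because
arrivals and sizes draw from separate generators, changing how many values one stream
consumes does not perturb the other.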


> Synthetic Load Generator for NameNode testing -- Next Generation
> ----------------------------------------------------------------
>
>                 Key: HADOOP-4142
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4142
>             Project: Hadoop Core
>          Issue Type: Test
>          Components: dfs
>            Reporter: Robert Chansler
>
> A review of the Synthetic Load Generator identified several candidates for improvement.
> None are so urgent as to stand in the way of the present facility, but all are of sufficient
> interest to merit inclusion in this note.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

