hadoop-hdfs-issues mailing list archives

From "Joshua Harlow (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-708) A stress-test tool for HDFS.
Date Thu, 01 Apr 2010 18:45:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852501#action_12852501 ]

Joshua Harlow commented on HDFS-708:

Looks good to me as well.
Just a couple thoughts/questions.

1. Would it make sense to have a "create" step that ensures the files exist before the
reads/deletes/writes run, instead of generating them in a previous job? That way the data is
created on demand, with no separate job that has to run beforehand just to populate data.
This stage could be done at the start of the testing and would not count against the overall
time allotted.
2. It would probably be useful to add a seed number so that the tests can be "mostly" repeated
(i.e. writes and deletes can't be truly repeated, since they modify the underlying storage).
3. Might it be useful in the future to add the ability to specify your own distribution "objects"
that "generate" operation objects, so that the current set of operations can be expanded without
core changes? I.e. a plugin-like framework for generating the distribution and for generating
the actual set of operations that will occur, allowing for something like an AppendReadDelete
operation that is distributed according to a square wave, as an example.
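
To make points 2 and 3 concrete, here is a rough sketch of how a seed and a pluggable distribution might fit together. It is only an illustration: every name in it (SeededOps, OpType, OpDistribution, UniformDistribution, SquareWaveDistribution, plan) is hypothetical and not taken from the attached design or any patch, and it only plans a sequence of operation types rather than touching HDFS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch only; none of these names come from the HDFS-708 design. */
class SeededOps {

    /** Illustrative set of operation kinds a stress test might issue. */
    enum OpType { CREATE, READ, APPEND, DELETE }

    /** Assumed plugin point: picks the operation to issue at a given step. */
    interface OpDistribution {
        OpType next(int step, Random rng);
    }

    /** Picks uniformly at random among all operation types. */
    static class UniformDistribution implements OpDistribution {
        public OpType next(int step, Random rng) {
            OpType[] all = OpType.values();
            return all[rng.nextInt(all.length)];
        }
    }

    /** Square wave: switches between two operation types every halfPeriod steps. */
    static class SquareWaveDistribution implements OpDistribution {
        private final OpType high;
        private final OpType low;
        private final int halfPeriod;

        SquareWaveDistribution(OpType high, OpType low, int halfPeriod) {
            if (halfPeriod <= 0) {
                throw new IllegalArgumentException("halfPeriod must be positive");
            }
            this.high = high;
            this.low = low;
            this.halfPeriod = halfPeriod;
        }

        public OpType next(int step, Random rng) {
            // Deterministic by construction; the rng is unused here.
            return (step / halfPeriod) % 2 == 0 ? high : low;
        }
    }

    /** Builds the operation sequence; the same seed and distribution give the same plan. */
    static List<OpType> plan(OpDistribution dist, long seed, int count) {
        Random rng = new Random(seed);
        List<OpType> ops = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ops.add(dist.next(i, rng));
        }
        return ops;
    }

    public static void main(String[] args) {
        // Re-running with seed 42 reproduces the same uniform plan.
        System.out.println(plan(new UniformDistribution(), 42L, 8));
        // Two APPENDs, then two DELETEs, repeating.
        System.out.println(
            plan(new SquareWaveDistribution(OpType.APPEND, OpType.DELETE, 2), 42L, 8));
    }
}
```

The seed only fixes which operations are chosen and in what order; as noted in point 2, the results of writes and deletes still depend on the state of the underlying storage, so a re-run is "mostly" rather than exactly repeatable.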

> A stress-test tool for HDFS.
> ----------------------------
>                 Key: HDFS-708
>                 URL: https://issues.apache.org/jira/browse/HDFS-708
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.22.0
>         Attachments: SLiveTest.pdf
> It would be good to have a tool for automatic stress testing HDFS, which would provide
> IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to analyze possible

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
