hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-708) A stress-test tool for HDFS.
Date Thu, 29 Apr 2010 02:06:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862060#action_12862060 ]

Konstantin Shvachko commented on HDFS-708:

This is great. The patch covers the main functionality and worked on a cluster when we tested it.
Some comments:

# Consistently use annotations for overridden methods, with a comment indicating which class or interface is overridden, like:
@Override // MapReduceBase
{{SliveMapper}} and {{SliveReducer}} don't have annotations.
# In {{SliveMapper.map()}}, if {{unlimitiedTime}} is treated as equivalent to {{duration == max_int}}, the code becomes simpler.
# {{ConfigExtractor}} methods like {{Integer getDuration(), Long getRandomSeed()}}, etc. should return the primitive types int and long.
I don't see where returning objects is useful. It only forces callers to check for a null value, which is not useful at all.
# Measuring op timeTaken includes noise. The timer is started at the beginning of {{op.run()}} and includes time for parsing configs, choosing parameters, creating output streams, and generating data, as I see for write and append.
The best way would be to measure the performance of the operation by
# timeTaken for read should not include data verification time.
# The {{MinMax<>}} class is more like a {{Range}}. If you decide to change it, the {{getMinMax*()}} methods should also be renamed.
# {{DataGenerator}} has an import warning. Look for more.
# {{RandomInstance.betweenPositive()}} can be moved into the MinMax class, because you need it to get a random number in the range. After that it may make sense to move the instance of Random into the Operation base class and get rid of the RandomInstance class.
# The {{OperationType}} enum should be moved into {{Constants}}.
# Same with {{Distribution}}.
# I recommend merging all the classes into one package {{o.a.h.fs.slive}} rather than spreading them over many sub-packages.
That way most classes may be declared package private to emphasize they are specific for this
# Also, maybe some small classes can be merged into others.
# I just realized that although the issue is in HDFS, the commit will have to go into MapReduce. Let's keep tracking this here, and I'll create an MR issue when the commit is ready.
# There is one problem remaining in our design: checksums don't work after a file is renamed, because the file name is mixed into the hash.
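
The rename problem in the last comment can be reproduced in isolation. The sketch below is only an illustration under assumptions: the class and method names are hypothetical, and CRC32 stands in for whatever hash the patch actually uses. It shows that a checksum which mixes the file name into the hash changes when only the name changes, while a data-only checksum does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical illustration; this is not the hashing code from slive.patch.
class RenameChecksumSketch {

    // Checksum that mixes the file name into the hash, as the comment describes.
    static long checksumWithName(String fileName, byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(fileName.getBytes(StandardCharsets.UTF_8));
        crc.update(data);
        return crc.getValue();
    }

    // Checksum over the data only, so a rename cannot affect it.
    static long checksumDataOnly(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "same bytes".getBytes(StandardCharsets.UTF_8);

        // Same content, old name versus new name.
        long before = checksumWithName("/test/a", data);
        long after = checksumWithName("/test/b", data);

        System.out.println("name mixed in, still matches after rename: " + (before == after));
        System.out.println("data only, still matches after rename: "
                + (checksumDataOnly(data) == checksumDataOnly(data)));
    }
}
```

Whether dropping the name from the hash is acceptable depends on why it was mixed in; the sketch only demonstrates the failure mode.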

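The timing comments above (the timer covering setup, and read time including verification) amount to putting the clock reads tightly around the I/O call. A minimal sketch, assuming a hypothetical {{timeNanos}} helper and using an in-memory stream as a stand-in for an HDFS output stream:

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical helper; the names are not taken from slive.patch.
class OpTimerSketch {

    // Runs only the supplied call between the two clock reads
    // and returns the elapsed time in nanoseconds.
    static long timeNanos(Runnable ioCall) {
        long start = System.nanoTime();
        ioCall.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Setup happens before the timer starts: in the real tool this would be
        // config parsing, parameter selection, stream creation, and data generation.
        byte[] data = new byte[1 << 20];
        Arrays.fill(data, (byte) 7);
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Only the write itself is timed.
        long writeNanos = timeNanos(() -> out.write(data, 0, data.length));

        // Verification runs after the timer has stopped, so it adds no noise.
        boolean verified = out.size() == data.length;

        System.out.println("write took " + writeNanos + " ns, verified=" + verified);
    }
}
```

The same shape would apply to read: time the read call alone, then verify the bytes outside the timed region.
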
> A stress-test tool for HDFS.
> ----------------------------
>                 Key: HDFS-708
>                 URL: https://issues.apache.org/jira/browse/HDFS-708
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>            Assignee: Joshua Harlow
>             Fix For: 0.22.0
>         Attachments: slive.patch, SLiveTest.pdf
> It would be good to have a tool for automatic stress testing HDFS, which would provide IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to analyze possible

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
