hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-708) A stress-test tool for HDFS.
Date Fri, 16 Apr 2010 22:31:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12858038#action_12858038

Konstantin Shvachko commented on HDFS-708:

Joshua asked what random file generation means, as per this sentence from the design doc:
2. Randomly chooses a file name. File names are enumerated, so choosing a file means choosing
its sequence number, which defines the entire file path.

I mean by this that we have a static enumeration of files. We choose a random number, and
then calculate a full path for the corresponding file using that number.
The static enumeration is like a heap structure. We have an array f0, f1, f2, ... There is
a root r. The root's children are the files f0 and f1 and the two directories d0 and d1. The children
of d0 are the files f2, f3 and the directories d2, d3. The children of d1 are the files
f4, f5 and the directories d4, d5. And so on. This provides 2 files per directory.

We can generalize it to p files per directory for a fixed p. Here the root's children will
be p files f0,...,f(p-1) and p directories d0,...,d(p-1). And so on. Importantly, if you have
a file fz, then its parent is always the directory dz', where z' = z/p - 1 (integer division).
I don't want to use long numbers for file names. So within a directory its child files are
named {{file_i}} and sub-directories are named {{dir_i}} for i = 0,...,p-1.
Then given a number z the path of file fz is calculated recursively. The file name of fz is {{file_(z%p)}}.
Its parent is the directory dz', where z' = z/p - 1, and the name of dz' is {{dir_(z'%p)}}.
We keep going up the tree while the indexes are non-negative; a negative index means we have reached the root.
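
The recursion above can be sketched in Java roughly as follows. This is only an illustration of the numbering scheme, not the code of {{FileNameGenerator}}; the class name {{FilePathSketch}}, the method {{pathOf}}, and the choice of "/" as the root are assumptions made for the example.

```java
public class FilePathSketch {

    // Hypothetical helper: builds the path of file fz with p files per directory.
    // Assumes the root r is "/" and that z/p is integer division.
    static String pathOf(long z, int p) {
        if (z < 0 || p < 1) {
            throw new IllegalArgumentException("need z >= 0 and p >= 1");
        }
        // The file name depends only on the position inside its directory.
        StringBuilder path = new StringBuilder("/file_" + (z % p));
        // Index of the parent directory; a negative value means the root.
        long parent = z / p - 1;
        while (parent >= 0) {
            path.insert(0, "/dir_" + (parent % p));
            parent = parent / p - 1;
        }
        return path.toString();
    }

    public static void main(String[] args) {
        // Print the first few paths for p = 2, matching the f0..f5 example.
        for (long z = 0; z < 8; z++) {
            System.out.println("f" + z + " -> " + pathOf(z, 2));
        }
    }
}
```

With p = 2 this places f0 and f1 directly under the root, f2 and f3 under {{dir_0}}, and f4 and f5 under {{dir_1}}, as in the enumeration described earlier.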

In the test we choose a random z and build a path out of it. If the operation is create we
create a file with this path. In HDFS all missing directories along the path will be created
automatically. If fz already exists the create fails. 
For read we do the same, but the operation fails if the file does not exist.
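
To illustrate the create and read semantics, here is a small in-memory sketch. It does not talk to HDFS: a set of file numbers stands in for the namespace, which is enough because z determines the whole path. The class {{StressOpsSketch}}, its methods, and the values of {{maxFiles}} and the operation count are all made up for the example.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class StressOpsSketch {

    // In-memory stand-in for the namespace: the numbers z of existing files.
    private final Set<Long> existing = new HashSet<>();

    // Create succeeds only if fz does not exist yet.
    boolean create(long z) {
        return existing.add(z);
    }

    // Read succeeds only if fz exists.
    boolean read(long z) {
        return existing.contains(z);
    }

    public static void main(String[] args) {
        StressOpsSketch ns = new StressOpsSketch();
        Random rnd = new Random(42);   // fixed seed so a run can be repeated
        int maxFiles = 16;             // hypothetical size of the enumeration
        int ops = 100;                 // hypothetical number of operations
        int failures = 0;
        for (int i = 0; i < ops; i++) {
            long z = rnd.nextInt(maxFiles);          // choose a random file number
            boolean ok = rnd.nextBoolean() ? ns.create(z) : ns.read(z);
            if (!ok) {
                failures++;
            }
        }
        System.out.println(failures + " of " + ops + " operations failed");
    }
}
```

In a real run against HDFS the create call would also create any missing directories along the path, which this sketch does not need to model.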

A similar approach is used in the class {{FileNameGenerator}}.

> A stress-test tool for HDFS.
> ----------------------------
>                 Key: HDFS-708
>                 URL: https://issues.apache.org/jira/browse/HDFS-708
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: test, tools
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.22.0
>         Attachments: SLiveTest.pdf
> It would be good to have a tool for automatic stress testing HDFS, which would provide
> IO-intensive load on HDFS cluster.
> The idea is to start the tool, let it run overnight, and then be able to analyze possible


