hadoop-common-dev mailing list archives

From "Sanjay Radia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1989) Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster
Date Fri, 16 Nov 2007 19:24:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543177 ]

Sanjay Radia commented on HADOOP-1989:
--------------------------------------

BTW, the datanodecluster command also works for non-simulated data nodes: it creates multiple
datanodes in one JVM.
So if you had a large machine and wanted to run multiple real (not simulated) datanodes in
one JVM, you could do so.
However, one could argue that this is useful only for testing.

I will remove the datanodecluster command shortly.

> Add support for simulated Data Nodes - helpful for testing and performance benchmarking of the Name Node without having a large cluster
> ----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1989
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1989
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.16.0
>            Reporter: Sanjay Radia
>            Assignee: Sanjay Radia
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: SimulatedStoragePatchSubmit.txt, SimulatedStoragePatchSubmit5.txt, SimulatedStoragePatchSubmit6.txt, SimulatedStoragePatchSubmit7.txt, SimulatedStoragePatchSubmit8.txt
>
>
> Proposal is to add an implementation for a Simulated Data Node.
> This will
>   - allow one to test certain parts of the system (especially the Name Node and the protocols) much more easily and efficiently.
>   - allow one to run performance benchmarks on the Name Node without having a large cluster.
>   - allow one to inject faults for testing (e.g. one can add random faults based on probability parameters).
> The idea is that the Simulated Data Node will
>   - discard any data written to blocks (but remember the blocks and their sizes)
>   - generate fixed data on the fly when blocks are read (e.g. a block is a fixed set of bytes or a repeated sequence of strings).
> The Simulated Data Node can also be used for fault injection.
> The data node can be parameterized with probabilities that allow one to control:
>   - Delays on reads, writes, creates, etc.
>   - IO Exceptions
>   - Loss of blocks
>   - Failures
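
The behaviour described in the proposal above can be illustrated with a minimal Java sketch. It is not the code from the attached patches: the class name SimulatedBlockStore, its methods, the fill byte, and the single ioErrorProbability parameter are all hypothetical names chosen for illustration. It assumes a store that keeps only block ids and lengths, regenerates fixed bytes on read, and throws an injected IOException with a configurable probability.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch of simulated block storage; not the class from the patches. */
class SimulatedBlockStore {
    /** Fixed byte returned for every position of every simulated block (assumed value). */
    static final byte FILL_BYTE = 9;

    /** Block id -> block length. The payload itself is never stored. */
    private final Map<Long, Integer> blockSizes = new HashMap<>();
    private final Random random;
    /** Probability in [0, 1] that an operation throws; 0.0 disables fault injection. */
    private final double ioErrorProbability;

    SimulatedBlockStore(double ioErrorProbability, long seed) {
        this.ioErrorProbability = ioErrorProbability;
        this.random = new Random(seed); // seeded so that test runs are repeatable
    }

    /** Remembers the block and its size, then discards the data. */
    void writeBlock(long blockId, byte[] data) throws IOException {
        maybeFail("write");
        blockSizes.put(blockId, data.length);
    }

    /** Generates the block contents on the fly from the remembered length. */
    byte[] readBlock(long blockId) throws IOException {
        maybeFail("read");
        Integer size = blockSizes.get(blockId);
        if (size == null) {
            throw new IOException("Unknown block " + blockId);
        }
        byte[] out = new byte[size];
        Arrays.fill(out, FILL_BYTE);
        return out;
    }

    /** Number of blocks currently remembered. */
    int blockCount() {
        return blockSizes.size();
    }

    /** Throws an injected IOException with the configured probability. */
    private void maybeFail(String operation) throws IOException {
        if (random.nextDouble() < ioErrorProbability) {
            throw new IOException("Injected fault on " + operation);
        }
    }
}
```

With a probability of 0.0 the store never fails and a read returns as many fill bytes as were written; with 1.0 every operation throws. Delays and loss of blocks could be added in the same style with further probability parameters, but they are left out to keep the sketch short.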

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

