hadoop-mapreduce-issues mailing list archives

From "Ranjit Mathew (JIRA)" <j...@apache.org>
Subject [jira] Created: (MAPREDUCE-2112) Create a Common Data-Generator for Testing Hadoop
Date Tue, 05 Oct 2010 11:42:33 GMT
Create a Common Data-Generator for Testing Hadoop

                 Key: MAPREDUCE-2112
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2112
             Project: Hadoop Map/Reduce
          Issue Type: New Feature
            Reporter: Ranjit Mathew
            Priority: Minor

It is useful to have a common data-generator for testing Hadoop and related projects. Such
a tool should be able to generate data in a specified format and should be able to use a
Hadoop cluster to speed up data generation. The tool could then be used across Hadoop
(e.g. GridMix3), Pig, Hive, etc., reducing the need for each project to invent something
like this itself.
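
To make the idea concrete, here is a minimal sketch of a format-driven generator, written
in plain Python with no Hadoop dependency. It is not the PigMix2 generator and not a
proposed design. The schema layout, the column types, and the names generate_partition
and generate_dataset are all assumptions made for illustration. The point it shows is that
if each partition seeds its own random-number generator from (base seed, partition id),
partitions are reproducible and independent of one another, so a cluster could in
principle produce each one in a separate map task.

```python
import random
import string

# Hypothetical column types for this sketch; not taken from PigMix2 or Hadoop.
GENERATORS = {
    "int": lambda rng: str(rng.randint(0, 999_999)),
    "float": lambda rng: f"{rng.random():.6f}",
    "string": lambda rng: "".join(rng.choices(string.ascii_lowercase, k=8)),
}


def generate_partition(schema, partition_id, num_rows, base_seed=0, delimiter="\t"):
    """Return num_rows delimited records for one partition.

    The RNG is seeded from (base_seed, partition_id), so a partition can be
    regenerated on its own without producing any of the others.
    """
    rng = random.Random(f"{base_seed}:{partition_id}")
    rows = []
    for _ in range(num_rows):
        fields = [GENERATORS[col_type](rng) for _name, col_type in schema]
        rows.append(delimiter.join(fields))
    return rows


def generate_dataset(schema, num_partitions, rows_per_partition, base_seed=0):
    """Generate every partition sequentially.

    On a cluster, each call to generate_partition would be one task instead.
    """
    return [
        generate_partition(schema, pid, rows_per_partition, base_seed)
        for pid in range(num_partitions)
    ]


if __name__ == "__main__":
    # Example schema (assumed): column names are labels only, types drive generation.
    schema = [("user_id", "int"), ("name", "string"), ("score", "float")]
    for line in generate_partition(schema, partition_id=0, num_rows=3):
        print(line)
```

Because the output depends only on the schema, the base seed, and the partition id, a
failed task could be rerun and would yield the same records, which is convenient for
repeatable tests.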

We can use the data-generator used in PigMix2 (PIG-200) as a starting point. It is described
in [http://wiki.apache.org/pig/DataGeneratorHadoop]. Since it depends on the SDSU
Java library ([http://www.eli.sdsu.edu/java-SDSU/]), which is released under the GNU GPL, it
has to be modified a bit to eliminate this dependency before it can be included in Apache Hadoop.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
