hadoop-common-user mailing list archives

From Ranjit Mathew <ran...@yahoo-inc.com>
Subject Re: Data for Testing in Hadoop
Date Thu, 06 Jan 2011 05:53:44 GMT
On Tuesday 04 January 2011 01:01 PM, Adarsh Sharma wrote:
> For this, I require some data for testing. Would anyone send me some
> links for data of different sizes (10 GB, 20 GB, 30 GB, 50 GB).
> I shall be grateful for this kindness.

If you just want random data of a specific size, you can use "dd" on
Linux with the /dev/urandom pseudo-file. For example, to generate 10 MiB
of random data:

   dd if=/dev/urandom of=data.bin bs=1024 count=10240
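
To scale this up to the sizes asked about, one option is to wrap the same "dd" call in a small shell function that takes the size in MiB. This is only a rough sketch: the function name gen_random and the file names are made up for illustration, and it uses 1 MiB blocks instead of 1 KiB blocks so that large files need fewer reads.

```shell
#!/bin/sh
# Sketch: write SIZE_MIB mebibytes of random bytes to OUT using dd.
# gen_random is a hypothetical helper name, not part of any existing tool.
gen_random() {
    out=$1
    size_mib=$2
    # bs=1048576 is one MiB; count is the number of 1 MiB blocks to write.
    dd if=/dev/urandom of="$out" bs=1048576 count="$size_mib" 2>/dev/null
}

# Small demonstration: a 4 MiB file.
gen_random sample.bin 4

# For roughly 10 GiB you would call (slow, and needs the disk space):
#   gen_random data-10g.bin 10240
```

Generating tens of gigabytes from /dev/urandom can take a while, so it may be worth trying a small size first to check the throughput on your machine.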

For more structured, "Hadoop-enabled" random data generation, you
can use the data generator from PigMix2:

   http://wiki.apache.org/pig/DataGeneratorHadoop
   https://issues.apache.org/jira/browse/PIG-200

HTH,
Ranjit
