hadoop-user mailing list archives

From Jay Vyas <jayunit100.apa...@gmail.com>
Subject Re: To Generate Test Data in HDFS (PDGF)
Date Mon, 22 Sep 2014 09:48:31 GMT
While on the subject,
You can also use the BigPetStore application in Apache Bigtop to do this. Its data is well
suited to HBase (semi-structured, transactional, and featuring some global patterns that
make for meaningful queries).

Clone apache/bigtop
cd bigtop-bigpetstore
gradle clean package # build the jar

Then follow the instructions in the README to generate as many records as you want in a distributed
context. Each record is around 80 bytes, so something on the order of 10^10 records is the scale
you are looking for (4 TB / 80 bytes ≈ 5 x 10^10 records).
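As a rough check on the sizing: at about 80 bytes per record, 10^10 records is only about 800 GB, so reaching 4 TB takes about 5 x 10^10 records. A minimal Python sketch of that arithmetic follows, assuming decimal terabytes (10^12 bytes) and the approximate 80-byte record size mentioned in this message; the helper name `records_needed` is hypothetical and not part of Bigtop or BigPetStore.

```python
def records_needed(target_bytes: int, bytes_per_record: int) -> int:
    """Hypothetical helper: records required to reach target_bytes, rounded up.

    Uses integer ceiling division so very large sizes stay exact.
    """
    return -(-target_bytes // bytes_per_record)


TB = 10**12        # decimal terabyte (assumption); use 2**40 for a binary TiB
PER_RECORD = 80    # approximate BigPetStore record size quoted in this message

if __name__ == "__main__":
    n = records_needed(4 * TB, PER_RECORD)
    print(f"4 TB needs about {n:.2e} records")
    print(f"10^10 records is about {10**10 * PER_RECORD / TB:.1f} TB")
```

Actual on-disk size will also depend on HDFS replication and any compression, so treat the result as an order-of-magnitude estimate.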

> On Sep 22, 2014, at 5:14 AM, "Arthur.hk.chan@gmail.com" <arthur.hk.chan@gmail.com> wrote:
> Hi,
> I need to generate a large amount of test data (4 TB) in Hadoop. Has anyone used PDGF
> to do so? Could you share your cookbook for PDGF on Hadoop (or HBase)?
> Many Thanks
> Arthur
