hbase-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject How to generate a large dataset quickly.
Date Mon, 14 Apr 2014 11:50:45 GMT
I want to create a large dataset for HBase with varying numbers of rows and
versions. It's about 10M rows and 100 versions, for running some benchmarks.

What's the fastest way to create it? I'm currently generating the dataset with
a MapReduce job: 100,000 rows and 10 versions take 17 minutes and produce
around 7 GB. I don't know whether it could be done more quickly. The
bottlenecks are when the MapReduce job writes its output and when the map
output is transferred to the reducers.
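
For scale, here is a rough extrapolation from the numbers above to the target dataset. It is only a sketch: it assumes time and size grow linearly with the number of cells (rows x versions), which a real cluster will not follow exactly, and the variable names are just for illustration.

```python
# Measured run reported above.
measured_rows = 100_000
measured_versions = 10
measured_minutes = 17
measured_size_gb = 7

# Target dataset for the benchmarks.
target_rows = 10_000_000
target_versions = 100


def scale_factor(rows, versions, base_rows, base_versions):
    """Ratio of cell counts, where cells = rows * versions."""
    return (rows * versions) / (base_rows * base_versions)


# Assumption: cost is proportional to the number of cells written.
factor = scale_factor(target_rows, target_versions,
                      measured_rows, measured_versions)

est_minutes = measured_minutes * factor
est_days = est_minutes / (60 * 24)
est_size_tb = measured_size_gb * factor / 1000  # decimal TB

print(f"scale factor:   {factor:.0f}x")
print(f"estimated time: {est_minutes:.0f} min (~{est_days:.1f} days)")
print(f"estimated size: ~{est_size_tb:.0f} TB")
```

Under that assumption the full dataset is 1000 times the measured run, so the current job would need roughly 17,000 minutes (close to 12 days) and produce about 7 TB.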
