hadoop-common-user mailing list archives

From Jim Twensky <jim.twen...@gmail.com>
Subject Hadoop TeraSort Generator
Date Fri, 24 Jul 2009 18:39:58 GMT

I'm doing some benchmarks on my cluster including the TeraSort
benchmark to test a couple of hardware characteristics. When I was
playing with Hadoop's generator, I found out that the keys generated
by Hadoop's TeraGen implementation are not the same as those produced
by the official generator located here:
http://www.ordinal.com/try.cgi/gensort.tar.gz

Here are the first 5 keys generated by Hadoop:

Whereas the keys generated by the official generator are:

ASCII keys:      Binary keys:
-------------    -------------
AsfAGHM5om       JimGrayRIP
~sHd0jDv6X       àäb³íþG
uI^EYm8s=|       ESÛíS)6\
Q)JN)R9z-L       *Ã6+`v_
o4FoBkqERn       \«8®Rb×
-------------    -------------

(Note that some bytes in the binary keys are negative and so are not
printable as chars.)

I was wondering whether Hadoop's generator is based exactly on the
official generator, or whether it is just a similar implementation
that produces different results. Could I be displaying the results
incorrectly? Here is how I display them:

private void printKey(Text key) {
   byte[]  keyBytes = key.getBytes();
   for(int i=0; i<10; i++)

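One way to rule out a display problem is to print each key byte as hex instead of casting it to a char, so that negative bytes are still visible. The following is only a minimal sketch: the class name KeyHex and the method toHex are hypothetical, and it takes a plain byte[] so it runs without Hadoop on the classpath. With a Hadoop Text key you would presumably pass key.getBytes() and key.getLength(), since only the first getLength() bytes of the backing array are valid.

```java
import java.nio.charset.StandardCharsets;

public class KeyHex {
    // Hypothetical helper: renders the first keyLength bytes as lowercase hex.
    static String toHex(byte[] keyBytes, int keyLength) {
        StringBuilder sb = new StringBuilder(keyLength * 2);
        for (int i = 0; i < keyLength; i++) {
            // Java bytes are signed; masking with 0xff maps -128..-1 to 0x80..0xff.
            sb.append(String.format("%02x", keyBytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example input taken from the key listing above.
        byte[] key = "JimGrayRIP".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toHex(key, key.length));
    }
}
```

Comparing hex dumps from both generators byte by byte avoids any ambiguity from unprintable characters or the terminal's encoding.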
