hadoop-common-user mailing list archives

From "Barry, Sean F" <sean.f.ba...@intel.com>
Subject Terasort
Date Mon, 14 May 2012 17:40:26 GMT
I am having a bit of trouble understanding how the Terasort benchmark works, especially the
fundamentals of how the data is sorted. If the data is split into many chunks, wouldn't the
chunks all have to be merged back into a single sorted dataset?

And since a terabyte is huge, wouldn't that take a very long time? I seem to be missing a few
crucial steps in the process, and if someone could help me understand how Terasort works,
that would be great. Any papers or videos on this topic would be greatly appreciated.

