hadoop-common-user mailing list archives

From John Clarke <clarke...@gmail.com>
Subject Randomize input file?
Date Thu, 21 May 2009 14:18:58 GMT
Hi,

I need to randomize my input file before processing it. I understand I can
chain Hadoop jobs together, so the first could take the input file and
randomize it, and the second could then take the randomized file and do the
processing.

The input file has one entry per line and I want to mix up the lines before
the main processing.

Is there a built-in feature I have missed, or will I have to write a
Hadoop program to shuffle my input file?
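
To make the question concrete, here is a minimal sketch of one way the
shuffle step could work. It is not a built-in Hadoop feature. The idea is to
tag each line with a random key, sort by that key, and then drop the key. In
a real MapReduce job the mapper would emit the random key and the framework's
sort phase would do the ordering. The plain Java below only simulates that
locally, and the names LineShuffler and shuffleByRandomKey are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical helper: simulates "random key, sort, drop key" in memory.
final class LineShuffler {

    static List<String> shuffleByRandomKey(List<String> lines, Random rng) {
        // "Map" step: give every line a random long key.
        // "Sort" step: the TreeMap keeps keys ordered, standing in for
        // the sort that MapReduce performs between map and reduce.
        TreeMap<Long, List<String>> byKey = new TreeMap<>();
        for (String line : lines) {
            // Lines that happen to draw the same key are grouped together,
            // so no line is ever lost on a key collision.
            byKey.computeIfAbsent(rng.nextLong(), k -> new ArrayList<>()).add(line);
        }

        // "Reduce" step: emit the lines in key order and discard the keys.
        List<String> shuffled = new ArrayList<>(lines.size());
        for (List<String> group : byKey.values()) {
            shuffled.addAll(group);
        }
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("entry 1", "entry 2", "entry 3", "entry 4");
        for (String line : shuffleByRandomKey(lines, new Random())) {
            System.out.println(line);
        }
    }
}
```

For a file small enough to fit in memory, java.util.Collections.shuffle
would do the same job directly. The random-key form is shown because it maps
onto the two-job chain described above.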

Cheers,
John
