hadoop-user mailing list archives

From Jay Vyas <jayunit...@gmail.com>
Subject Re: Sorting huge text files in Hadoop
Date Fri, 15 Feb 2013 18:09:26 GMT
I don't think you can do an embarrassingly parallel sort of a randomly
ordered file without merging the results.
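To make the point concrete, here is a minimal Python sketch (a simplified model, not Hadoop code; `parallel_sort` and `chunk_size` are hypothetical names). Each chunk can be sorted independently, but on a randomly ordered input the sorted chunks still overlap in key range, so a final k-way merge is needed to get one ordered output.

```python
import heapq


def parallel_sort(values, chunk_size):
    """Sort values by sorting fixed-size chunks independently, then merging them."""
    # Split the input into chunks, roughly the way a job splits a file into blocks.
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # Each chunk can be sorted on its own; on a cluster every chunk would be
    # handled by a separate task (here they simply run one after another).
    sorted_chunks = [sorted(chunk) for chunk in chunks]
    # The sorted chunks overlap in key range, so concatenating them is not
    # enough: a k-way merge produces the final global order.
    return list(heapq.merge(*sorted_chunks))
```

For example, `parallel_sort([5, 3, 9, 1, 7, 2], chunk_size=2)` sorts the chunks to `[3, 5]`, `[1, 9]`, `[2, 7]`. Concatenating those would give `[3, 5, 1, 9, 2, 7]`, while the merge returns `[1, 2, 3, 5, 7, 9]`.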

However, if you know that the file is pseudo-ordered:

10000123
10000232
10000000
19991019
20200222
301111111
30000000

Then you can (maybe) sort the individual blocks in the mappers using some black
magic... but it would be very, very ugly.
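One way to read "pseudo-ordered": if each block only holds keys from a range that does not overlap any other block's range, then sorting every block on its own and concatenating the blocks in file order already gives a globally sorted result, with no merge. The sketch below assumes exactly that, treats the values as integers, and uses a hypothetical split of the sample numbers into blocks (the post does not say where block boundaries fall); `sort_pseudo_ordered` is a hypothetical name.

```python
def sort_pseudo_ordered(blocks):
    """Sort each block independently and concatenate, if block ranges don't overlap.

    `blocks` is a list of lists in file order. Raises ValueError when a block's
    largest key exceeds the next block's smallest key, because then a merge
    step would be needed after all.
    """
    sorted_blocks = [sorted(block) for block in blocks if block]
    # Every block must end at or below where the next block begins.
    for prev, nxt in zip(sorted_blocks, sorted_blocks[1:]):
        if prev[-1] > nxt[0]:
            raise ValueError("block key ranges overlap; a merge is required")
    return [value for block in sorted_blocks for value in block]


# Hypothetical block split of the sample numbers above.
blocks = [[10000123, 10000232, 10000000], [19991019, 20200222], [301111111, 30000000]]
print(sort_pseudo_ordered(blocks))
```

The range check is the important part: it is the assumption that lets each block be sorted in isolation, and as soon as it fails you are back to merging.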

You're better off simply running the mappers with the default reducer: the
framework sorts the map output by key during the shuffle, so the file gets
sorted for you naturally :)
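Why the default setup works: MapReduce sorts map output by key during the shuffle, before any reducer sees it. Below is a simplified Python model of that behavior, not Hadoop code; `simulate_shuffle`, `hash_partition`, and `range_partition` are hypothetical names. It also shows a caveat: with one reducer the single output is fully sorted, but with several reducers and hash partitioning (Hadoop's default `HashPartitioner`) each part file is sorted only on its own. Getting one global order across several reducers needs range partitioning, which is the idea behind Hadoop's `TotalOrderPartitioner`.

```python
def simulate_shuffle(records, num_reducers, partition):
    """Model the shuffle of an identity map/reduce job: route, then sort per reducer.

    Returns one sorted list per reducer, like the job's part files.
    """
    parts = [[] for _ in range(num_reducers)]
    for key in records:
        parts[partition(key, num_reducers)].append(key)
    # Each reducer receives its keys in sorted order.
    return [sorted(part) for part in parts]


def hash_partition(key, n):
    # Hash-style routing: spreads keys across reducers but ignores their order.
    return hash(key) % n


def range_partition(boundaries):
    # Range-style routing: reducer i gets keys below boundaries[i], the last
    # reducer gets the rest, so part files cover increasing key ranges.
    def partition(key, n):
        for i, bound in enumerate(boundaries):
            if key < bound:
                return i
        return n - 1
    return partition


data = [10000123, 10000232, 10000000, 19991019, 20200222, 301111111, 30000000]
print(simulate_shuffle(data, 1, hash_partition))
print(simulate_shuffle(data, 3, range_partition([20000000, 30000000])))
```

With range partitioning, concatenating the part files in order gives the same result as the single-reducer run; with hash partitioning each part is sorted, but their concatenation generally is not.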
