hadoop-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Sorting huge text files in Hadoop
Date Fri, 15 Feb 2013 19:11:49 GMT
Why not? 

Who said you had to parallelize anything?

On Feb 15, 2013, at 12:09 PM, Jay Vyas <jayunit100@gmail.com> wrote:

> I don't think you can do an embarrassingly parallel sort of a randomly ordered file without merging results.
> 
> However, if you know that the file is pseudo-ordered:
> 
> 10000123
> 10000232
> 10000000
> 19991019
> 20200222
> 301111111
> 30000000
> 
> Then you can (maybe) sort the individual blocks in mappers using some black magic ... but it would be very very ugly
> 
> Better off simply running the mappers with the default reducer - they will sort the file for you naturally :)
> 
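
To make the quoted point about merging concrete, here is a minimal sketch in plain Python rather than Hadoop code. It assumes the sample keys above are integers and uses a hypothetical block size of 3; the helper names sort_blocks and merge_blocks are made up for illustration. Each block is sorted independently, the way separate mappers could do it, and the result is compared with and without a final merge.

```python
import heapq


def sort_blocks(records, block_size):
    """Sort each fixed-size block on its own, as independent mappers could."""
    blocks = [records[i:i + block_size] for i in range(0, len(records), block_size)]
    return [sorted(block) for block in blocks]


def merge_blocks(sorted_blocks):
    """K-way merge of already-sorted blocks into one globally sorted list."""
    return list(heapq.merge(*sorted_blocks))


# Sample keys from the message, treated as integers (an assumption).
keys = [10000123, 10000232, 10000000, 19991019, 20200222, 301111111, 30000000]

blocks = sort_blocks(keys, block_size=3)  # block_size is a hypothetical choice

# Gluing the sorted blocks together is not enough for this input:
# 301111111 ends up ahead of 30000000.
concatenated = [key for block in blocks for key in block]
print(concatenated == sorted(keys))  # False

# Merging the sorted blocks gives the global order.
merged = merge_blocks(blocks)
print(merged == sorted(keys))  # True
```

In a real job the merge is what the shuffle and a single reducer would provide, which is why running the mappers with the default reducer is the simpler route.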



