Message view | « Date » · « Thread » |
---|---|
From | Jay Vyas <jayunit...@gmail.com> |
Subject | Re: Sorting huge text files in Hadoop |
Date | Fri, 15 Feb 2013 18:09:26 GMT |
I don't think you can do an embarrassingly parallel sort of a randomly ordered file without merging results. However, if you know that the file is pseudo-ordered, e.g. 10000123 10000232 10000000 19991019 20200222 301111111 30000000, then you can (maybe) sort the individual blocks in mappers using some black magic... but it would be very, very ugly. You're better off simply running the mappers with the default reducer: the framework sorts the keys on their way to the reducer, so the file gets sorted for you naturally :) | |
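To make the point about pseudo-ordered input concrete, here is a rough sketch in plain Python (not Hadoop code). It treats fixed-size chunks of the list as the "blocks" a mapper would see, sorts each one locally, and then checks whether simple concatenation would already be globally sorted. The function names and the block size of 3 are made up for illustration; only the sample values come from the message above.

```python
import heapq

def sort_blocks(values, block_size):
    """Sort each fixed-size block independently (stand-in for a per-mapper sort)."""
    return [sorted(values[i:i + block_size])
            for i in range(0, len(values), block_size)]

def blocks_are_disjoint(sorted_blocks):
    """True if every block's max is <= the next block's min.

    Only in that case is concatenating the sorted blocks a global sort.
    """
    return all(a[-1] <= b[0] for a, b in zip(sorted_blocks, sorted_blocks[1:]))

def merge_blocks(sorted_blocks):
    """K-way merge of sorted blocks (stand-in for the merge a reducer relies on)."""
    return list(heapq.merge(*sorted_blocks))

# Sample values from the message; block_size=3 is an arbitrary assumption.
pseudo_ordered = [10000123, 10000232, 10000000, 19991019,
                  20200222, 301111111, 30000000]

blocks = sort_blocks(pseudo_ordered, block_size=3)
needs_merge = not blocks_are_disjoint(blocks)
result = merge_blocks(blocks)
```

With this particular split, the second block ends at 301111111 while the third starts at 30000000, so the blocks overlap and a merge is still needed. That is the "ugly" part: the mapper-only trick works only if you can guarantee that out-of-order values never cross a block boundary.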