I want to merge multiple files in one HDFS directory into a single file. I am planning to write a map-only job using an input format that creates only one InputSplit per directory.
This way my job doesn't need to do any shuffle/sort (it only reads the files and writes them back to disk).
Is there an input format like this already implemented?
Or is there a better solution to this problem?
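
To make the goal concrete, here is a minimal sketch of the behaviour I am after: read every file in a directory in name order and append its bytes to one output file, with no sorting or shuffling of records. It is only an illustration under stated assumptions: it uses the local filesystem through `java.nio` as a stand-in for HDFS, and the class `DirMerger` and its methods `concat` and `mergeDirectory` are hypothetical names of mine, not part of Hadoop. On a real cluster the same loop would presumably go through Hadoop's `FileSystem` API instead.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper (not a Hadoop class): local-filesystem stand-in for the merge.
final class DirMerger {

    // Appends each input stream to `out` in the given order and returns the byte count.
    // Records are never reordered, so nothing like a shuffle/sort is involved.
    static long concat(List<? extends InputStream> inputs, OutputStream out) {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        try {
            for (InputStream in : inputs) {
                int n;
                while ((n = in.read(buf)) != -1) {
                    out.write(buf, 0, n);
                    total += n;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    // Merges all regular files directly under `dir`, sorted by path, into `target`.
    // Assumption: `target` lives outside `dir`, so it is never picked up as an input.
    static long mergeDirectory(Path dir, Path target) throws IOException {
        List<Path> files = new ArrayList<>();
        try (Stream<Path> listing = Files.list(dir)) {
            listing.filter(Files::isRegularFile).forEach(files::add);
        }
        Collections.sort(files);

        long total = 0;
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path file : files) {
                // Open one input at a time so only a single file handle is held.
                try (InputStream in = Files.newInputStream(file)) {
                    total += concat(Collections.singletonList(in), out);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Two small part files standing in for the files of one HDFS directory.
        Path dir = Files.createTempDirectory("merge-in");
        Files.write(dir.resolve("part-00000"), "a\nb\n".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("part-00001"), "c\n".getBytes(StandardCharsets.UTF_8));

        Path target = Files.createTempFile("merged", ".txt");
        long bytes = mergeDirectory(dir, target);
        System.out.println("merged " + bytes + " bytes into " + target);
    }
}
```

The sketch runs in a single process, which is effectively what one InputSplit per directory would give me inside a map-only job: one reader streaming all the files sequentially into one writer.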