hadoop-user mailing list archives

From Jan Filipiak <Jan.Filip...@trivago.com>
Subject Hdfs FSShell getmerge feature idea when using -nl
Date Fri, 24 Jul 2015 14:52:48 GMT
Hello Hadoop users,

I have an idea for a small feature for the getmerge tool. I recently
needed to use the new line option -nl because the files I needed to
merge simply didn't end with a newline.
I was merging all the files in one directory and, unfortunately, this
directory also contained empty files, which led to multiple newlines
being appended after some files.
I had to remove them manually afterwards.

In this situation it might be good to have another argument that allows
skipping empty files. I have written down two changes one could try at
the end of this mail. Do you consider this a good improvement to the
command line tool?

Things one could try to implement this feature:

The call IOUtils.copyBytes(in, out, getConf(), false); doesn't return
the number of bytes copied. That would be convenient, since one could
skip appending the newline when 0 bytes were copied.
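A minimal sketch of the first idea, written in plain Java so it runs without Hadoop on the classpath. The class `MergeSkipEmpty` and the helper `copyCountingBytes` are hypothetical names for illustration only; Hadoop's `IOUtils.copyBytes` is left untouched, and the sketch instead uses its own copy loop that returns the byte count, so the merge loop can append the newline only when that count is non-zero. In-memory streams stand in for the files being merged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class MergeSkipEmpty {

    // Hypothetical helper: a plain copy loop that, unlike IOUtils.copyBytes,
    // reports how many bytes it copied.
    static long copyCountingBytes(InputStream in, OutputStream out, int bufferSize)
            throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Concatenates all inputs into out; with addNewLine set, a '\n' is
    // appended after an input only if that input contributed at least one byte.
    static void merge(List<InputStream> inputs, OutputStream out, boolean addNewLine)
            throws IOException {
        for (InputStream in : inputs) {
            try (InputStream src = in) { // close each input after copying
                long copied = copyCountingBytes(src, out, 4096);
                if (addNewLine && copied > 0) {
                    out.write('\n');
                }
            }
        }
    }

    // Wraps a string as an in-memory stream standing in for a file.
    static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        List<InputStream> inputs =
                Arrays.asList(stream("a"), stream(""), stream("b"), stream(""));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        merge(inputs, out, true);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

With this approach the empty inputs simply produce no extra newline, so no separate flag would strictly be needed for the byte-count variant.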
Alternatively, one could check the file size beforehand.
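The size-check variant could look roughly like this. Again this is only a hypothetical sketch: `MergeBySize` and the `skipEmpty` parameter are illustrative names, not existing FsShell options, and local files accessed through `java.nio.file` stand in for HDFS. In the actual shell code the length would presumably come from the file's `FileStatus` via `getLen()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class MergeBySize {

    // Hypothetical merge that looks at each file's size before copying it.
    // Local files stand in for HDFS; there the size would come from FileStatus.getLen().
    static void merge(List<Path> files, OutputStream out, boolean addNewLine, boolean skipEmpty)
            throws IOException {
        for (Path file : files) {
            if (skipEmpty && Files.size(file) == 0) {
                continue; // empty file: copy nothing and append no newline
            }
            Files.copy(file, out);
            if (addNewLine) {
                out.write('\n');
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("getmerge-sketch");
        Path a = Files.write(dir.resolve("part-0"), "a".getBytes(StandardCharsets.UTF_8));
        Path empty = Files.write(dir.resolve("part-1"), new byte[0]);
        Path b = Files.write(dir.resolve("part-2"), "b".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        merge(Arrays.asList(a, empty, b), out, true, true);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Compared with counting bytes, checking the size up front means empty files are never opened, at the cost of one metadata lookup per file; keeping it behind a separate option would leave the current -nl behaviour unchanged by default.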

Please let me know if you consider this useful and worth a feature
ticket in Jira.

Thank you
