hadoop-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject map-red with many input paths
Date Wed, 17 Oct 2012 00:25:08 GMT
Currently I run a map-reduce job that reads from a single path with a glob:
I am considering replacing this single glob path with an explicit list of all
the paths, so that I can check each subdirectory for a _SUCCESS file and
exclude any subdirectory that lacks one. That way the job avoids reading from
subdirectories while data is still being written to them.
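A minimal sketch of the selection step described above. It assumes the subdirectories to check are the immediate children of one root directory, and it uses java.nio.file on a local directory tree as a stand-in for HDFS so it runs without a cluster. The class and method names (CompleteSubdirs, completeSubdirs) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: selects only the subdirectories that are marked complete.
public class CompleteSubdirs {
    // Marker file written when an output directory is complete
    // (the _SUCCESS convention mentioned in the message).
    public static final String SUCCESS_MARKER = "_SUCCESS";

    /**
     * Returns the immediate subdirectories of root that contain a _SUCCESS file.
     * Subdirectories without the marker (presumably still being written) are skipped,
     * as are plain files directly under root.
     */
    public static List<Path> completeSubdirs(Path root) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(root)) {
            for (Path child : children) {
                if (Files.isDirectory(child)
                        && Files.exists(child.resolve(SUCCESS_MARKER))) {
                    result.add(child);
                }
            }
        }
        result.sort(null); // natural Path order, for a deterministic input list
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("data");
        Path done = Files.createDirectory(root.resolve("part-a"));
        Files.createFile(done.resolve(SUCCESS_MARKER));
        Files.createDirectory(root.resolve("part-b")); // no marker: still being written
        for (Path p : completeSubdirs(root)) {
            System.out.println(p.getFileName()); // prints: part-a
        }
    }
}
```

On a cluster, the same loop would presumably use Hadoop's FileSystem.listStatus(Path) and FileSystem.exists(Path) in place of the java.nio.file calls, then pass each kept directory to FileInputFormat.addInputPath(job, path).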
There are hundreds of subdirectories in /data, and soon there will be
thousands. Is there a limit on how many input paths I can add to a map-reduce
job? Is there a smarter way to do this?
thanks! koert
