hadoop-hdfs-user mailing list archives

From Matthias Scherer <matthias.sche...@1und1.de>
Subject AW: How to process only input files containing 100% valid rows
Date Fri, 19 Apr 2013 09:39:12 GMT
I should add that we have 1-2 billion events per day, split across several thousand files.
So pre-reading each file in the InputFormat should be avoided.

And yes, we could use MultipleOutputs to write the bad records to separate files while
still processing each input file. But we (our Operations team) think there is more and
better control if we reject whole files containing bad records.
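
One way to reject whole files without pre-reading them is to stage each file's
output during the single processing pass and commit it only if no bad row turned
up. The following is a minimal sketch of that idea in plain Java, without Hadoop.
The class WholeFileRejection, the method process, and the in-memory map of file
names to rows are hypothetical stand-ins for illustration, not part of any Hadoop
API. At the volumes mentioned above, the staging area would presumably be a
temporary output location rather than a list in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only: plain Java, no Hadoop classes involved.
final class WholeFileRejection {

    /** Outcome of one pass: rows from fully valid files, and names of rejected files. */
    static final class Result {
        final List<String> acceptedRows = new ArrayList<>();
        final List<String> rejectedFiles = new ArrayList<>();
    }

    /**
     * Reads every file exactly once. Rows are staged per file and committed
     * only if the whole file turned out to be valid; otherwise the staged rows
     * are dropped and the file name is recorded as rejected.
     */
    static Result process(Map<String, List<String>> files, Predicate<String> isValid) {
        Result result = new Result();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            List<String> staged = new ArrayList<>();
            boolean clean = true;
            for (String row : file.getValue()) {
                if (!isValid.test(row)) {
                    clean = false; // first bad row condemns the whole file
                    break;         // no need to read the rest of it
                }
                staged.add(row);
            }
            if (clean) {
                result.acceptedRows.addAll(staged); // commit staged output
            } else {
                result.rejectedFiles.add(file.getKey()); // discard staged output
            }
        }
        return result;
    }
}
```

Calling process with a map of file names to rows and a row validator returns the
rows of the clean files plus the list of rejected file names, which Operations
could then inspect or re-deliver.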

Regards
Matthias
