hadoop-mapreduce-user mailing list archives

From Pedro Magalhaes <pedror...@gmail.com>
Subject CompositeInputFormat
Date Mon, 04 Aug 2014 20:36:55 GMT
I saw that one of the requirements to use CompositeInputFormat is:
"A map-side join can be used to join the outputs of several jobs that had
the same number of reducers, the same keys, and *output files that are not
splittable (by being smaller than an HDFS block, or by virtue of being gzip
compressed, for example)*"

So must each of my partitions be equal to or smaller than an HDFS block?

If I have a 1 GB file (1024 MB), will I have 16 partitions of 64 MB each?
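
To make the arithmetic behind that figure explicit, here is a minimal sketch. It assumes a 64 MB block size, which is only the number I am working with (the block size is configurable), and the class and method names are made up for illustration. It only counts how many block-sized pieces a file of a given size spans; it does not say anything about how CompositeInputFormat itself partitions the data.

```java
public class BlockCount {
    // Number of block-sized pieces needed to hold fileBytes, rounding up
    // so that a partial last block still counts as one piece.
    static long blocksFor(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 1024L * mb;  // the 1 GB file from the question
        long blockBytes = 64L * mb;   // assumed block size, not read from any cluster
        System.out.println(blocksFor(fileBytes, blockBytes)); // prints 16
    }
}
```

A file that is not an exact multiple of the block size rounds up, so 1025 MB would span 17 pieces under the same assumption.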

How can I control the size of each partition?
