hadoop-mapreduce-user mailing list archives

From Pedro Magalhaes <pedror...@gmail.com>
Subject Re: CompositeInputFormat
Date Thu, 07 Aug 2014 21:04:09 GMT
Thanks for the reply.

Actually, what I am trying to do is implement a map-side join. As I
understand it, the files must not be splittable, so that each map task
processes the partitions that hold the same keys.
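
To make that requirement concrete, here is a minimal sketch of the
consistency condition described in the quoted reply below: a key found in
both datasets must sit in the same partition number in each. The class and
method names are made up for illustration, and the hash-modulo rule is only
an assumption about how the upstream jobs partitioned their output.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; these names are hypothetical, not Hadoop APIs.
class PartitionCheckSketch {

    // Assumed partitioning rule: non-negative hash modulo the partition count.
    static int partitionOf(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Maps every key of a dataset to the partition it would land in.
    static Map<String, Integer> assign(Collection<String> keys, int numPartitions) {
        Map<String, Integer> result = new HashMap<>();
        for (String key : keys) {
            result.put(key, partitionOf(key, numPartitions));
        }
        return result;
    }

    // True iff every key of A that also occurs in B has the same partition
    // number in both datasets. Keys missing from B are ignored.
    static boolean consistent(Map<String, Integer> a, Map<String, Integer> b) {
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = assign(Arrays.asList("a", "b", "c"), 2);
        Map<String, Integer> sameCount = assign(Arrays.asList("b", "c", "d"), 2);
        Map<String, Integer> otherCount = assign(Arrays.asList("a", "b", "c"), 3);
        System.out.println(consistent(a, sameCount));  // same partition count
        System.out.println(consistent(a, otherCount)); // different partition count
    }
}
```

With the same partition count on both sides the check passes. With 2
partitions against 3 it fails for these keys, which is why the number of
reducers has to match.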

I saw in Hadoop: The Definitive Guide that I can force files not to be
split by setting the minimum split size to the maximum integer value. The
other option is to override the isSplitable method.
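
A rough sketch of why the first option works, assuming the split size is
computed as max(minSize, min(maxSize, blockSize)) and that the last split
is allowed about 10% of slack, which is how I understand FileInputFormat
to behave. This is standalone arithmetic, not Hadoop's own code, and the
class and method names are made up.

```java
// Illustrative arithmetic only; these names are hypothetical, not Hadoop APIs.
class SplitSizeSketch {

    // Assumed slack: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Assumed split size rule: the block size clamped by the min and max sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits a single file of fileSize bytes would produce.
    static int countSplits(long fileSize, long splitSize) {
        int splits = 0;
        long remaining = fileSize;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            remaining -= splitSize;
            splits++;
        }
        if (remaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb; // the 1 GB file from the earlier question
        long blockSize = 64 * mb;

        long defaultSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(countSplits(fileSize, defaultSize)); // prints 16

        long forcedSize = computeSplitSize(blockSize, Long.MAX_VALUE, Long.MAX_VALUE);
        System.out.println(countSplits(fileSize, forcedSize)); // prints 1
    }
}
```

So with the defaults the 1 GB file gives 16 splits of 64 MB, and with a
huge minimum split size the whole file becomes a single split.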

Does that make sense?

Sorry for the spelling mistakes. I am writing from my iPhone.

On Thursday, 7 August 2014, Chris Douglas <cdouglas@apache.org> wrote:

> Is that quote from product documentation?
> Whether the output files are splittable is a practical consideration
> when setting up the join; the quote is identifying a common case that
> satisfies the constraints. The size of each partition is irrelevant,
> provided that the splits are generated consistently across all
> InputFormats involved in the expression (i.e., given datasets A,B in a
> join expression and a key K in A, K is in partition N iff K is in
> partition N for InputFormat B OR K is not in B). -C
> On Mon, Aug 4, 2014 at 1:36 PM, Pedro Magalhaes <pedrorjbr@gmail.com>
> wrote:
> > I saw that one of the requirements to use CompositeInputFormat is:
> > "A map-side join can be used to join the outputs of several jobs that had
> > the same number of reducers, the same keys, and output files that are not
> > splittable (by being smaller than an HDFS block, or by virtue of being
> > gzip compressed, for example)"
> >
> > So must my partition size be equal to or smaller than the HDFS block?
> >
> > If I have a 1 GB file (1024 MB), will I have 16 partitions of 64 MB?
> >
> > How can I control the size of the partition?
> >
> >
