hadoop-mapreduce-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Re: Map output files and partitions.
Date Fri, 14 Dec 2012 07:58:10 GMT
Hello Pedro,

       The first part of your question is very well covered by Harsh.

For the second part: the number of partitions equals the number of reduce
tasks, and the partition each key goes to is decided by the getPartition()
method of the Partitioner. The default implementation, HashPartitioner,
assigns partitions by hashing the key. You can write your own Partitioner
and implement getPartition() to customize this.
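
To make the hashing idea concrete, here is a minimal standalone sketch. It
does not depend on Hadoop: the class name PartitionSketch and both method
names are made up for illustration. partitionFor() uses the same arithmetic
as the default HashPartitioner (mask the sign bit, then take the remainder
by the number of reduce tasks), and firstCharPartition() is a hypothetical
custom rule of the kind you would put in your own getPartition().

```java
class PartitionSketch {

    // Same arithmetic as Hadoop's HashPartitioner: clear the sign bit so the
    // value is non-negative, then take the remainder by the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical custom rule: route keys by their first character, so all
    // keys starting with the same character land in the same partition.
    static int firstCharPartition(String key, int numReduceTasks) {
        if (key.isEmpty()) {
            return 0; // assumption: empty keys go to partition 0
        }
        return key.charAt(0) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4; // assumed number of reduce tasks
        for (String key : new String[] {"apple", "avocado", "banana"}) {
            System.out.println(key
                + " hash=" + partitionFor(key, reducers)
                + " firstChar=" + firstCharPartition(key, reducers));
        }
    }
}
```

In a real job the custom rule would live in a class extending Partitioner,
registered on the job, with the same (key, value, numPartitions) inputs.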

HTH

Regards,
    Mohammad Tariq



On Fri, Dec 14, 2012 at 12:59 PM, Harsh J <harsh@cloudera.com> wrote:

> Map output files, by which you perhaps mean intermediate data files
> for temporary K/V persistence, are stored in IFiles. They use neither
> text nor sequence files (historically, though, they did use sequence
> files at some point).
>
> You can read the IFile's sources at
>
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
> for more technical details on it. It is very similar to SequenceFiles
> in some ways.
>
> On Fri, Dec 14, 2012 at 12:45 PM, Pedro Sá da Costa <psdc1978@gmail.com>
> wrote:
> > Hi,
> >
> > There are only 2 types of map output files, Sequence and Text files. If
> > those files are going to be used as input to several reduce tasks,
> > they need to be partitioned into blocks. Are there any SEPARATOR bits
> > that delimit each partition? Can I read a specific partition of a map
> > output file? Is there an API for that?
> >
> > --
> > Best regards,
>
>
>
> --
> Harsh J
>
