hadoop-mapreduce-dev mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Understanding the MapOutput
Date Fri, 04 Nov 2011 16:46:12 GMT
Hi Pedro,

The format is called IFile. Check out the source for more info on the
format - it's fairly simple. The partition starts are recorded in a
separate index file next to the output file.
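
As a rough illustration of how a standalone program could use that index
file to hash each partition, here is a minimal Python sketch. The layout it
assumes (three big-endian 8-byte longs per partition: start offset, raw
length, on-disk length, possibly followed by a trailing checksum) is an
assumption based on how the MR source appears to write it, not a documented
contract, so check it against the source of the Hadoop version in use. The
names read_index and partition_digests are hypothetical helpers, not part of
Hadoop.

```python
import hashlib
import struct

# Assumed per-partition index record: start offset, raw (uncompressed)
# length, on-disk length. Verify against the Hadoop source you are running.
RECORD = struct.Struct(">qqq")


def read_index(index_bytes):
    """Return a list of (start, raw_length, part_length) tuples.

    Trailing bytes that do not fill a whole record (assumed to be a
    checksum) are ignored.
    """
    count = len(index_bytes) // RECORD.size
    return [RECORD.unpack_from(index_bytes, i * RECORD.size) for i in range(count)]


def partition_digests(output_bytes, index_bytes):
    """Hash the on-disk byte range of every partition in the map output."""
    digests = []
    for start, _raw_length, part_length in read_index(index_bytes):
        segment = output_bytes[start:start + part_length]
        if len(segment) != part_length:
            raise ValueError("index points past the end of the output file")
        digests.append(hashlib.sha256(segment).hexdigest())
    return digests
```

To use it, read the map output file and its index file as bytes (open them
in "rb" mode) and pass both to partition_digests. Note that this hashes the
bytes exactly as stored, so any per-segment checksum or compression is
included in the digest.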

I don't think you'll find significant docs on this format since it's
MR-internal - the code is your best resource.

-Todd

On Fri, Nov 4, 2011 at 8:37 AM, Pedro Costa <psdc1978@gmail.com> wrote:
> Hi,
>
> I'm trying to understand the structure of the map output file. Here's an
> example of a map output file that contains 2 partitions:
>
> [code]
> <FF><FF><FF><FF>^@^@716banana banana apple banana carrot carrot apple
> banana 0apple carrot carrot carrot banana carrot carrot 5^N4carrot apple
> carrot apple apple carrot banana apple ^Mbanana apple <FF><FF><DF>|<8E><B7>
> [/code]
>
> 1 - I would like to understand what the ASCII character parts are. What
> do they mean?
>
> 2 - What type of file is a map output? Is it a SequenceFileOutputFormat, or
> a TextOutputFormat?
>
> 3 - I have a small program that runs independently of MR. Its goal is to
> digest each partition and output the corresponding hash. How do I
> know where each partition starts?
>
>
> --
> Thanks,
> PSC
>



-- 
Todd Lipcon
Software Engineer, Cloudera
