hadoop-mapreduce-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Understanding MapReduce source code : Flush operations
Date Tue, 07 Jan 2014 05:21:09 GMT

What OutputFormat are you using?

Once it reaches OutputFormat (specifically RecordWriter) it all depends on what the RecordWriter
does. Are you using some OutputFormat with a RecordWriter that buffers like this?
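
To illustrate what "a RecordWriter that buffers" could look like, here is a minimal sketch in plain Java. It is not Hadoop code: the class name ThresholdBufferingWriter, the capacityBytes and flushFraction parameters, and the tab/newline record layout are all hypothetical, chosen only to mirror the "flush at 90% of some size" behavior described in the question. A real implementation would extend Hadoop's RecordWriter and write to the stream obtained from the FileSystem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a buffering RecordWriter; not a Hadoop class.
class ThresholdBufferingWriter implements AutoCloseable {
    private final OutputStream sink;  // where flushed bytes go (assumed: a file system stream)
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final int flushThresholdBytes;
    private int flushCount = 0;

    // flushFraction = 0.9 means "flush once the buffer holds 90% of capacityBytes".
    ThresholdBufferingWriter(OutputStream sink, int capacityBytes, double flushFraction) {
        this.sink = sink;
        this.flushThresholdBytes = (int) (capacityBytes * flushFraction);
    }

    // Append one key/value record to memory; flush only when the threshold is reached.
    void write(String key, String value) throws IOException {
        byte[] record = (key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8);
        buffer.write(record, 0, record.length);
        if (buffer.size() >= flushThresholdBytes) {
            flush();
        }
    }

    // Push everything buffered so far to the sink and empty the buffer.
    void flush() throws IOException {
        if (buffer.size() == 0) {
            return;
        }
        buffer.writeTo(sink);
        sink.flush();
        buffer.reset();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int bufferedBytes() {
        return buffer.size();
    }

    // Closing flushes whatever is left, then closes the sink.
    @Override
    public void close() throws IOException {
        flush();
        sink.close();
    }
}
```

If the writer in use looks like this, the flush logic to change lives in the writer itself. If it writes each record straight to the output stream, any buffering happens further down, in the stream and file system client.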

Thanks,
+Vinod

On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:

> This is not in DFSClient.
> 
> Before the output is written to HDFS, a lot of operations take place.
> 
> For example, the reducer output in memory reaches 90% of the HDFS block size, and then the data starts to be flushed.
> 
> So my requirement is to look at that code, because I want to change the logic a bit to suit my needs.
> 
> 
> On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli <vinodkv@hortonworks.com> wrote:
> Assuming your output is going to HDFS, you want to look at DFSClient.
> 
> The reducer uses FileSystem to write the output. You need to start by looking at how DFSClient chunks the output and sends the chunks across to the remote data-nodes.
> 
> Thanks
> +Vinod
> 
> On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
> 
>> I want to look at the code for the flush operations that happen after the reduce phase.
>> 
>> The reducer writes its output to the OutputFormat, which in turn pushes it to memory, and once it reaches 90% of the chunk size it starts to flush the reducer output.
>> 
>> I essentially want to look at the code of that flushing operation.
>> 
>> 
>> Which class(es) do I need to look into?
>> 
>> 
>> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
>> Please do not tell me that in the last 2.5 years you have not used a virtual Hadoop environment to debug your MapReduce application before deploying it to a production environment.
>> 
>> No one can stop you from looking at the code; Hadoop and its ecosystem are open source.
>> 
>> 
>> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com> wrote:
>> 
>> 
>> ---------- Forwarded message ----------
>> From: nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com>
>> Date: Mon, Jan 6, 2014 at 6:39 PM
>> Subject: Understanding MapReduce source code : Flush operations
>> To: mapreduce-user@hadoop.apache.org
>> 
>> 
>> Hi,
>> 
>> I have been using Hadoop/MapReduce for about 2.5 years. I want to understand the internals of the Hadoop source code.
>> 
>> Let me state my requirement clearly.
>> 
>> I want to look at the code for the flush operations that happen after the reduce phase.
>> 
>> The reducer writes its output to the OutputFormat, which in turn pushes it to memory, and once it reaches 90% of the chunk size it starts to flush the reducer output.
>> 
>> I essentially want to look at the code of that flushing operation.
>> 
>> 
>> 
>> 
>> Regards,
>> Nagarjuna K
>> 
>> 
>> 
> 
> 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
> 


