hadoop-mapreduce-user mailing list archives

From nagarjuna kanamarlapudi <nagarjuna.kanamarlap...@gmail.com>
Subject Re: Understanding MapReduce source code : Flush operations
Date Tue, 07 Jan 2014 06:15:04 GMT
I am using TextOutputFormat

OK, the idea here is this: the output format writes to a record
writer, which in turn has to pass the data on to *some other object* where it
is held in memory and flushed once the block size is reached.

I want to look at that *some other object* and the other helper classes that
write/flush the output to disk.
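
[Editor's sketch] The layering described above (a record writer handing bytes to another object that holds them in memory and flushes later) can be illustrated with plain JDK classes. This is only a minimal sketch, not Hadoop's code: `RecordWriterSketch`, `LineWriter`, and the 16-byte buffer are hypothetical names and values chosen for the demo. To my understanding, in Hadoop itself TextOutputFormat's record writer wraps the stream returned by FileSystem.create(), and for HDFS the buffering happens inside the client-side output stream behind it, so that stream is the likely candidate for the "other object". The 90% threshold mentioned in this thread is not modelled here.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class RecordWriterSketch {

    /** Hypothetical stand-in for a line-oriented record writer (not a Hadoop class). */
    static final class LineWriter implements AutoCloseable {
        private final OutputStream out;

        LineWriter(OutputStream out) {
            this.out = out;
        }

        /** Writes one record as key, TAB, value, newline to the wrapped stream. */
        void write(String key, String value) throws IOException {
            out.write((key + "\t" + value + "\n").getBytes(StandardCharsets.UTF_8));
        }

        /** Closing the writer closes the wrapped stream, which flushes what is left. */
        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    /**
     * Returns the number of bytes that have reached the sink after the first write,
     * after the second write, and after close.
     */
    static int[] demo() throws IOException {
        // The sink plays the role of the file; the 16-byte buffer size is an assumption.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        OutputStream buffered = new BufferedOutputStream(sink, 16);
        LineWriter writer = new LineWriter(buffered);

        writer.write("a", "1");             // 4 bytes, fits in the buffer
        int afterFirst = sink.size();       // nothing has reached the sink yet

        writer.write("b", "0123456789ab");  // 15 bytes, does not fit next to the first 4
        int afterSecond = sink.size();      // the first record was pushed out to make room

        writer.close();                     // flushes the remaining buffered bytes
        int afterClose = sink.size();

        return new int[] {afterFirst, afterSecond, afterClose};
    }

    public static void main(String[] args) throws IOException {
        int[] sizes = demo();
        System.out.println(sizes[0] + " " + sizes[1] + " " + sizes[2]);
    }
}
```

The point of the sketch is that the record writer itself does no buffering: when and how much gets flushed is decided entirely by the stream it is given, which is why the answer depends on the OutputFormat and the FileSystem in use.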

Regards,
Nagarjuna

On Tue, Jan 7, 2014 at 10:51 AM, Vinod Kumar Vavilapalli <
vinodkv@hortonworks.com> wrote:

>
> What OutputFormat are you using?
>
> Once it reaches OutputFormat (specifically RecordWriter) it all depends on
> what the RecordWriter does. Are you using some OutputFormat with a
> RecordWriter that buffers like this?
>
> Thanks,
> +Vinod
>
> On Jan 6, 2014, at 7:11 PM, nagarjuna kanamarlapudi <
> nagarjuna.kanamarlapudi@gmail.com> wrote:
>
> This is not in DFSClient.
>
> Before the output is written to HDFS, a lot of operations take place,
>
> such as the reducer output in memory reaching 90% of the HDFS block size
> and then starting to be flushed.
>
> So my requirement is to look at that code, because I want to
> change the logic a bit to suit my needs.
>
>
> On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli <
> vinodkv@hortonworks.com> wrote:
>
>> Assuming your output is going to HDFS, you want to look at DFSClient.
>>
>>  Reducer uses FileSystem to write the output. You need to start looking
>> at how DFSClient chunks the output and sends them across to the remote
>> data-nodes.
>>
>> Thanks
>> +Vinod
>>
>> On Jan 6, 2014, at 11:07 AM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>
>> I want to have a look at the code for the flush operations that happen
>> after the reduce phase.
>>
>> The reducer writes the output to the OutputFormat, which in turn pushes it to
>> memory, and once it reaches 90% of the chunk size it starts to flush the reducer
>> output.
>>
>> I essentially want to look at the code of that flushing operation.
>>
>>
>> Which class(es) do I need to look into?
>>
>>
>> On Mon, Jan 6, 2014 at 11:23 PM, Hardik Pandya <smarty.juice@gmail.com> wrote:
>>
>>> Please do not tell me that in the last 2.5 years you have not used a virtual
>>> Hadoop environment to debug your MapReduce application before deploying to a
>>> production environment.
>>>
>>> No one can stop you from looking at the code; Hadoop and its ecosystem are
>>> open source.
>>>
>>>
>>> On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <
>>> nagarjuna.kanamarlapudi@gmail.com> wrote:
>>>
>>>>
>>>>
>>>> ---------- Forwarded message ----------
>>>> From: nagarjuna kanamarlapudi <nagarjuna.kanamarlapudi@gmail.com>
>>>>  Date: Mon, Jan 6, 2014 at 6:39 PM
>>>> Subject: Understanding MapReduce source code : Flush operations
>>>> To: mapreduce-user@hadoop.apache.org
>>>>
>>>>
>>>>  Hi,
>>>>
>>>> I have been using Hadoop/MapReduce for about 2.5 years. I want to understand
>>>> the internals of the Hadoop source code.
>>>>
>>>> Let me state my requirement clearly.
>>>>
>>>> I want to have a look at the code for the flush operations that
>>>> happen after the reduce phase.
>>>>
>>>> The reducer writes the output to the OutputFormat, which in turn pushes it to
>>>> memory, and once it reaches 90% of the chunk size it starts to flush the reducer
>>>> output.
>>>>
>>>> I essentially want to look at the code of that flushing operation.
>>>>
>>>>
>>>>
>>>>
>>>> Regards,
>>>> Nagarjuna K
>>>>
>>>>
>>>
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity
>> to which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>
