hadoop-common-user mailing list archives

From Abdul Navaz <navaz....@gmail.com>
Subject Re: Where the output of mappers are saved ?
Date Tue, 16 Dec 2014 16:26:01 GMT
Hello,

According to the Hadoop documentation, mapper output is not saved in HDFS. It
is written to temporary local disk, at a location that can be changed with
mapred.local.dir in the mapred-site.xml file. I was able to see the mapper
output in this directory; once the job is done, the data is deleted from this
temporary directory because it is no longer needed.
My question was how to get the size of this mapper output. I have figured it
out now: running "du -ah" in that directory while the MapReduce job is
executing gives me the size of every file and directory, including
subdirectories.
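
For anyone who wants the total as a single byte count rather than du's per-entry listing, a minimal Python sketch follows. It is only an illustration: the function name dir_size_bytes is made up for this example, it is not part of Hadoop, and it assumes the script runs on the node whose mapred.local.dir you want to inspect. It sums apparent file sizes, so the figure can differ from du, which reports disk usage.

```python
import os
import stat


def dir_size_bytes(root):
    """Return the total apparent size, in bytes, of regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                # lstat does not follow symlinks, so link targets are not counted.
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Intermediate files can disappear while the job is running.
                continue
            if stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

Calling dir_size_bytes("/app/hadoop/tmp/myoutput") repeatedly while a job runs (for example once a second) would give a rough view of how the local intermediate data grows; a directory that does not exist simply yields 0.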


Thanks & Regards,

Abdul Navaz
Research Assistant
University of Houston Main Campus, Houston TX



From:  "bit1129@163.com" <bit1129@163.com>
Reply-To:  <user@hadoop.apache.org>
Date:  Tuesday, December 16, 2014 at 2:12 AM
To:  user <user@hadoop.apache.org>
Subject:  Re: Re: Where the output of mappers are saved ?

Thanks Susheel! Understood.


bit1129@163.com
>  
> From: Susheel Kumar Gadalay <mailto:skgadalay@gmail.com>
> Date: 2014-12-16 15:27
> To: user <mailto:user@hadoop.apache.org>
> Subject: Re: Re: Where the output of mappers are saved ?
> I don't think so. It will be a single output file per reducer.
>  
> If you want multiple smaller output files, then specify the number of
> reducers in the job configuration.
>  
> On 12/16/14, bit1129@163.com <bit1129@163.com> wrote:
>> > Thanks Susheel!!
>> > One more question: if part-r-XXXX is extremely large, say 2 GB, will the
>> > file be split into more files under the output directory? That is,
>> > could one reducer produce more than one file?
>> >
>> >
>> >
>> > bit1129@163.com
>> >
>> > From: Susheel Kumar Gadalay
>> > Date: 2014-12-16 14:17
>> > To: user
>> > Subject: Re: Re: Where the output of mappers are saved ?
>> > Yes, the map outputs will be cleaned on job completion.
>> >
>> > If you want to see the map outputs, set the number of reducers to zero
>> > and check the files part-m-0000, part-m-0001, ...
>> >
>> > On 12/16/14, bit1129@163.com <bit1129@163.com> wrote:
>>> >> Do they only exist during the map/reduce process, and are they removed
>>> >> after the MR job has finished?
>>> >>
>>> >> When the reduce finished, I only see part-m-0000, part-m-0001, ...,
>>> >> which are reduce results.
>>> >>
>>> >>
>>> >>
>>> >> bit1129@163.com
>>> >>
>>> >> From: Susheel Kumar Gadalay
>>> >> Date: 2014-12-16 13:05
>>> >> To: user
>>> >> Subject: Re: Where the output of mappers are saved ?
>>> >> Map outputs will be in HDFS under your user name and output directory.
>>> >>
>>> >> They will have names like part-m-0000, part-m-0001, ...
>>> >>
>>> >>
>>> >> On 12/16/14, Abdul Navaz <navaz.enc@gmail.com> wrote:
>>>> >>> Hello,
>>>> >>>
>>>> >>>
>>>> >>> Second Try !
>>>> >>>
>>>> >>>
>>>> >>> I  have created a directory to store this mapper output as below.
>>>> >>>  <property>
>>>> >>>  <name>mapred.local.dir</name>
>>>> >>>  <value>/app/hadoop/tmp/myoutput</value>
>>>> >>>  </property>
>>>> >>> and I looked at
>>>> >>>  hduser@dn4:/app/hadoop/tmp/myoutput$ ls -lrt
>>>> >>>  total 16
>>>> >>>  drwxr-xr-x 2 hduser hadoop 4096 Dec 12 10:50 tt_log_tmp
>>>> >>>  drwx------ 3 hduser hadoop 4096 Dec 12 10:53 ttprivate
>>>> >>>  drwxr-xr-x 3 hduser hadoop 4096 Dec 12 10:53 taskTracker
>>>> >>>  drwxr-xr-x 4 hduser hadoop 4096 Dec 12 13:25 userlogs
>>>> >>> and I could not find anything here when I ran the MapReduce job.
>>>> >>> Where is the mapper output saved by default, and how can I get the
>>>> >>> size of the mapper output in bytes?
>>>> >>>
>>>> >>>
>>>> >>> Thanks.
>>>> >>>
>>>> >>>
>>>> >>> From:  Abdul Navaz <navaz.enc@gmail.com>
>>>> >>> Date:  Friday, December 12, 2014 at 12:36 AM
>>>> >>> To:  "user@hadoop.apache.org" <user@hadoop.apache.org>
>>>> >>> Subject:  Where the output of mappers are saved ?
>>>> >>>
>>>> >>> Hello,
>>>> >>>
>>>> >>>
>>>> >>> I am interested in efficiently managing Hadoop shuffle traffic and
>>>> >>> utilizing the network bandwidth effectively. To do this, I want to
>>>> >>> know how much shuffle traffic is generated by each DataNode.
>>>> >>> Shuffle traffic is nothing but the output of the mappers. So where is
>>>> >>> this mapper output saved? How can I get the size of the mapper
>>>> >>> output from each DataNode in real time?
>>>> >>> Appreciate your help.
>>>> >>>
>>>> >>> Thanks & Regards,
>>>> >>>
>>>> >>> Abdul Navaz
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>> >>
>> >


