hadoop-common-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: intermediate results not getting compressed
Date Tue, 17 Mar 2009 17:15:18 GMT
Watching a second job with more reduce tasks running, it looks like the 
in-memory merges are working correctly with compression.

The task I was watching had failed and was running again. It shuffled all 
the map output files and only started merging after everything was copied, 
so none were merged in memory; it was closed before the merging started.
If it helps, the output files are named intermediate.x and are stored in 
mapred/local/job-taskname/intermediate.x
while the in-memory merges are stored in 
mapred/local/taskTracker/jobcache/job-name/taskname/

The uncompressed ones are the intermediate.x files above.

Billy


"Chris Douglas" <chrisdo@yahoo-inc.com> wrote in 
message news:9BB78C3A-EFAB-45C3-8CC3-25AAB60DF914@yahoo-inc.com...
>> My problem is that the output from merging the intermediate map output 
>> files is not compressed, so I lose all the benefit of compressing the map 
>> file output to save disk space because the merged map output files are no 
>> longer compressed.
>
> It should still be compressed, unless there's some bizarre regression. 
> More segments will be around simultaneously (since the segments not yet 
> merged are still on disk), which clearly puts pressure on intermediate 
> storage, but if the map outputs are compressed, then the merged map 
> outputs at the reduce must also be compressed. There's no place in the 
> intermediate format to store compression metadata, so either all are or 
> none are. Intermediate merges should also follow the compression spec of 
> the initiating merger, too (o.a.h.mapred.Merger: 447).
>
> How are you concluding that the intermediate output is compressed from 
> the map, but not in the reduce? -C
>
>>
>> ----- Original Message ----- From: "Chris Douglas" 
>> <chrisdo-ZXvpkYn067l8UrSeD/g0lQ@public.gmane.org>
>> Newsgroups: gmane.comp.jakarta.lucene.hadoop.user
>> To: <core-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/JbrIOy@public.gmane.org>
>> Sent: Tuesday, March 17, 2009 12:33 AM
>> Subject: Re: intermediate results not getting compressed
>>
>>
>>>> I am running 0.19.1-dev, r744282. I have searched the issues but 
>>>> found nothing about the compression.
>>>
>>> AFAIK, there are no open issues that prevent intermediate compression 
>>> from working. The following might be useful:
>>>
>>> http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#Data+Compression
>>>
>>>> Should the intermediate results not be compressed also if the map 
>>>> output files are set to be compressed?
>>>
>>> These are controlled by separate options.
>>>
>>> FileOutputFormat::setCompressOutput enables/disables compression on 
>>> the final output
>>> JobConf::setCompressMapOutput enables/disables compression of the 
>>> intermediate output
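
For readers who set these options through configuration rather than code, the two calls named above should correspond to separate job properties. The fragment below is a minimal sketch, assuming the property names used by 0.19-era releases and a site or job configuration file; the choice of DefaultCodec is only illustrative and is not taken from this thread.

```xml
<!-- Sketch only: property names assumed from 0.19-era defaults. -->

<!-- Intermediate (map) output; the counterpart of JobConf::setCompressMapOutput -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <!-- Illustrative codec choice, not from the thread -->
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>

<!-- Final job output; the counterpart of FileOutputFormat::setCompressOutput -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.codec</name>
  <!-- Illustrative codec choice, not from the thread -->
  <value>org.apache.hadoop.io.compress.DefaultCodec</value>
</property>
```

Enabling only the first pair compresses the map output and the data merged from it at the reduce, while leaving the final job output uncompressed.
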
>>>
>>>> If not then why do we have the map compression option just to save 
>>>> network traffic?
>>>
>>> That's part of it. Also to save on disk bandwidth and intermediate 
>>> space. -C
>>
>>
>
> 


