hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: changes to compression interfaces in 0.15?
Date Thu, 21 Feb 2008 02:47:40 GMT

Actually, it might just be good to have a warning spit out if you use ANY
unknown key that starts with mapred.* or any of the other hadoop-specific
prefixes.

That way mis-spellings would be caught as well as deprecations.

If you want to set a value and not get a warning, just pick a different
prefix.
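
A minimal sketch of that check, assuming a hypothetical helper class
(ConfigKeyChecker is not part of Hadoop) and an illustrative list of known
keys and reserved prefixes. The real list would have to come from something
like hadoop-default.xml:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical helper, not a Hadoop class: flags unknown keys under reserved prefixes. */
final class ConfigKeyChecker {
    private final Set<String> knownKeys;         // keys the framework actually reads (assumed list)
    private final List<String> reservedPrefixes; // e.g. "mapred." (assumed list)

    ConfigKeyChecker(Set<String> knownKeys, List<String> reservedPrefixes) {
        this.knownKeys = knownKeys;
        this.reservedPrefixes = reservedPrefixes;
    }

    /** Returns one warning per key that uses a reserved prefix but is not a known key. */
    List<String> check(Map<String, String> settings) {
        List<String> warnings = new ArrayList<>();
        // Iterate over a sorted copy so the warnings come out in a stable order.
        for (String key : new TreeMap<>(settings).keySet()) {
            if (hasReservedPrefix(key) && !knownKeys.contains(key)) {
                warnings.add("WARN: unknown configuration key '" + key + "'");
            }
        }
        return warnings;
    }

    private boolean hasReservedPrefix(String key) {
        for (String prefix : reservedPrefixes) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With "mapred." reserved, a misspelled or retired mapred.* key produces a
warning, while a key such as myapp.mapred.note passes silently because it
starts with a different prefix.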

On 2/20/08 6:13 PM, "Jason Venner" <jason@attributor.com> wrote:

> I agree. I am in the midst of combing through the config files for 16 to
> see what changes I have to retrofit into our jobs.
> Support in the tools to warn about the use of deprecated or outright
> removed keys would be wonderful.
> 
> Aaron Kimball wrote:
>> As a general follow-up suggestion: is there a mechanism to output a
>> warning when the user sets deprecated JobConf keys? Given that you can
>> set any arbitrary key name and it will simply be ignored, this might
>> be a good idea.
>> 
>> - Aaron
>> 
>> Joydeep Sen Sarma wrote:
>>> In addition:
>>> 
>>> - "mapred.output.compression.type" is now replaced with
>>>   "mapred.map.output.compression.type"
>>> 
>>> - the old implementation of the Java interface
>>>   setMapOutputCompressorClass() used to turn map output compression on
>>>   automatically as a side effect; the 0.15 one doesn't. It looks like one
>>>   has to call setCompressMapOutput() separately.
>>> 
>>> Aargh.
>>> 
>>> ________________________________
>>> 
>>> From: hive-devel-bounces@lists.facebook.com
>>> [mailto:hive-devel-bounces@lists.facebook.com] On Behalf Of Joydeep Sen
>>> Sarma
>>> Sent: Wednesday, February 20, 2008 5:06 PM
>>> To: core-user@hadoop.apache.org
>>> Subject: changes to compression interfaces in 0.15?
>>> 
>>> Hi developers,
>>> 
>>> In migrating to 0.15, I am noticing that the compression interfaces
>>> have changed:
>>> 
>>> - compression type for SequenceFile outputs used to be set by:
>>>   SequenceFile.setCompressionType()
>>> 
>>> - now it seems to be set using:
>>>   SequenceFileOutputFormat.setOutputCompressionType()
>>> 
>>> The change is for the better - but would it be possible to:
>>> 
>>> - remove old/dead interfaces. That would have been a
>>>   straightforward hint for applications to look for new interfaces.
>>>   (hadoop-default.xml also still has a setting for the old conf variable:
>>>   io.seqfile.compression.type)
>>> 
>>> - if possible, document changed interfaces in the release
>>>   notes (there's no way we can find this out by looking at the long list
>>>   of Jiras).
>>> 
>>> As you can imagine, this causes a very subtle and harmful regression in
>>> the behavior of existing apps. It does not cause failures; in our case
>>> it switched from BLOCK to RECORD compression, meaning there's pretty
>>> much no compression at all. I caught this by *pure* chance and now I
>>> am living in absolute fear of what else lurks out there.
>>> 
>>> I am not sure how up to date the wiki is on the compression stuff (my
>>> responsibility to update it), but please do consider the impact of
>>> changing interfaces on existing applications. (Maybe we should have a
>>> JIRA tag to mark bugs that change interfaces.)
>>> 
>>> As always, thanks for all the fish (err .. working code),
>>> 
>>> Joydeep

