hadoop-common-dev mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5452) Relax the strict type check by allowing subclasses pass the check
Date Tue, 10 Mar 2009 16:36:51 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680528#action_12680528 ]

Hong Tang commented on HADOOP-5452:
-----------------------------------

I suspect this restriction is there for performance reasons. To deserialize an object, the
SequenceFile Reader needs to know the concrete type of the serialized bytes. In other words,
if objects of any subclass of the key class were admissible, SequenceFile might have to pay
for a per-key or per-value string recording the actual type of each key or value object.
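As a rough, hypothetical illustration of that per-record cost (this is not SequenceFile's actual on-disk format), the sketch below writes the same float payloads twice using plain java.io streams: once with every record prefixed by its concrete class name, and once with a one-byte type tag. The class name is taken from the report quoted below; the class and method names are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; this is not SequenceFile's real record layout.
public class TypeOverheadDemo {

    // Each record = writeUTF(class name) [2-byte length + UTF-8 bytes] + 4-byte float.
    static int bytesWithClassName(String className, int records) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int i = 0; i < records; i++) {
                out.writeUTF(className);
                out.writeFloat(i);
            }
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each record = 1-byte type tag + 4-byte float.
    static int bytesWithTag(byte tag, int records) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            for (int i = 0; i < records; i++) {
                out.writeByte(tag);
                out.writeFloat(i);
            }
            out.flush();
            return buf.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = "cn.ac.ict.vega.type.Type$Float"; // 30 ASCII characters
        System.out.println(bytesWithClassName(name, 1000)); // 36000 (32 + 4 bytes per record)
        System.out.println(bytesWithTag((byte) 1, 1000));   // 5000 (1 + 4 bytes per record)
    }
}
```

With this class name, type information takes 32 of every 36 bytes, against 1 of every 5 bytes with a tag.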

Typically, you would write a wrapper class that holds an instance of one of the possible key
types together with a numeric tag identifying which type it is. The serialized form of the
wrapper object is simply the numeric tag followed by the actual object in serialized form.
This minimizes the per-key or per-value overhead by using small integers instead of long
class-name strings.
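A minimal sketch of such a tagged wrapper, assuming a simplified Writable-like contract. The `Value` interface and the `FloatValue`, `StringValue`, `TaggedValue` and `TaggedValueDemo` classes are hypothetical and depend only on java.io. Hadoop's own `org.apache.hadoop.io.GenericWritable` is built around the same idea for real `Writable` types.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical Writable-like contract for this sketch.
interface Value {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

final class FloatValue implements Value {
    float v;
    FloatValue() {}
    FloatValue(float v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeFloat(v); }
    public void readFields(DataInput in) throws IOException { v = in.readFloat(); }
}

final class StringValue implements Value {
    String v = "";
    StringValue() {}
    StringValue(String v) { this.v = v; }
    public void write(DataOutput out) throws IOException { out.writeUTF(v); }
    public void readFields(DataInput in) throws IOException { v = in.readUTF(); }
}

// Serialized form: one tag byte naming the concrete type, then that type's own bytes.
final class TaggedValue implements Value {
    Value inner;
    TaggedValue() {}
    TaggedValue(Value inner) { this.inner = inner; }

    // Writer and reader must agree on this tag assignment.
    private static byte tagOf(Value v) {
        if (v instanceof FloatValue) return 0;
        if (v instanceof StringValue) return 1;
        throw new IllegalArgumentException("unregistered type: " + v.getClass().getName());
    }

    private static Value newInstance(byte tag) throws IOException {
        switch (tag) {
            case 0: return new FloatValue();
            case 1: return new StringValue();
            default: throw new IOException("unknown type tag: " + tag);
        }
    }

    public void write(DataOutput out) throws IOException {
        out.writeByte(tagOf(inner)); // 1 byte instead of a class-name string
        inner.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        inner = newInstance(in.readByte()); // the tag picks the concrete class
        inner.readFields(in);
    }
}

public class TaggedValueDemo {
    static byte[] serialize(Value v) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            v.write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static TaggedValue deserialize(byte[] bytes) {
        try {
            TaggedValue t = new TaggedValue();
            t.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return t;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(new TaggedValue(new FloatValue(2.5f)));
        TaggedValue back = deserialize(bytes);
        // prints "5 bytes, FloatValue"
        System.out.println(bytes.length + " bytes, " + back.inner.getClass().getSimpleName());
    }
}
```

The tag table is the part that must stay stable: a reader can only decode tags it knows. Adding a type therefore means appending a new tag rather than renumbering existing ones.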

> Relax the strict type check by allowing subclasses pass the check
> -----------------------------------------------------------------
>
>                 Key: HADOOP-5452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5452
>             Project: Hadoop Core
>          Issue Type: Improvement
>            Reporter: he yongqiang
>
> The type check like:
> {code}
> if (key.getClass() != keyClass)
>         throw new IOException("wrong key class: "+key.getClass().getName()
>                               +" is not "+keyClass);
> if (val.getClass() != valClass)
>         throw new IOException("wrong value class: "+val.getClass().getName()
>                               +" is not "+valClass);
> {code}
> is used in many places where a type check is needed.
> I found its uses in org.apache.hadoop.io.SequenceFile, org.apache.hadoop.mapred.IFile, and
> org.apache.hadoop.mapred.MapTask. Because I only searched for (key.getClass() != keyClass), the
> same code may also appear in other classes.
> I suggest relaxing the strict type check by using
> {code}
> if (!keyClass.isAssignableFrom(key.getClass()))
> {code}
> The error in my situation is listed below:
> {panel:borderStyle=dashed| borderColor=#ccc| titleBGColor=#F7D6C1| bgColor=#FFFFCE}
> java.io.IOException: Type mismatch in value from map: expected cn.ac.ict.vega.type.Type, recieved cn.ac.ict.vega.type.Type$Float
> 	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:553)
> 	at cn.ac.ict.vega.parse.mapreduce.block.FilterColumnBlockMapper.map(FilterColumnBlockMapper.java:77)
> 	at cn.ac.ict.vega.parse.mapreduce.block.BlockMapRunner.run(BlockMapRunner.java:33)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:155)
> {panel} 
> Float is a subclass of Type, and I would like it to pass the check. I use Type instead of
> Float because I cannot determine in advance whether the value is a Float, a String, or some
> other type.
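To make the difference between the exact-class check and a subclass-tolerant check concrete, here is a small standalone sketch. `Type` and `FloatType` are hypothetical stand-ins for the reporter's `cn.ac.ict.vega.type.Type` and `Type$Float`. `isAssignableFrom` is directional: `A.isAssignableFrom(B)` is true when `B` is `A` itself or a subtype of `A`, so the declared class must be the receiver. For a non-null value this is equivalent to `valClass.isInstance(value)`.

```java
public class TypeCheckDemo {
    // Hypothetical stand-ins for the reporter's Type hierarchy.
    static class Type {}
    static class FloatType extends Type {}

    // Exact-class check, as in the quoted SequenceFile/IFile/MapTask code.
    static boolean passesStrict(Object value, Class<?> valClass) {
        return value.getClass() == valClass;
    }

    // Relaxed check: an instance of valClass or of any subclass passes.
    static boolean passesRelaxed(Object value, Class<?> valClass) {
        return valClass.isAssignableFrom(value.getClass());
    }

    // Receiver and argument swapped: asks whether valClass is a subtype of the
    // runtime class, which is the opposite question.
    static boolean swapped(Object value, Class<?> valClass) {
        return value.getClass().isAssignableFrom(valClass);
    }

    public static void main(String[] args) {
        Object v = new FloatType();
        System.out.println(passesStrict(v, Type.class));  // false
        System.out.println(passesRelaxed(v, Type.class)); // true
        System.out.println(swapped(v, Type.class));       // false
    }
}
```

Passing the relaxed check only tells the writer that the value is some kind of `Type`. A reader would still have to learn which concrete class to instantiate, which is the cost discussed in the comment above.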

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

