hadoop-common-dev mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3380) need comparators in serializer framework
Date Wed, 14 May 2008 11:35:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596730#action_12596730 ]

Enis Soztutar commented on HADOOP-3380:

With the introduction of the serialization framework, the rationale for RawComparator is somewhat broken.

In theory, an object of some type (for example Double) can be serialized to its byte[] form
in an arbitrary way by different serializers, so in general it is not possible to efficiently compare
two byte arrays without actually deserializing the objects. However, some types, especially
Writables, know precisely how they are serialized and can thus benefit from raw byte comparison (in
short, we should keep RawComparator).
Similarly, the RawComparators returned by Serialization#getComparator() cannot do
much except deserialize the objects and call {{o1.compareTo(o2)}} (see {{DeserializerComparator}}
and {{JavaSerializationComparator}}).
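
To make the point above concrete, here is a minimal, self-contained sketch (plain Java, no Hadoop classes). It uses int rather than Double for brevity, and the two "serializers" ({{toBigEndian}} and {{toText}}) as well as the class name are hypothetical, invented only for illustration. It shows that a raw byte comparison can match the value order under one encoding and disagree with it under another, in which case deserializing is the only safe fallback.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class RawVsDeserialized {
    // Hypothetical serializer A: fixed-width big-endian encoding.
    public static byte[] toBigEndian(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Hypothetical serializer B: decimal text encoding.
    public static byte[] toText(int v) {
        return Integer.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    public static int fromText(byte[] b) {
        return Integer.parseInt(new String(b, StandardCharsets.UTF_8));
    }

    // Raw comparison: unsigned lexicographic order over the bytes,
    // with the shorter array first when one is a prefix of the other.
    public static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Fallback: deserialize both sides, then compare the values.
    public static int compareDeserialized(byte[] a, byte[] b) {
        return Integer.compare(fromText(a), fromText(b));
    }

    public static void main(String[] args) {
        // Big-endian bytes of non-negative ints sort like the values themselves.
        System.out.println(compareRaw(toBigEndian(9), toBigEndian(10)) < 0);
        // Text bytes do not: "9" starts with a larger byte than "10".
        System.out.println(compareRaw(toText(9), toText(10)) < 0);
        // Deserializing first restores the value order.
        System.out.println(compareDeserialized(toText(9), toText(10)) < 0);
    }
}
```

Only code that knows the encoding (as a Writable knows its own) can safely pick the raw path.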

I think we should 
# not change Serialization interface 
# introduce DefaultComparator extending DeserializerComparator, implementing Configurable,
and with static {{register(Class, RawComparator)}} and {{get(Class)}} methods. 
DefaultComparator.get(Class keyClass) should check for a Comparator instance registered for
the given class; if none is found, it should return itself, obtaining Deserializer by calling
# replace usages of WritableComparator#define() with DefaultComparator#register(), 
# make WritableComparator extend DefaultComparator
# fix JobConf#getOutputValueGroupingComparator(), so that it uses DefaultComparator. 
# deprecate JavaSerializationComparator (since it is not needed once we have DefaultComparator
extending DeserializerComparator)
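
A rough sketch of the register/get lookup proposed in the list above, in plain Java with no Hadoop dependency. Everything here is an assumption made for illustration: the nested {{RawComparator}} interface is a simplified stand-in (not Hadoop's interface), the class name is invented, and the deserializer is passed to {{get}} explicitly as a {{Function}} because the proposal does not settle how it would be obtained. It is not the eventual patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class DefaultComparatorSketch {
    /** Simplified, hypothetical stand-in for a raw comparator over serialized bytes. */
    public interface RawComparator {
        int compare(byte[] a, byte[] b);
    }

    // Per-class registry of comparators that understand the serialized form.
    private static final Map<Class<?>, RawComparator> REGISTRY = new ConcurrentHashMap<>();

    public static void register(Class<?> keyClass, RawComparator comparator) {
        REGISTRY.put(keyClass, comparator);
    }

    // Return the registered comparator if there is one; otherwise fall back to
    // deserializing both sides and calling compareTo.
    public static <T extends Comparable<? super T>> RawComparator get(
            Class<T> keyClass, Function<byte[], T> deserializer) {
        RawComparator registered = REGISTRY.get(keyClass);
        if (registered != null) {
            return registered;
        }
        return (a, b) -> deserializer.apply(a).compareTo(deserializer.apply(b));
    }

    // Hypothetical text deserializer used by the fallback path in main.
    public static Integer parseText(byte[] b) {
        return Integer.valueOf(new String(b, StandardCharsets.UTF_8));
    }

    // Unsigned lexicographic byte order; assumes an encoding for which this
    // order matches the value order (e.g. big-endian non-negative numbers).
    public static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        // Nothing registered for Integer: get() falls back to deserialization.
        RawComparator fallback = get(Integer.class, DefaultComparatorSketch::parseText);
        byte[] nine = "9".getBytes(StandardCharsets.UTF_8);
        byte[] ten = "10".getBytes(StandardCharsets.UTF_8);
        System.out.println(fallback.compare(nine, ten) < 0);

        // A registered comparator for Long takes precedence over the fallback.
        RawComparator rawLong = DefaultComparatorSketch::compareUnsignedBytes;
        register(Long.class, rawLong);
        System.out.println(get(Long.class, b -> ByteBuffer.wrap(b).getLong()) == rawLong);
    }
}
```

Note that a registry keyed only by class implicitly assumes one serialized form per class, which is exactly the assumption that multiple serializers can break.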

Thoughts?

> need comparators in serializer framework
> ----------------------------------------
>                 Key: HADOOP-3380
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3380
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Doug Cutting
> The new serialization framework permits Hadoop to incorporate different serialization
systems, including Hadoop's Writable, Thrift, Java Serialization, etc.  It provides a generic,
extensible means (SerializationFactory) to create serializers and deserializers for arbitrary
Java classes.  However, it does not include a generic means to create comparators for these
classes.  Comparators are required for MapReduce keys and many other computations.  Thus we
should enhance the serialization framework to provide comparators too.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
