hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3380) need comparators in serializer framework
Date Tue, 13 May 2008 17:39:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596469#action_12596469 ]

Doug Cutting commented on HADOOP-3380:
--------------------------------------

Under my proposal above, one would create a comparator with:

RawComparator c = new SerializationFactory(conf).getSerialization(MyKey.class).getComparator();

So a configuration would be involved, and a serialization framework could in theory support
configurable comparators.  On the other hand, doing so efficiently might be hard.  One could,
e.g., implement JavaSerialization#getComparator() to read a configuration parameter that names
a list of fields and use introspection to order things by those fields.  Ideally it would
generate comparator code and compile it on the fly, but that's a lot of work.  Record IO provides
a single generated comparator that's efficient but not parameterized.  Thrift doesn't (yet)
even generate comparators!  Ideally IDL-generated serializers might generate a general-purpose
parameterized comparator, e.g., compare(int[] fieldIds), where {1,-3} might mean to order
by increasing values of the first field and decreasing values of the third.
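A minimal sketch of that signed field-id convention, for illustration only.  The class name FieldOrderComparator and the use of Object[] as a stand-in for a record are assumptions, not part of the proposal, and the sketch compares deserialized objects rather than raw bytes, so it shows the ordering semantics and not an efficient RawComparator.

```java
import java.util.Comparator;

/**
 * Hypothetical sketch: orders records by a signed list of 1-based field ids.
 * A positive id sorts that field ascending, a negative id descending,
 * so {1, -3} means "field 1 ascending, then field 3 descending".
 * Records are modeled as Object[] whose elements implement Comparable.
 */
public final class FieldOrderComparator implements Comparator<Object[]> {
    private final int[] fieldIds;

    public FieldOrderComparator(int... fieldIds) {
        for (int id : fieldIds) {
            if (id == 0) {
                throw new IllegalArgumentException("field ids are 1-based; 0 is not valid");
            }
        }
        this.fieldIds = fieldIds.clone();
    }

    @Override
    @SuppressWarnings({"unchecked", "rawtypes"})
    public int compare(Object[] a, Object[] b) {
        for (int id : fieldIds) {
            int index = Math.abs(id) - 1;          // convert 1-based id to array index
            Comparable left = (Comparable) a[index];
            Comparable right = (Comparable) b[index];
            // Swap the operands for a descending field instead of negating the result.
            int result = id > 0 ? left.compareTo(right) : right.compareTo(left);
            if (result != 0) {
                return result;
            }
        }
        return 0; // all listed fields are equal
    }
}
```

With new FieldOrderComparator(1, -3), two records that agree on the first field are ordered by the larger third field first.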

For text input (e.g., tab-separated), one could easily write a configurable comparator.  We
could use the serialization framework to associate a Serialization with String that provides
such a comparator.  Would that suffice for now?
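As a rough sketch of what such a configurable comparator for tab-separated text might look like: the class name TabSeparatedComparator and the spec format ("2,-1", reusing the signed 1-based field ids from above) are assumptions made for illustration.  In practice the spec string would presumably come from a configuration parameter, and fields here are compared as plain strings.

```java
import java.util.Comparator;

/**
 * Hypothetical sketch: compares tab-separated lines by a configured list of
 * signed, 1-based field ids, e.g. "2,-1" means field 2 ascending, then
 * field 1 descending. Fields are compared lexicographically as strings;
 * a field missing from a line is treated as the empty string.
 */
public final class TabSeparatedComparator implements Comparator<String> {
    private final int[] fieldIds;

    public TabSeparatedComparator(String spec) {
        String[] parts = spec.split(",");
        fieldIds = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            fieldIds[i] = Integer.parseInt(parts[i].trim());
            if (fieldIds[i] == 0) {
                throw new IllegalArgumentException("field ids are 1-based; 0 is not valid");
            }
        }
    }

    @Override
    public int compare(String a, String b) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] fieldsA = a.split("\t", -1);
        String[] fieldsB = b.split("\t", -1);
        for (int id : fieldIds) {
            int index = Math.abs(id) - 1;
            String left = index < fieldsA.length ? fieldsA[index] : "";
            String right = index < fieldsB.length ? fieldsB[index] : "";
            int result = id > 0 ? left.compareTo(right) : right.compareTo(left);
            if (result != 0) {
                return result;
            }
        }
        return 0;
    }
}
```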

> need comparators in serializer framework
> ----------------------------------------
>
>                 Key: HADOOP-3380
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3380
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Doug Cutting
>
> The new serialization framework permits Hadoop to incorporate different serialization
> systems, including Hadoop's Writable, Thrift, Java Serialization, etc.  It provides a generic,
> extensible means (SerializationFactory) to create serializers and deserializers for arbitrary
> Java classes.  However, it does not include a generic means to create comparators for these
> classes.  Comparators are required for MapReduce keys and many other computations.  Thus we
> should enhance the serialization framework to provide comparators too.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

