hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3380) need comparators in serializer framework
Date Tue, 13 May 2008 17:39:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596469#action_12596469 ]

Doug Cutting commented on HADOOP-3380:

Under my proposal above, one would create a comparator with:

RawComparator c = new SerializationFactory(conf).getSerialization(MyKey.class).getComparator();

So a configuration would be involved, and a serialization framework could in theory support
configurable comparators.  On the other hand, doing so efficiently might be hard.  One could,
e.g., implement JavaSerialization#getComparator() to read a configuration parameter that names
a list of fields and use introspection to order things by those fields.  Ideally it would
generate comparator code and compile it on the fly, but that's a lot of work.  Record IO provides
a single generated comparator that's efficient but not parameterized.  Thrift doesn't (yet)
even generate comparators!  Ideally IDL-generated serializers might generate a general-purpose
parameterized comparator, e.g., compare(int[] fieldIds), where {1,-3} might mean to order
by increasing values of the first field and decreasing values of the third.
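
To make the compare(int[] fieldIds) idea concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: the class name FieldOrderComparator is hypothetical, records are modeled as int[] rather than as generated record classes, and it compares deserialized values instead of raw serialized bytes as a RawComparator would. Field ids are 1-based, and a negative id means descending order on that field.

```java
import java.util.Comparator;

// Hypothetical sketch: orders int[] "records" by a list of signed, 1-based field ids.
// A positive id sorts that field ascending, a negative id sorts it descending.
class FieldOrderComparator implements Comparator<int[]> {
    private final int[] fieldIds;

    FieldOrderComparator(int[] fieldIds) {
        for (int id : fieldIds) {
            if (id == 0) {
                throw new IllegalArgumentException("field ids are 1-based; 0 is not valid");
            }
        }
        this.fieldIds = fieldIds.clone();
    }

    @Override
    public int compare(int[] a, int[] b) {
        for (int id : fieldIds) {
            int idx = Math.abs(id) - 1;               // convert the 1-based id to an array index
            int c = Integer.compare(a[idx], b[idx]);
            if (c != 0) {
                return id > 0 ? c : -c;               // flip the result for descending fields
            }
        }
        return 0;                                     // all listed fields are equal
    }
}
```

With new FieldOrderComparator(new int[] {1, -3}), records are ordered by increasing first field, and ties are broken by decreasing third field. Fields not named in the list do not affect the ordering.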

For text input (e.g., tab-separated), one could easily write a configurable comparator.  We
could use the serialization framework to register a Serialization for String that provides
it.  Would that suffice for now?
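
A configurable comparator for tab-separated text might look roughly like the sketch below. It is only an illustration under stated assumptions: the class name TabFieldComparator and the "2,-1" spec format are hypothetical, the spec string stands in for a value read from a configuration parameter, and columns are compared as text, so numeric columns sort lexicographically. Missing columns are treated as empty strings.

```java
import java.util.Comparator;

// Hypothetical sketch: compares tab-separated lines by a configured list of columns.
// The spec is a comma-separated list of signed, 1-based column ids, e.g. "2,-1".
class TabFieldComparator implements Comparator<String> {
    private final int[] fieldIds;

    TabFieldComparator(String spec) {
        String[] parts = spec.split(",");
        fieldIds = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            fieldIds[i] = Integer.parseInt(parts[i].trim());
            if (fieldIds[i] == 0) {
                throw new IllegalArgumentException("column ids are 1-based; 0 is not valid");
            }
        }
    }

    @Override
    public int compare(String a, String b) {
        String[] fa = a.split("\t", -1);              // -1 keeps trailing empty columns
        String[] fb = b.split("\t", -1);
        for (int id : fieldIds) {
            int idx = Math.abs(id) - 1;
            String x = idx < fa.length ? fa[idx] : "";   // a missing column counts as empty
            String y = idx < fb.length ? fb[idx] : "";
            int c = x.compareTo(y);                   // plain lexicographic comparison
            if (c != 0) {
                return id > 0 ? c : -c;               // negative id means descending
            }
        }
        return 0;
    }
}
```

Splitting each line on every comparison is simple but not cheap; a production comparator would more likely scan for the tab positions in the serialized bytes directly.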

> need comparators in serializer framework
> ----------------------------------------
>                 Key: HADOOP-3380
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3380
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: io
>            Reporter: Doug Cutting
> The new serialization framework permits Hadoop to incorporate different serialization
> systems, including Hadoop's Writable, Thrift, Java Serialization, etc.  It provides a generic,
> extensible means (SerializationFactory) to create serializers and deserializers for arbitrary
> Java classes.  However, it does not include a generic means to create comparators for these
> classes.  Comparators are required for MapReduce keys and many other computations.  Thus we
> should enhance the serialization framework to provide comparators too.

