hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-525) Need raw comparators for hadoop record types
Date Wed, 18 Oct 2006 17:21:36 GMT
     [ http://issues.apache.org/jira/browse/HADOOP-525?page=all ]

Doug Cutting updated HADOOP-525:

    Priority: Major  (was: Minor)

A raw comparator shouldn't have to deserialize fields, but should operate directly on the
field data.  For primitive fields we'd generate calls to methods like WritableComparator.{readInt,readLong,...}.
For Text, we'd generate calls to WritableComparator.compareBytes().  For complex objects
we'd generate calls to their raw comparator.
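
To make the idea concrete, here is a rough sketch of what a generated raw comparator might look like for a hypothetical record with an int field followed by a string field. Everything in it is an assumption made for illustration: the class name, the record layout, and the 4-byte length prefix on the string are invented, and readInt/compareBytes are local stand-ins that mimic the WritableComparator helpers named above so that the example runs without Hadoop on the classpath. It is not the code the record compiler would emit.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class RawRecordComparator {
    // Local stand-in for WritableComparator.readInt: decode a big-endian int in place.
    static int readInt(byte[] b, int off) {
        return ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }

    // Local stand-in for WritableComparator.compareBytes: unsigned lexicographic order.
    static int compareBytes(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        int n = Math.min(aLen, bLen);
        for (int i = 0; i < n; i++) {
            int x = a[aOff + i] & 0xff;
            int y = b[bOff + i] & 0xff;
            if (x != y) return x - y;
        }
        return aLen - bLen;
    }

    // Compare two serialized records directly on their bytes; no objects are built.
    // Assumed layout: [int id][int nameLength][name bytes in UTF-8].
    static int compare(byte[] a, byte[] b) {
        int c = Integer.compare(readInt(a, 0), readInt(b, 0));
        if (c != 0) return c;               // primitive field decides
        int aLen = readInt(a, 4);
        int bLen = readInt(b, 4);
        return compareBytes(a, 8, aLen, b, 8, bLen);  // string field, byte-wise
    }

    // Helper that writes the assumed layout, used only to build test input.
    static byte[] serialize(int id, String name) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeInt(id);
            out.writeInt(utf8.length);
            out.write(utf8);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] x = serialize(7, "apple");
        byte[] y = serialize(7, "apricot");
        System.out.println(Integer.signum(compare(x, y)));
    }
}
```

Note that the byte-wise string comparison orders by UTF-8 bytes, which is the kind of detail where a hand-written raw comparator could silently disagree with a cooked one.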

Besides having a huge performance benefit, adding raw comparators to records would solve other
problems with Hadoop's io framework: currently it is possible for raw and cooked comparators
to differ.  But if both are auto-generated from the same source they'll be guaranteed compatible.
Also, raw comparators are fragile and difficult to develop, since they bypass all type mechanisms.
Generated code would ensure correctness.

I've increased the priority of this issue.  We should implement this and start using records
more extensively.  Previously we've mostly thought of records as an aid for interoperability with
other programming languages, but I think they'll also be valuable for performance and correctness.

> Need raw comparators for hadoop record types
> --------------------------------------------
>                 Key: HADOOP-525
>                 URL: http://issues.apache.org/jira/browse/HADOOP-525
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: record
>    Affects Versions: 0.6.0
>            Reporter: Sameer Paranjpye
>         Assigned To: Milind Bhandarkar
>             Fix For: 0.8.0
>         Attachments: TypeBuilder-support.tar, TypeBuilder.java, WordCountType.java
> Raw comparators are not generated for types that are generated with the Hadoop record
> framework. This could have a substantial performance impact when using hadoop record generated
> types in Map/Reduce. The record i/o framework should auto-generate raw comparators for types.
> Comparison for hadoop record i/o types is defined to be member wise comparison of objects.
> A possible implementation could only deserialize one member from each object at a time, compare
> them and either return or move on to the next member if the values are equal.
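
The member-at-a-time approach described in the issue could be sketched roughly as follows. This is only an illustration under stated assumptions: the record shape (a word followed by a count), the class and method names, and the use of java.io's writeUTF/readUTF encoding are all hypothetical and are not the record framework's actual wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class MemberwiseCompare {
    // Writes the assumed layout: [word via writeUTF][long count].
    static byte[] serialize(String word, long count) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(word);
            out.writeLong(count);
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Deserialize one member from each record, compare, and stop at the
    // first difference; later members are never read in that case.
    static int compare(byte[] a, byte[] b) {
        try {
            DataInputStream inA = new DataInputStream(new ByteArrayInputStream(a));
            DataInputStream inB = new DataInputStream(new ByteArrayInputStream(b));

            int c = inA.readUTF().compareTo(inB.readUTF());   // member 1: word
            if (c != 0) return c;

            return Long.compare(inA.readLong(), inB.readLong()); // member 2: count
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] x = serialize("hadoop", 3L);
        byte[] y = serialize("hadoop", 10L);
        System.out.println(Integer.signum(compare(x, y)));
    }
}
```

This still allocates a String per record for the first member, so it sits between a fully cooked comparison and one that works directly on the serialized bytes.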


