hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-4006) Add local sort operator
Date Mon, 11 Feb 2013 09:57:12 GMT
Namit Jain created HIVE-4006:
--------------------------------

             Summary: Add local sort operator
                 Key: HIVE-4006
                 URL: https://issues.apache.org/jira/browse/HIVE-4006
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Namit Jain
            Assignee: Namit Jain


We've seen in the past that sorting data on a specific column can greatly improve how well
the data compresses.  The problem is that a full sort is expensive and requires a reduce phase.

One way around this is to add a local sort (either as an operator or between serialization
and output).  This could take chunks of rows and sort each chunk in memory.  This would
be much faster than a full sort, but it would need to be very memory efficient in order to
fit the maximum number of rows in a chunk (and hence get the maximum benefit).
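
The chunked approach described above could be sketched roughly as follows. This is only an
illustration in Python, not Hive code: the function name local_sort, its key and chunk_size
parameters, and the sample rows are all hypothetical. It buffers a fixed number of rows,
sorts that buffer in memory, emits it, and moves on, so no global ordering and no reduce
phase is involved.

```python
from itertools import islice
from operator import itemgetter


def local_sort(rows, key, chunk_size):
    """Yield rows chunk by chunk, with each chunk sorted by `key`.

    Hypothetical helper: only `chunk_size` rows are held in memory at a time,
    and rows are never moved from one chunk to another.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    it = iter(rows)
    while True:
        # Buffer the next chunk; the last chunk may be shorter.
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        # In-memory sort of this chunk only (stable, so ties keep input order).
        chunk.sort(key=key)
        yield from chunk


# Made-up sample rows: (country, value), sorted locally on the country column.
rows = [("us", 3), ("de", 1), ("us", 7), ("de", 4), ("fr", 2), ("de", 9)]
sorted_rows = list(local_sort(rows, key=itemgetter(0), chunk_size=4))
```

With a chunk size of 4, the first four rows are grouped by country and the last two are
sorted separately, so equal values become adjacent within a chunk but not across chunks.
A larger chunk yields longer runs of equal values, which is why the memory footprint per
row matters.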

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
