hadoop-hive-dev mailing list archives

From "Jeff Zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1402) Add parallel ORDER BY to Hive
Date Fri, 18 Jun 2010 08:33:25 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880120#action_12880120 ]

Jeff Zhang commented on HIVE-1402:
----------------------------------

Hi, I have made a draft implementation for one special case. It works, but because it only
covers that case, it contains some hard-coded logic. I hope someone can offer help or guidance
on the next step.
One big problem for parallel ORDER BY is that the output key type of ExecMapper is HiveKey,
which has already been serialized by LazyBinarySerDe, so the original column type is lost at
this point. However, sampling and partitioning need to use the original column type.
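To make the problem concrete, here is a small standalone illustration. It does not use Hive's HiveKey or any real SerDe; it assumes a plain big-endian int encoding as a stand-in for a serialized key (RawKeyOrderDemo and its methods are hypothetical names) and shows that a comparator that only sees raw bytes can order keys differently from their original column type. It needs Java 9+ for Arrays.compareUnsigned.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Illustration only: NOT Hive's HiveKey or SerDe. Ints are encoded as plain
// big-endian two's-complement bytes (an assumed encoding) to show what goes
// wrong when keys are ordered without knowing their original type.
class RawKeyOrderDemo {

    // Hypothetical encoding standing in for a serialized key.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Type-unaware comparison: lexicographic over unsigned bytes.
    static int compareBytes(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // Type-aware comparison: decode back to int before comparing.
    static int compareInts(byte[] a, byte[] b) {
        return Integer.compare(ByteBuffer.wrap(a).getInt(), ByteBuffer.wrap(b).getInt());
    }

    public static void main(String[] args) {
        byte[] minusOne = encode(-1); // FF FF FF FF
        byte[] one = encode(1);       // 00 00 00 01
        System.out.println("bytes say -1 > 1: " + (compareBytes(minusOne, one) > 0));
        System.out.println("ints say -1 < 1: " + (compareInts(minusOne, one) < 0));
    }
}
```

Running main prints `true` for both lines: the raw bytes put -1 after 1, while the decoded ints order them correctly.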

The following is my initial design.

1. During the parse stage, create a SampleOperator with two children: a TableScanOperator and a
SelectOperator. (I am not familiar with Hive's parse stage and find the code hard to follow;
could anyone help, or recommend some documentation about the Hive parser?)
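The main job of such a SampleOperator would be to collect a sample of sort keys (already deserialized to their original type) and turn it into split points for the reducers. The sketch below is hypothetical: SplitPointSampler and its methods are illustrative names, not Hive APIs, and keys are assumed to be longs. It uses reservoir sampling so the operator does not need to know the input size in advance; Hadoop's InputSampler follows a similar sample-then-pick-splits idea.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the sampling half of a SampleOperator. Names are
// illustrative only; keys are assumed to be already deserialized to long.
class SplitPointSampler {

    private final long[] reservoir;
    private final Random random;
    private long seen = 0;

    SplitPointSampler(int sampleSize, long seed) {
        this.reservoir = new long[sampleSize];
        this.random = new Random(seed);
    }

    // Reservoir sampling (Algorithm R): once n >= sampleSize keys have been
    // seen, each key is in the sample with probability sampleSize / n.
    void add(long key) {
        if (seen < reservoir.length) {
            reservoir[(int) seen] = key;
        } else {
            // Uniform index in [0, seen]; replace only if it falls in the reservoir.
            long j = (long) (random.nextDouble() * (seen + 1));
            if (j < reservoir.length) {
                reservoir[(int) j] = key;
            }
        }
        seen++;
    }

    // Sorts the sample and returns numPartitions - 1 evenly spaced split points.
    long[] splitPoints(int numPartitions) {
        int n = (int) Math.min(seen, reservoir.length);
        if (n == 0 || numPartitions < 1) {
            throw new IllegalStateException("need a non-empty sample and at least one partition");
        }
        long[] sample = Arrays.copyOf(reservoir, n);
        Arrays.sort(sample);
        long[] splits = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sample[(int) ((long) i * n / numPartitions)];
        }
        return splits;
    }
}
```

With heavily skewed keys the sample can yield repeated split points, so a real implementation would need to decide how to handle duplicates.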

2. Modify TotalOrderPartitioner: add a Deserializer that converts the HiveKey back to its
original column type, and deserialize the HiveKey in getPartition().
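Step 2 can be sketched as follows. This is a hypothetical, Hadoop-free illustration: DeserializingRangePartitioner is an invented name, a java.util.function.Function stands in for the Hive Deserializer, keys are assumed to be longs, and the split points are assumed to come from the sampling step. The key is deserialized first and then located among the sorted split points with a binary search, similar to the binary-search lookup in Hadoop's TotalOrderPartitioner.

```java
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical sketch, not Hive or Hadoop code.
class DeserializingRangePartitioner {

    // Stands in for a Hive Deserializer that restores the original column type.
    private final Function<byte[], Long> deserializer;
    // Sorted split points: partition 0 holds keys below splitPoints[0],
    // partition i holds keys in [splitPoints[i-1], splitPoints[i]).
    private final long[] splitPoints;

    DeserializingRangePartitioner(Function<byte[], Long> deserializer, long[] splitPoints) {
        this.deserializer = deserializer;
        this.splitPoints = splitPoints.clone();
        Arrays.sort(this.splitPoints);
    }

    int getPartition(byte[] keyBytes, int numPartitions) {
        if (numPartitions != splitPoints.length + 1) {
            throw new IllegalArgumentException("expected " + (splitPoints.length + 1) + " partitions");
        }
        long key = deserializer.apply(keyBytes);
        int pos = Arrays.binarySearch(splitPoints, key);
        // An exact match on split point i belongs to partition i + 1; otherwise
        // binarySearch returns -(insertionPoint) - 1, and the insertion point
        // is the partition index.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

For example, with split points {250, 500, 750} and 4 partitions, a key of 100 lands in partition 0 and a key of 500 in partition 2.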

Any comments and help are welcome.



> Add parallel ORDER BY to Hive
> -----------------------------
>
>                 Key: HIVE-1402
>                 URL: https://issues.apache.org/jira/browse/HIVE-1402
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.5.0
>            Reporter: Jeff Hammerbacher
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

