hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-5588) change TopN to be an operator
Date Wed, 23 Oct 2013 22:32:45 GMT

    [ https://issues.apache.org/jira/browse/HIVE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803473#comment-13803473 ]

Sergey Shelukhin commented on HIVE-5588:
----------------------------------------

We discussed this a little (unfortunately after the patch was made) and decided that we
won't do it.
It would add extra serialization overhead, and the problems that we were planning to solve
(e.g. putting it in front of FileSink for the HIVE-4002 case) can be solved better by other
means. Moreover, given that sorting relies on BinarySortableSerDe, it will not work straightforwardly
with whatever serde FileSink is using; an additional serde would need to be created. And there's
no code in Hive to actually sort keys without a serde.
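
To illustrate the serde point, here is a minimal sketch in Python (not Hive code). It assumes a top-N step that only ever compares serialized key bytes. The function names and both encodings are hypothetical and are not the actual BinarySortableSerDe format; they only show why byte-wise top-N needs an order-preserving serialization.

```python
import heapq
import struct


def encode_sortable(value: int) -> bytes:
    """Hypothetical order-preserving encoding of a signed 32-bit int.

    Big-endian with the sign bit flipped, so that unsigned byte-wise
    comparison of the encoded keys matches numeric order.
    """
    return struct.pack(">I", (value & 0xFFFFFFFF) ^ 0x80000000)


def decode_sortable(data: bytes) -> int:
    """Inverse of encode_sortable."""
    flipped = struct.unpack(">I", data)[0] ^ 0x80000000
    return struct.unpack(">i", struct.pack(">I", flipped))[0]


def encode_plain(value: int) -> bytes:
    """Hypothetical 'ordinary' encoding (little-endian); not byte-sortable."""
    return struct.pack("<i", value)


def top_n_keys(serialized_keys, n):
    """Return the n smallest keys by raw byte comparison, without deserializing."""
    return heapq.nsmallest(n, serialized_keys)


values = [256, 1, -5, 3]

# With the sortable encoding, byte order agrees with numeric order.
good = top_n_keys([encode_sortable(v) for v in values], 2)
print([decode_sortable(k) for k in good])

# With the plain encoding, the same byte-wise top-N picks the wrong rows.
bad = top_n_keys([encode_plain(v) for v in values], 2)
print([struct.unpack("<i", k)[0] for k in bad])
```

With the plain encoding the byte-wise comparison no longer tracks the values, which is the situation described above when the keys come from a serde that is not binary-sortable.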

I will attach the patch for reference.
If needed in the future, it can easily be pushed through.
It already works correctly in most scenarios; when distinct columns are present there's an
exception, and a small amount of additional code duplication with ReduceSink is needed to make it work. This is
explained in the "TODO#" comment where the code is not correct.


> change TopN to be an operator
> -----------------------------
>
>                 Key: HIVE-5588
>                 URL: https://issues.apache.org/jira/browse/HIVE-5588
>             Project: Hive
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> See HIVE-5503, as well as the discussion in HIVE-3562.
> If topN is a separate operator, it can be reused for file sink, and a vectorized version
> can be implemented.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
