hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <>
Subject [jira] [Commented] (HIVE-5588) change TopN to be an operator
Date Wed, 23 Oct 2013 22:32:45 GMT


Sergey Shelukhin commented on HIVE-5588:

We discussed this briefly (unfortunately, only after I had made the patch) and decided not
to do it.
It would add extra serialization overhead, and the problems we were planning to solve with it
(e.g., putting it in front of FileSink for the HIVE-4002 case) can be solved better by other
means. Moreover, because sorting relies on BinarySortableSerDe, it would not work straightforwardly
with whatever SerDe FileSink is using; an additional SerDe would need to be created. There is
no code in Hive that actually sorts keys without a SerDe.
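
To illustrate why the SerDe matters: a top-N filter only needs a total order on keys. If keys are serialized so that raw unsigned byte comparison matches the logical sort order (the property BinarySortableSerDe is designed to provide), the filter can compare bytes directly without deserializing. The Java sketch below is a hypothetical illustration under that assumption; `TopNBuffer`, `offer`, and `sortedKeys` are invented names, not Hive's API and not the code in the attached patch. It handles ascending order only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not Hive code): keeps the N smallest keys seen so far.
 * Each key is ASSUMED to be already serialized into a byte-comparable form.
 */
class TopNBuffer {
    // Unsigned lexicographic byte comparison; a key that is a prefix of a
    // longer key sorts first.
    static final Comparator<byte[]> BYTES = (a, b) -> {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    private final int limit;
    // Max-heap: the head is the largest retained key, i.e. the next to evict.
    private final PriorityQueue<byte[]> heap;

    TopNBuffer(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, BYTES.reversed());
    }

    /** Returns true if the key is retained, false if it is filtered out. */
    boolean offer(byte[] key) {
        if (heap.size() < limit) {
            heap.add(key);
            return true;
        }
        // Keys not strictly smaller than the current N-th key are dropped,
        // so ties with the boundary key are not retained in this sketch.
        if (BYTES.compare(key, heap.peek()) >= 0) {
            return false;
        }
        heap.poll();
        heap.add(key);
        return true;
    }

    /** Retained keys in ascending byte order. */
    List<byte[]> sortedKeys() {
        List<byte[]> out = new ArrayList<>(heap);
        out.sort(BYTES);
        return out;
    }
}
```

Note that the comparison is only meaningful if every key was produced by the same order-preserving serialization, which is why reusing this logic behind a FileSink with a different SerDe is not straightforward.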

I will attach the patch for reference.
If it is needed in the future, it can easily be pushed through.
It already works correctly in most scenarios; when distinct columns are present there is an
exception, and a small amount of additional code duplicated from ReduceSink is needed to make
it work. This is explained in a "TODO#" comment where the code is not correct.

> change TopN to be an operator
> -----------------------------
>                 Key: HIVE-5588
>                 URL:
>             Project: Hive
>          Issue Type: Task
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
> See HIVE-5503, as well as the discussion in HIVE-3562.
> If topN is a separate operator, it can be reused for file sink, and a vectorized version
> can be implemented.

This message was sent by Atlassian JIRA
