hive-issues mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12763) Use bit vector to track NDV
Date Mon, 25 Jan 2016 22:08:39 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116169#comment-15116169 ]

Alan Gates commented on HIVE-12763:
-----------------------------------

bq. Alan>> In hbase_metastore_proto.proto, I'm surprised to see that you are storing the bit vectors as strings. Why not as bytes?
bq. Pengcheng>> I store bit vector as strings because the default serialization and de-serialization is Text (or String) in Hive

OK, I'm wondering whether your de/serialization could be more efficient if you stored it as a
binary value rather than text.  But maybe it's not a big enough deal to worry about.
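
To make the size question concrete, here is a minimal sketch. It assumes java.util.BitSet as a stand-in for the patch's bit vector class, and the class and method names (BitVectorEncodingSketch, toBinary, toText, and so on) are hypothetical. The text form is simply BitSet.toString(), which may differ from what the patch actually writes.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical comparison of binary vs. text encodings of a bit vector.
// BitSet stands in for the patch's own bit vector class (an assumption).
class BitVectorEncodingSketch {

    // Binary form: packed little-endian bytes, eight bits per byte.
    static byte[] toBinary(BitSet bits) {
        return bits.toByteArray();
    }

    static BitSet fromBinary(byte[] data) {
        return BitSet.valueOf(data);
    }

    // Text form: BitSet.toString(), which lists set bit indexes, e.g. "{0, 1, 5}".
    static byte[] toText(BitSet bits) {
        return bits.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Parses the "{i, j, k}" form produced by toText.
    static BitSet fromText(byte[] data) {
        String s = new String(data, StandardCharsets.UTF_8).trim();
        String body = s.substring(1, s.length() - 1).trim();
        BitSet bits = new BitSet();
        if (!body.isEmpty()) {
            for (String part : body.split(",")) {
                bits.set(Integer.parseInt(part.trim()));
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        for (int i = 0; i < 20; i++) {
            bits.set(i);
        }
        bits.set(25);
        System.out.println("binary bytes: " + toBinary(bits).length);
        System.out.println("text bytes:   " + toText(bits).length);
    }
}
```

For a vector with bits 0 through 19 and bit 25 set, the binary form is 4 bytes, while the text form spends several bytes on every set bit. Whether that difference matters depends on how many vectors are stored and how often they are read.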

On the NOTICE file, I'm wrong.  You're just including it in the pom, not actually distributing
the code, so it's fine.  My mistake.

In NumDistinctValueEstimator.java:
# It would be good to either have a comment section at the head of the class that outlines
the algorithm, or perhaps a link to somewhere that explains it.  This will help future maintainers
understand how this code works.
# Isn't generateHashForPCSA just generateHash with hashNum = 0?  Why repeat the code?
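
The duplication could be removed along these lines. This is only a sketch: the two method names come from the patch, but the hash body, the coefficient arrays a and b, the BIT_VECTOR_SIZE value, and the enclosing class name are assumptions for illustration, not the patch's actual code.

```java
// Hypothetical sketch: the PCSA variant delegates instead of repeating the hash code.
class HashDelegationSketch {
    // Assumed bit vector width; the real constant may differ.
    private static final int BIT_VECTOR_SIZE = 31;

    // Assumed per-hash-function coefficients of a linear hash a*v + b.
    private final long[] a;
    private final long[] b;

    HashDelegationSketch(long[] a, long[] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("need matching, non-empty coefficient arrays");
        }
        this.a = a.clone();
        this.b = b.clone();
    }

    // General form: hash v with the hashNum-th function.
    int generateHash(long v, int hashNum) {
        long mod = (1L << BIT_VECTOR_SIZE) - 1;
        // floorMod keeps the result non-negative even if the product overflows.
        return (int) Math.floorMod(a[hashNum] * v + b[hashNum], mod);
    }

    // PCSA needs a single hash function, so reuse function 0.
    int generateHashForPCSA(long v) {
        return generateHash(v, 0);
    }
}
```

If the two methods really do differ only in which coefficients they use, a one-line delegate like this keeps the hashing logic in one place.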

The only one I think really needs to be fixed before commit is the comment describing the algorithm.
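
As an illustration of the kind of class-level comment meant here, the following is a sketch of textbook PCSA (Probabilistic Counting with Stochastic Averaging, Flajolet and Martin). It assumes the patch follows that scheme, which the generateHashForPCSA name suggests but which is not confirmed here. The class name PcsaSketch, the mixing function, and the 32-bit bitmap width are hypothetical choices.

```java
/**
 * Estimates the number of distinct values (NDV) with PCSA.
 *
 * Each value is hashed. The low part of the hash picks one of m bitmaps,
 * and the position of the lowest set bit in the rest of the hash picks
 * which bit to set in that bitmap. About half of all values set bit 0,
 * a quarter set bit 1, and so on, so the index R of the lowest unset bit
 * in a bitmap grows like log2 of the number of distinct values seen.
 * The estimate is (m / PHI) * 2^(mean R over all bitmaps).
 *
 * Two sketches built with the same m are merged by OR-ing their bitmaps,
 * which gives the same result as sketching the union of the inputs.
 */
class PcsaSketch {
    private static final double PHI = 0.77351; // Flajolet-Martin correction factor
    private final int[] bitmaps;               // one 32-bit bitmap per bucket

    PcsaSketch(int numBitmaps) {
        if (numBitmaps <= 0) {
            throw new IllegalArgumentException("numBitmaps must be positive");
        }
        bitmaps = new int[numBitmaps];
    }

    // Hypothetical 64-bit mixer (SplitMix64 style); any well-mixing hash would do.
    private static long mix(long value) {
        long z = value + 0x9E3779B97F4A7C15L;
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    void add(long value) {
        long h = mix(value);
        int bucket = (int) Long.remainderUnsigned(h, bitmaps.length);
        long rest = Long.divideUnsigned(h, bitmaps.length);
        // Cap at 31 so the shift stays inside the 32-bit bitmap.
        int rho = Math.min(Long.numberOfTrailingZeros(rest), 31);
        bitmaps[bucket] |= 1 << rho;
    }

    void merge(PcsaSketch other) {
        if (other.bitmaps.length != bitmaps.length) {
            throw new IllegalArgumentException("sketches must use the same number of bitmaps");
        }
        for (int i = 0; i < bitmaps.length; i++) {
            bitmaps[i] |= other.bitmaps[i];
        }
    }

    double estimate() {
        long sumR = 0;
        for (int bitmap : bitmaps) {
            sumR += Integer.numberOfTrailingZeros(~bitmap); // index of lowest unset bit
        }
        double meanR = (double) sumR / bitmaps.length;
        return bitmaps.length / PHI * Math.pow(2.0, meanR);
    }
}
```

The merge method is the property that makes bit vectors attractive for combining per-partition stats: plain NDV counts cannot be added across partitions, but bitmaps can be OR-ed. Note that textbook PCSA overestimates at very small cardinalities, so a production implementation would likely need a correction there.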



> Use bit vector to track NDV
> ---------------------------
>
>                 Key: HIVE-12763
>                 URL: https://issues.apache.org/jira/browse/HIVE-12763
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch, HIVE-12763.03.patch
>
>
> This will improve merging of per-partition stats. It will also help merge NDV for auto-gather
> column stats.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
