hive-issues mailing list archives

From "Alan Gates (JIRA)" <>
Subject [jira] [Commented] (HIVE-12763) Use bit vector to track NDV
Date Mon, 25 Jan 2016 22:08:39 GMT


Alan Gates commented on HIVE-12763:

bq. Alan>> In hbase_metastore_proto.proto, I'm surprised to see that you are storing
the bit vectors as strings. Why not as bytes?
bq. Pengcheng>> I store bit vector as strings because the default serialization and
de-serialization is Text (or String) in Hive
Ok, I'm wondering if your de/serialization could be more efficient if you stored it as a binary
value rather than text.  But maybe it's not a big enough deal to worry about.
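
To make the size question concrete, here is a minimal sketch comparing a text encoding of a bit vector with a binary one. It assumes, purely for illustration, that the text form uses one '0' or '1' character per bit; the patch may encode its strings differently. The class and method names are hypothetical and are not taken from the patch.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical illustration only; not code from the HIVE-12763 patch.
final class BitVectorEncodingSketch {

    // Assumed text form: one '0' or '1' character per bit position.
    static String toText(BitSet bits, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    static BitSet fromText(String text) {
        BitSet bits = new BitSet(text.length());
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '1') {
                bits.set(i);
            }
        }
        return bits;
    }

    // Binary form: eight bits packed into each byte.
    static byte[] toBytes(BitSet bits) {
        return bits.toByteArray();
    }

    static BitSet fromBytes(byte[] raw) {
        return BitSet.valueOf(raw);
    }

    public static void main(String[] args) {
        int length = 32;
        BitSet bits = new BitSet(length);
        bits.set(0);
        bits.set(5);
        bits.set(31);

        String text = toText(bits, length);
        byte[] raw = toBytes(bits);

        System.out.println("text size in bytes:   "
            + text.getBytes(StandardCharsets.US_ASCII).length);
        System.out.println("binary size in bytes: " + raw.length);
        System.out.println("round trip ok: "
            + (fromText(text).equals(bits) && fromBytes(raw).equals(bits)));
    }
}
```

Under that assumption the text form costs one byte per bit, while the packed form costs at most one byte per eight bits. Whether the difference matters depends on how many bit vectors are stored per column.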

On the NOTICE file, I'm wrong.  You're just including it in the pom, not actually distributing
the code, so it's fine.  My mistake.

# It would be good to either have a comment section at the head of the class that outlines
the algorithm, or perhaps a link to somewhere that explains it.  This will help future maintainers
understand how this code works.
# Isn't generateHashForPCSA just generateHash with hashNum = 0?  Why repeat the code?
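
As a rough sketch of what both points could look like together, the outline below has a class-level comment describing the algorithm and has generateHashForPCSA delegate to generateHash instead of repeating its body. Only the two method names come from the patch; the signatures, the hash family, the field layout, and the class name are assumptions made for illustration. It shows a generic Flajolet-Martin style estimator, not necessarily the exact variant the patch implements.

```java
import java.util.Random;

/**
 * Hypothetical outline of a Flajolet-Martin style NDV estimator.
 *
 * Each value is hashed by several hash functions. For each hash, the
 * position of its lowest set bit is recorded in that function's bit
 * vector. With many distinct values the low positions fill up, so the
 * index R of the lowest unset bit grows roughly like log2(NDV). The
 * estimate is 2^(mean R) / 0.77351, the usual Flajolet-Martin correction.
 */
final class NdvSketchOutline {
    private static final long PRIME = 2147483647L; // 2^31 - 1

    private final long[] a;
    private final long[] b;
    private final int[] bitVectors; // one 31-bit vector per hash function

    NdvSketchOutline(int numVectors, long seed) {
        Random rnd = new Random(seed);
        a = new long[numVectors];
        b = new long[numVectors];
        bitVectors = new int[numVectors];
        for (int i = 0; i < numVectors; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // in [1, 2^31 - 2]
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // in [0, 2^31 - 2]
        }
    }

    // General form: hash v with hash function number hashNum.
    int generateHash(long v, int hashNum) {
        long x = Math.floorMod(v, PRIME);
        return (int) ((a[hashNum] * x + b[hashNum]) % PRIME);
    }

    // Delegates rather than duplicating the body of generateHash.
    int generateHashForPCSA(long v) {
        return generateHash(v, 0);
    }

    void add(long v) {
        for (int i = 0; i < bitVectors.length; i++) {
            int h = generateHash(v, i);
            // A hash of 0 has no set bit; map it to the top position.
            int pos = (h == 0) ? 30 : Integer.numberOfTrailingZeros(h);
            bitVectors[i] |= (1 << pos);
        }
    }

    double estimate() {
        double sumR = 0;
        for (int bv : bitVectors) {
            // Index of the lowest unset bit in this vector.
            sumR += Integer.numberOfTrailingZeros(~bv);
        }
        return Math.pow(2.0, sumR / bitVectors.length) / 0.77351;
    }
}
```

If the two methods in the patch differ in more than the hash number, delegation would not apply as written, but a short comment stating the difference would still help maintainers.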

The only one I think really needs to be fixed before commit is the comment explaining the algorithm.

> Use bit vector to track NDV
> ---------------------------
>                 Key: HIVE-12763
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-12763.01.patch, HIVE-12763.02.patch, HIVE-12763.03.patch
> This will improve merging of per-partition stats. It will also help merge NDV for auto-gather
column stats.
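
For context on why bit vectors help with merging, here is a minimal sketch under stated assumptions: each partition keeps a Flajolet-Martin style bit vector built with the same hash function, and the inputs below are already-hashed values. The class and method names are hypothetical. With that setup, a bitwise OR of the per-partition vectors yields the same vector as sketching the union of the partitions directly, which is not possible with plain NDV counts.

```java
// Hypothetical illustration only; not code from the HIVE-12763 patch.
final class BitVectorMergeSketch {

    // Record one hashed value by setting the position of its lowest set bit.
    static long record(long bitVector, long hash) {
        int pos = (hash == 0) ? 63 : Long.numberOfTrailingZeros(hash);
        return bitVector | (1L << pos);
    }

    // Build a bit vector from the hashed values of one partition.
    static long sketch(long[] hashes) {
        long bitVector = 0L;
        for (long h : hashes) {
            bitVector = record(bitVector, h);
        }
        return bitVector;
    }

    // Merging two partitions is a bitwise OR of their bit vectors.
    static long merge(long left, long right) {
        return left | right;
    }

    public static void main(String[] args) {
        long[] partition1 = {0x15L, 0x28L, 0x40L};
        long[] partition2 = {0x28L, 0x03L, 0x100L};

        long merged = merge(sketch(partition1), sketch(partition2));

        long[] all = new long[partition1.length + partition2.length];
        System.arraycopy(partition1, 0, all, 0, partition1.length);
        System.arraycopy(partition2, 0, all, partition1.length, partition2.length);

        System.out.println("merge equals sketch of union: " + (merged == sketch(all)));
    }
}
```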

This message was sent by Atlassian JIRA
