kylin-issues mailing list archives

From "Wang, Gang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KYLIN-2903) support cardinality calculation for Hive view
Date Tue, 19 Dec 2017 11:35:00 GMT

     [ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wang, Gang updated KYLIN-2903:
------------------------------
    Attachment: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch

Attached is a patch.
One approach is to use the HQL 'COUNT DISTINCT' statement to calculate column cardinality, and
use 'INSERT OVERWRITE DIRECTORY' to write the result to the output path. For the following
step, HiveColumnCardinalityUpdateJob, to recognize the result, the output needs to follow this
format:
column1 cardinality
column2 cardinality
column3 cardinality
.....

This format can be achieved by setting 'ROW FORMAT DELIMITED' and adding a line break in
the HQL.
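
As a rough illustration of the idea above, here is a minimal Python sketch that builds such an HQL statement. It is not the code in the attached patch. The function name, the table name, the output path, and the tab delimiter are assumptions made for the example, and it emits one row per column with 'UNION ALL' instead of the embedded line break mentioned above, which is simpler to read but scans the view once per column.

```python
def build_cardinality_hql(table, columns, output_dir, delimiter="\\t"):
    """Build an HQL statement that writes one "<column><delimiter><cardinality>"
    row per column into output_dir.

    Hypothetical helper: the delimiter expected by
    HiveColumnCardinalityUpdateJob is assumed here, not confirmed.
    """
    if not columns:
        raise ValueError("at least one column is required")

    # One SELECT per column; each yields a single (name, distinct count) row.
    selects = [
        "SELECT '{0}', COUNT(DISTINCT `{0}`) FROM {1}".format(col, table)
        for col in columns
    ]

    return (
        "INSERT OVERWRITE DIRECTORY '{0}'\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY '{1}'\n"
        "{2}"
    ).format(output_dir, delimiter, "\nUNION ALL\n".join(selects))


if __name__ == "__main__":
    # Example values only; the view name and path are made up.
    print(build_cardinality_hql("default.v_sales", ["column1", "column2"], "/tmp/cardinality_out"))
```

Because the statement only reads through a plain SELECT, it would apply to a Hive view the same way as to a table, which is the point of the proposal.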

> support cardinality calculation for Hive view
> ---------------------------------------------
>
>                 Key: KYLIN-2903
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2903
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>            Reporter: Wang, Gang
>            Assignee: Wang, Gang
>            Priority: Minor
>         Attachments: 0001-KYLIN-2903-support-cardinality-calculation-for-Hive-.patch
>
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive tables. However,
HCatalog does not support Hive views.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
