hive-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <>
Subject [jira] [Updated] (HIVE-5631) Index creation on a skew table fails
Date Wed, 12 Nov 2014 21:44:35 GMT


Ashutosh Chauhan updated HIVE-5631:
    Component/s:     (was: Database/Schema)

> Index creation on a skew table fails
> ------------------------------------
>                 Key: HIVE-5631
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Indexing
>    Affects Versions: 0.12.0, 0.13.0, 0.14.0
>            Reporter: Venki Korukanti
>            Assignee: Venki Korukanti
>         Attachments: HIVE-5631.1.patch.txt, HIVE-5631.2.patch.txt, HIVE-5631.3.patch.txt
> create database skewtest;
> use skewtest;
> create table skew (id bigint, acct string) skewed by (acct) on ('CC','CH');
> create index skew_indx on table skew (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
> The last DDL statement fails with the following error:
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidObjectException(message:Invalid skew column [acct])
> When creating a table, Hive runs sanity checks to make sure the columns have valid names and the skewed columns are a subset of the table columns. The failure occurs because the index table carries skewed column info: the index table's skewed columns are {acct}, while its columns are {id, _bucketname, _offsets}. Because the skewed column {acct} is not one of the table columns, Hive throws the exception.
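
The subset check described above can be illustrated with a minimal sketch. This is not Hive's actual validation code; the class name SkewCheck and the helper findInvalidSkewColumn are hypothetical, and only the column names and the error text come from the report.

```java
import java.util.Arrays;
import java.util.List;

public class SkewCheck {
    // Hypothetical helper (not a Hive API): returns the first skewed column
    // that is not among the table columns, or null if all of them are valid.
    static String findInvalidSkewColumn(List<String> tableCols, List<String> skewedCols) {
        for (String skewed : skewedCols) {
            if (!tableCols.contains(skewed)) {
                return skewed;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Columns of the index table, as listed in the report.
        List<String> indexTableCols = Arrays.asList("id", "_bucketname", "_offsets");
        // Skewed columns inherited from the base table 'skew'.
        List<String> inheritedSkewedCols = Arrays.asList("acct");

        String bad = findInvalidSkewColumn(indexTableCols, inheritedSkewedCols);
        if (bad != null) {
            // Mirrors the message text seen in the reported error.
            System.out.println("Invalid skew column [" + bad + "]");
        }
    }
}
```

With the index table's columns and the inherited skew list, the check rejects acct, which matches the reported failure.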
> The index table ends up with skewed column info, even though its definition has none, because creating the index table makes a deep copy of the base table's StorageDescriptor (SD), in this case that of 'skew'. In the copied SD, index-specific parameters are set and unrelated parameters are reset. The skewed column info is not reset (nor are a few other parameters), which is why the index table contains it.
> Fix: Instead of deep copying the base table's StorageDescriptor, create a new one from the gathered info. This keeps the index table from inheriting unnecessary SD properties from the base table.
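
The difference between the two approaches can be sketched as follows. The Sd class is a simplified stand-in for the metastore StorageDescriptor, not the real Hive class, and the method names indexSdByCopy and indexSdFromScratch are hypothetical; the attached patches may be structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexSdSketch {
    // Simplified stand-in for a storage descriptor (assumption, not Hive's class).
    static class Sd {
        List<String> cols = new ArrayList<>();
        List<String> skewedColNames = new ArrayList<>();

        Sd deepCopy() {
            Sd copy = new Sd();
            copy.cols = new ArrayList<>(cols);
            copy.skewedColNames = new ArrayList<>(skewedColNames);
            return copy;
        }
    }

    // Copy-then-override: every field not explicitly reset leaks from the base table.
    static Sd indexSdByCopy(Sd base, List<String> indexCols) {
        Sd sd = base.deepCopy();
        sd.cols = new ArrayList<>(indexCols);
        // skewedColNames is not reset here, so it is inherited from the base table.
        return sd;
    }

    // Build-from-scratch: only the fields set explicitly are present.
    static Sd indexSdFromScratch(List<String> indexCols) {
        Sd sd = new Sd();
        sd.cols = new ArrayList<>(indexCols);
        return sd;
    }

    public static void main(String[] args) {
        Sd base = new Sd();
        base.cols = new ArrayList<>(Arrays.asList("id", "acct"));
        base.skewedColNames = new ArrayList<>(Arrays.asList("acct"));
        List<String> indexCols = Arrays.asList("id", "_bucketname", "_offsets");

        System.out.println(indexSdByCopy(base, indexCols).skewedColNames);   // [acct]
        System.out.println(indexSdFromScratch(indexCols).skewedColNames);    // []
    }
}
```

Starting from an empty descriptor means any property added to the base descriptor later cannot leak into the index table unless it is copied on purpose.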

This message was sent by Atlassian JIRA