hive-issues mailing list archives

From "Chaoyu Tang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-11926) Stats annotation might not extract stats for varchar/decimal columns
Date Wed, 23 Sep 2015 03:47:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-11926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chaoyu Tang updated HIVE-11926:
-------------------------------
    Description: 
This happens because StatsUtils uses String.startsWith to compare VARCHAR/DECIMAL column type
names against the type names in serdeConstants, which are lowercase. The type names stored in
stats, however, might not be lowercase. We ran into a case where the type name from
TAB_COL_STATS/PART_COL_STATS was actually in uppercase (e.g. VARCHAR, DECIMAL) because these
column stats were populated by other HMS clients like Impala.
We need to change these type name comparisons to be case insensitive.

  was:
If column stats are calculated and populated into HMS by a client such as Impala, the column
type name stored in TAB_COL_STATS/PART_COL_STATS could be in uppercase (e.g. VARCHAR, DECIMAL).
When Hive collects stats for these columns during optimization (with hive.stats.fetch.column.stats
set to true), it throws an NPE. See the error message below:
{code}
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement:
FAILED: NullPointerException null
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:315)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:103)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:172)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:379)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:366)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:271)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:692)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.stats.StatsUtils.convertColStats(StatsUtils.java:636)
at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:623)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:180)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:136)
at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:124)
....truncated
{code}

        Summary: Stats annotation might not extract stats for varchar/decimal columns  (was:
NPE could occur in collectStatistics when column type is varchar)

Changed the JIRA title and description since NPE won't happen in this version. 
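
To illustrate the mismatch described above, here is a minimal sketch of a case-sensitive versus a case-insensitive prefix check. It is not the actual patch: the class TypeNameMatch and the helper startsWithIgnoreCase are hypothetical names, and the two constants are local stand-ins assumed to match the lowercase values in serdeConstants.

```java
public class TypeNameMatch {
    // Local stand-ins, assumed to match the lowercase serdeConstants values.
    static final String VARCHAR_TYPE_NAME = "varchar";
    static final String DECIMAL_TYPE_NAME = "decimal";

    // Hypothetical helper: case-insensitive prefix test that is also null-safe.
    static boolean startsWithIgnoreCase(String typeName, String prefix) {
        if (typeName == null || prefix == null) {
            return false;
        }
        // regionMatches with ignoreCase=true compares the first prefix.length()
        // characters; it returns false if typeName is shorter than prefix.
        return typeName.regionMatches(true, 0, prefix, 0, prefix.length());
    }

    public static void main(String[] args) {
        // Type name as another HMS client might have stored it.
        String stored = "VARCHAR(20)";

        // Case-sensitive check misses the uppercase name.
        System.out.println(stored.startsWith(VARCHAR_TYPE_NAME));               // false
        // Case-insensitive check recognizes it.
        System.out.println(startsWithIgnoreCase(stored, VARCHAR_TYPE_NAME));    // true
        System.out.println(startsWithIgnoreCase("DECIMAL(10,2)", DECIMAL_TYPE_NAME)); // true
    }
}
```

Lowercasing the stored type name before calling startsWith would work as well; regionMatches is used here only because it avoids allocating a new string.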

> Stats annotation might not extract stats for varchar/decimal columns
> --------------------------------------------------------------------
>
>                 Key: HIVE-11926
>                 URL: https://issues.apache.org/jira/browse/HIVE-11926
>             Project: Hive
>          Issue Type: Bug
>          Components: Logical Optimizer, Statistics
>    Affects Versions: 1.2.1
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>
> This happens because StatsUtils uses String.startsWith to compare VARCHAR/DECIMAL column
type names against the type names in serdeConstants, which are lowercase. The type names stored
in stats, however, might not be lowercase. We ran into a case where the type name from
TAB_COL_STATS/PART_COL_STATS was actually in uppercase (e.g. VARCHAR, DECIMAL) because these
column stats were populated by other HMS clients like Impala.
> We need to change these type name comparisons to be case insensitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
