Date: Thu, 6 Nov 2014 04:33:33 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Updated] (HIVE-8509) UT: fix list_bucket_dml_2 test

    [ https://issues.apache.org/jira/browse/HIVE-8509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-8509:
------------------------------
    Attachment: HIVE-8509-spark.patch

Reattaching the same patch to rerun the tests, as many of the test failures seemed unrelated.

> UT: fix list_bucket_dml_2 test
> ------------------------------
>
>                 Key: HIVE-8509
>                 URL: https://issues.apache.org/jira/browse/HIVE-8509
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Thomas Friedrich
>            Assignee: Chinna Rao Lalam
>            Priority: Minor
>         Attachments: HIVE-8509-spark.patch, HIVE-8509-spark.patch
>
>
> The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:
> org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: StatsPublisher cannot be connected to. There was a error while connecting to the StatsPublisher, and retrying might help.
> If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
> at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
> at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
> at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)
>
> I debugged and found that FileSinkOperator.publishStats throws the exception when calling statsPublisher.connect here:
>
> if (!statsPublisher.connect(hconf)) {
>   // just return, stats gathering should not block the main query
>   LOG.error("StatsPublishing error: cannot connect to database");
>   if (isStatsReliable) {
>     throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
>   }
>   return;
> }
>
> With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the statsPublisher is of type CounterStatsPublisher.
> In CounterStatsPublisher, connect returns false (and the exception above is then thrown) because getReporter() returns null for the MapredContext:
>
> MapredContext context = MapredContext.get();
> if (context == null || context.getReporter() == null) {
>   return false;
> }
>
> When changing hive.stats.dbclass to jdbc:derby in data/conf/spark/hive-site.xml, similar to TestCliDriver, the test works:
>
> <property>
>   <name>hive.stats.dbclass</name>
>   <value>jdbc:derby</value>
>   <description>The default storage that stores temporary hive statistics. Currently, jdbc, hbase and counter type is supported</description>
> </property>
>
> In addition, I had to generate the out file for the test case for Spark.
> When running this test with TestCliDriver and hive.stats.dbclass set to counter, the test case still works; there the reporter is set to org.apache.hadoop.mapred.Task$TaskReporter.
> It might need some additional investigation why CounterStatsPublisher has no reporter in the case of Spark.
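
For illustration only, here is a rough, self-contained sketch of the pattern involved, using hypothetical class names (ReporterContextDemo, TaskContext, and the trimmed-down Reporter below are stand-ins, not Hive classes): a per-thread task context whose reporter has to be attached by the task runner before a CounterStatsPublisher-style connect() check can pass. The main() method walks through the two situations described above: the Spark path, where nothing has attached a reporter, and the MR path, where the task runner hands its reporter to the context.

import java.util.concurrent.atomic.AtomicReference;

public class ReporterContextDemo {

    // Hypothetical, trimmed-down stand-in for org.apache.hadoop.mapred.Reporter.
    interface Reporter {
        void incrCounter(String group, String counter, long amount);
    }

    // Stand-in for MapredContext: a thread-local holder that the task runner
    // is expected to initialize and populate with a reporter.
    static final class TaskContext {
        private static final ThreadLocal<TaskContext> CURRENT = new ThreadLocal<>();
        private final AtomicReference<Reporter> reporter = new AtomicReference<>();

        static TaskContext init() {
            TaskContext ctx = new TaskContext();
            CURRENT.set(ctx);
            return ctx;
        }

        static TaskContext get() {
            return CURRENT.get();
        }

        void setReporter(Reporter r) {
            reporter.set(r);
        }

        Reporter getReporter() {
            return reporter.get();
        }
    }

    // Mirrors the connect() check quoted above: no context or no reporter
    // means "cannot connect", which FileSinkOperator escalates to a
    // HiveException when hive.stats.reliable=true.
    static boolean connect() {
        TaskContext ctx = TaskContext.get();
        return ctx != null && ctx.getReporter() != null;
    }

    public static void main(String[] args) {
        // Spark-like situation from the report: a context exists,
        // but nothing ever attached a reporter to it.
        TaskContext.init();
        System.out.println("connect() without reporter: " + connect()); // false

        // MR-like situation: the task runner hands its reporter to the
        // context (TestCliDriver sees Task$TaskReporter here), so a
        // counter-based stats publisher can connect.
        TaskContext.get().setReporter((group, counter, amount) -> { });
        System.out.println("connect() with reporter: " + connect()); // true
    }
}

Running the sketch prints false for the first check and true for the second, which mirrors the difference described above between the Spark run and TestCliDriver.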