hive-dev mailing list archives

From "Thomas Friedrich (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7953) Investigate query failures (2)
Date Fri, 17 Oct 2014 01:09:35 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174572#comment-14174572 ]

Thomas Friedrich commented on HIVE-7953:
----------------------------------------

The test list_bucket_dml_2 fails in FileSinkOperator.publishStats:

org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: StatsPublisher cannot be
connected to.There was a error while connecting to the StatsPublisher, and retrying might
help. If you dont want the query to fail because accurate statistics could not be collected,
set hive.stats.reliable=false
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:1079)
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:971)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:582)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:594)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.close(SparkMapRecordHandler.java:175)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunctionResultList.closeRecordProcessor(HiveMapFunctionResultList.java:57)
	at org.apache.hadoop.hive.ql.exec.spark.HiveBaseFunctionResultList$ResultIterator.hasNext(HiveBaseFunctionResultList.java:121)

I debugged and found that FileSinkOperator.publishStats throws the exception when calling
statsPublisher.connect here:
if (!statsPublisher.connect(hconf)) {
  // just return, stats gathering should not block the main query
  LOG.error("StatsPublishing error: cannot connect to database");
  if (isStatsReliable) {
    throw new HiveException(ErrorMsg.STATSPUBLISHER_CONNECTION_ERROR.getErrorCodedMsg());
  }
  return;
}

With hive.stats.dbclass set to counter in data/conf/spark/hive-site.xml, the statsPublisher
is of type CounterStatsPublisher.
In CounterStatsPublisher, connect returns false (which leads to the exception above) because
getReporter() returns null for the MapredContext:
MapredContext context = MapredContext.get();
if (context == null || context.getReporter() == null) {
  return false;
}
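
To make the interaction between the two quoted snippets easier to follow, here is a minimal, self-contained sketch of the control flow. It is not Hive code: StatsFlowSketch, Context, and the IllegalStateException are hypothetical stand-ins for FileSinkOperator, MapredContext, and HiveException, and the sketch assumes that hive.stats.reliable maps onto the isStatsReliable flag, as the error message suggests.

```java
/** Simplified stand-in for the control flow quoted above; not Hive code. */
public class StatsFlowSketch {

    /** Hypothetical stand-in for a task context that may or may not carry a reporter. */
    static final class Context {
        final Object reporter;

        Context(Object reporter) {
            this.reporter = reporter;
        }
    }

    /** Mirrors the quoted check: no context or no reporter means "cannot connect". */
    static boolean connect(Context context) {
        return context != null && context.reporter != null;
    }

    /**
     * Mirrors the quoted publishStats branch: a failed connect is fatal only
     * when statistics are required to be reliable.
     * Returns true if stats would be published, false if publishing is skipped.
     */
    static boolean publishStats(Context context, boolean isStatsReliable) {
        if (!connect(context)) {
            if (isStatsReliable) {
                throw new IllegalStateException("StatsPublisher cannot be connected to");
            }
            return false; // stats gathering should not block the main query
        }
        return true;
    }

    public static void main(String[] args) {
        Context withReporter = new Context(new Object());
        Context withoutReporter = new Context(null);

        System.out.println(publishStats(withReporter, true));
        System.out.println(publishStats(withoutReporter, false));
        try {
            publishStats(withoutReporter, true);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model, a missing reporter only fails the query when the reliable flag is on; otherwise the stats step is silently skipped, which matches the hint in the error message about hive.stats.reliable=false.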

After changing hive.stats.dbclass to jdbc:derby in data/conf/spark/hive-site.xml, similar to
TestCliDriver, the test works:
<property>
  <name>hive.stats.dbclass</name>
  <!-- <value>counter</value>  --> 
  <value>jdbc:derby</value> 
  <description>The default storatge that stores temporary hive statistics. Currently,
jdbc, hbase and counter type is supported</description>
</property>

In addition, I had to generate the out file for the test case for spark.

When running this test with TestCliDriver and hive.stats.dbclass set to counter, the test
case still works. In that case the reporter is set to org.apache.hadoop.mapred.Task$TaskReporter.
Additional investigation may be needed into why the CounterStatsPublisher has no reporter in
the Spark case.

> Investigate query failures (2)
> ------------------------------
>
>                 Key: HIVE-7953
>                 URL: https://issues.apache.org/jira/browse/HIVE-7953
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Brock Noland
>            Assignee: Thomas Friedrich
>
> I ran all q-file tests and the following failed with an exception:
> http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/HIVE-SPARK-ALL-TESTS-Build/lastCompletedBuild/testReport/
> We don't necessarily want to run all these tests as part of the Spark tests, but we should
> understand why they failed with an exception. This JIRA is to look into these failures and
> document them with one of:
> * New JIRA
> * Covered under existing JIRA
> * More investigation required
> Tests:
> {noformat}
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_temp_table_external	0.33 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucket_num_reducers	4.3 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_2	11 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_load_hdfs_file_with_space_in_the_name	0.65 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_4	4.7 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_7	2.8 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_2	5.5 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_position	1.5 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_18_part_external	2.4 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_6	11 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_smb_mapjoin_11	5.1 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_bucketsortoptimize_insert_8	10 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join	5.4 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_empty_dyn_part	0.81 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_compact1	0.31 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_ddl1	0.26 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_query2	0.73 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_auto_sortmerge_join_3	8.5 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_query5	0.34 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_rcfile_bigdata	0.93 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_groupby_multi_single_reducer	6.3 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_compact3	2.4 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_dbtxnmgr_compact2	0.56 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_stats_partscan_1_23	3.1 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_list_bucket_dml_2	4.3 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_15_external_part	3.2 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_16_part_external	2.8 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_17_part_managed	3.4 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_20_part_managed_location	3.3 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_exim_19_00_part_external_location	6.9 sec	2
>  org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_external_table_with_space_in_location_path
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
