Date: Thu, 9 Nov 2017 10:34:00 +0000 (UTC)
From: "Adam Szita (JIRA)"
To: issues@hive.apache.org
Subject: [jira] [Commented] (HIVE-18030) HCatalog can't be used with Pig on Spark

    [ https://issues.apache.org/jira/browse/HIVE-18030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245478#comment-16245478 ]

Adam Szita commented on HIVE-18030:
-----------------------------------

The root cause of the problem is that HCatalog fails to recognise that the jobContext of the PoS (Pig on Spark) job does in fact have a backend context. (This then causes the program to proceed down unwanted code paths.) Previously this feature worked because the {{mapred.task.id}} property is set for Pig on MR/Tez jobs. In Spark mode this property is not set, so I had to extend the condition used in {{checkJobContextIfRunningFromBackend()}} so that it recognises PoS jobs as backend too (a rough sketch of the idea follows below).

[~xuefuz], [~kellyzly], can you take a look at this patch please?
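Sketch of the approach, for illustration only (this is not the patch itself: the {{exectype}} property used for Spark detection and the wrapper class below are placeholder assumptions):

{code}
import org.apache.hadoop.mapreduce.JobContext;

public final class BackendCheckSketch {
  // The first clause is the pre-existing mapred.task.id test that covers
  // Pig on MR/Tez; the second clause stands in for the Spark-specific
  // detection the patch adds.
  static boolean checkJobContextIfRunningFromBackend(JobContext ctx) {
    // MR/Tez backend tasks always carry a task attempt id.
    if (!ctx.getConfiguration().get("mapred.task.id", "").isEmpty()) {
      return true;
    }
    // Pig-on-Spark executors do not set mapred.task.id, so recognise the
    // Spark backend by another marker ("exectype" is a placeholder property
    // assumed for this example).
    return "spark".equalsIgnoreCase(ctx.getConfiguration().get("exectype", ""));
  }
}
{code}

With the extra clause, {{HCatLoader.setLocation()}} should take the backend code path on Spark executors as well, so the credentials merge no longer hits the NPE shown in the description.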

> HCatalog can't be used with Pig on Spark
> ----------------------------------------
>
>                 Key: HIVE-18030
>                 URL: https://issues.apache.org/jira/browse/HIVE-18030
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>         Attachments: HIVE-18030.0.patch
>
>
> When using Pig on Spark in cluster mode, all queries containing HCatalog access are failing:
> {code}
> 2017-11-03 12:39:19,268 [dispatcher-event-loop-19] INFO  org.apache.spark.storage.BlockManagerInfo - Added broadcast_6_piece0 in memory on <>:<> (size: 83.0 KB, free: 408.5 MB)
> 2017-11-03 12:39:19,277 [task-result-getter-0] WARN  org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, <>, executor 2): java.lang.NullPointerException
> 	at org.apache.hadoop.security.Credentials.addAll(Credentials.java:401)
> 	at org.apache.hadoop.security.Credentials.addAll(Credentials.java:388)
> 	at org.apache.hive.hcatalog.pig.HCatLoader.setLocation(HCatLoader.java:128)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:147)
> 	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat$RecordReaderFactory.<init>(PigInputFormat.java:115)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark$SparkRecordReaderFactory.<init>(PigInputFormatSpark.java:126)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.running.PigInputFormatSpark.createRecordReader(PigInputFormatSpark.java:70)
> 	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:180)
> 	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:179)
> 	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:134)
> 	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:69)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:108)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)