Date: Fri, 2 Feb 2018 03:51:00 +0000 (UTC)
From: "jifei_yang (JIRA)"
To: dev@phoenix.apache.org
Reply-To: dev@phoenix.apache.org
Subject: [jira] [Commented] (PHOENIX-4490) Phoenix Spark Module doesn't pass in user properties to create connection

    [ https://issues.apache.org/jira/browse/PHOENIX-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349745#comment-16349745 ]

jifei_yang commented on PHOENIX-4490:
-------------------------------------

Hi [~karanmehta93],

In our production environment, I worked around the problem as follows:

1. Download the Apache Phoenix source for the corresponding version.
2. Modify the class org.apache.phoenix.spark.ConfigurationUtil.
3. In this package, add krb5.conf, you.keytab, the Hadoop XML file, log4j.properties and hbase-site.xml.
4. Add the Phoenix dependencies to /etc/spark/conf/classpath.txt on the cluster, so that every submitted Spark task can find the Phoenix dependency packages.

> Phoenix Spark Module doesn't pass in user properties to create connection
> -------------------------------------------------------------------------
>
>                 Key: PHOENIX-4490
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4490
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Karan Mehta
>            Priority: Major
>
> Phoenix Spark module doesn't work perfectly in a Kerberos environment. This is because whenever new {{PhoenixRDD}} instances are built, they are always built with new, default properties. The following piece of code in {{PhoenixRelation}} is an example. This is the class used by Spark to create a {{BaseRelation}} before executing a scan.
> {code}
>     new PhoenixRDD(
>       sqlContext.sparkContext,
>       tableName,
>       requiredColumns,
>       Some(buildFilter(filters)),
>       Some(zkUrl),
>       new Configuration(),
>       dateAsTimestamp
>     ).toDataFrame(sqlContext).rdd
> {code}
> This works fine in most cases when the Spark code runs on the same cluster as HBase, because the config object picks up properties from the XML files on the classpath. In an external environment, however, we should take the user-provided properties and merge them in before creating any {{PhoenixRelation}} or {{PhoenixRDD}}. As per my understanding, we should ideally provide the properties in the {{DefaultSource#createRelation()}} method.
> An example of when this fails: Spark tries to get the splits, to optimize the MR performance for loading data in the table, in the {{PhoenixInputFormat#generateSplits()}} method. Ideally, it should get all the config parameters from the {{JobContext}} being passed, but the configuration defaults to {{new Configuration()}}, irrespective of what the user passes in. Thus it fails to create a connection.
> [~jmahonin] [~maghamravikiran@gmail.com]
> Any ideas or advice? Let me know if I am missing anything obvious here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
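
Editorial sketch (not part of the original thread, and not the fix that was applied to Phoenix): the issue above asks for user-provided properties to be merged over the defaults before a connection is created. A minimal, standard-library-only illustration of that merge step could look like the following. The object name {{UserProperties}}, the method {{merge}} and the {{connectorKeys}} set are hypothetical names introduced here; treating {{table}} and {{zkUrl}} as connector-only options is likewise an assumption made for the example.

```scala
import java.util.Properties

// Hypothetical helper: overlays user-supplied options on top of defaults.
object UserProperties {

  // Assumption: options consumed by the connector itself and therefore
  // not forwarded as connection properties.
  private val connectorKeys = Set("table", "zkUrl")

  // Returns a java.util.Properties holding the defaults, overridden by
  // every user option that is not a connector-only key.
  def merge(defaults: Map[String, String],
            options: Map[String, String]): Properties = {
    val props = new Properties()
    // Start from the defaults (for example, values read from classpath XML files).
    defaults.foreach { case (k, v) => props.setProperty(k, v) }
    // User-provided values win over defaults.
    options
      .filterNot { case (k, _) => connectorKeys.contains(k) }
      .foreach { case (k, v) => props.setProperty(k, v) }
    props
  }
}
```

A {{Properties}} object is used because it is what {{java.sql.DriverManager.getConnection(url, props)}} accepts. Whether the merged values should instead be copied into a Hadoop {{Configuration}} is a design choice this sketch leaves open.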