Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Mon, 25 Sep 2017 08:06:01 +0000 (UTC)
From: "Liang-Chi Hsieh (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13104646.1506312908000.190650.1506326761798@Atlassian.JIRA>
In-Reply-To: <JIRA.13104646.1506312908000@Atlassian.JIRA>
References: <JIRA.13104646.1506312908000@Atlassian.JIRA> <JIRA.13104646.1506312908272@jira-lw-us.apache.org>
Subject: [jira] [Commented] (SPARK-22113) Dataset shows in Hive is
 inconsistent with JDBC
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 25 Sep 2017 08:06:07 -0000


    [ https://issues.apache.org/jira/browse/SPARK-22113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16178698#comment-16178698 ] 

Liang-Chi Hsieh commented on SPARK-22113:
-----------------------------------------

Hmm, actually we have the API {{def select(col: String, cols: String*): DataFrame}}, which takes column names. So I don't think {{select("dw_date")}} selects a string literal.

Actually, I found the reason why the second, JDBC-based query fails. I just wonder whether we should connect to Hive through JDBC.
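
The comment does not state the reason itself. One likely explanation, given here as an assumption rather than as the commenter's finding, is identifier quoting: Spark's default JDBC dialect wraps column names in double quotes when it builds the query, and HiveQL reads a double-quoted token as a string literal, so every row comes back with the text {{dw_date}} instead of the column's data. The sketch below only imitates that string building in plain Java. {{QuotingSketch}} and {{buildSelect}} are hypothetical names for illustration, not Spark APIs.

```java
import java.util.List;
import java.util.stream.Collectors;

class QuotingSketch {
    // Hypothetical helper: builds the kind of SELECT statement a JDBC reader
    // might send, wrapping each column name in the given quote string.
    static String buildSelect(List<String> columns, String table, String quote) {
        String cols = columns.stream()
                .map(c -> quote + c + quote)
                .collect(Collectors.joining(", "));
        return "SELECT " + cols + " FROM " + table;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("dw_date");
        String table = "tfdw.dwd_dim_date";

        // Double quotes (assumed default): Hive would treat "dw_date" as a
        // string literal and return that text for every row.
        System.out.println(buildSelect(columns, table, "\""));
        // prints: SELECT "dw_date" FROM tfdw.dwd_dim_date

        // Backticks: Hive would treat `dw_date` as a column reference.
        System.out.println(buildSelect(columns, table, "`"));
        // prints: SELECT `dw_date` FROM tfdw.dwd_dim_date
    }
}
```

If that assumption holds, the first form would explain why the JDBC-based result shows only the column name.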

[~Michael Fu] Where is the spark-sql document you mentioned?

> Dataset shows in Hive is inconsistent with JDBC
> -----------------------------------------------
>
>                 Key: SPARK-22113
>                 URL: https://issues.apache.org/jira/browse/SPARK-22113
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: version 2.2.0
>            Reporter: Michael Fu
>
> I am trying to query data from Hive in Spark. According to the spark-sql document, there are two ways to do this.
> The first way is to initialize the session with _enableHiveSupport_:
> {code:java}
> SparkSession session = SparkSession.builder().enableHiveSupport().getOrCreate();
> session.sql("select dw_date from tfdw.dwd_dim_date limit 10").show();
> {code}
> The dataset shows the correct result:
> !https://i.stack.imgur.com/gBJCj.png!
> The second way is through JDBC:
> {code:java}
> Dataset<Row> ds = session.read()
>                   .format("jdbc")
>                   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>                   .option("url", "jdbc:hive2://iZ11syxr6afZ:21050/;auth=noSasl")
>                   .option("dbtable", "tfdw.dwd_dim_date")
>                   .load();
> ds.select("dw_date").limit(10).show();
> {code}
> But the dataset only shows the column name in the result rather than the data in the column:
> !https://i.stack.imgur.com/FBMDN.png!
> The two pictures should be consistent, I think. Is there anything I missed? Many thanks!

