spark-issues mailing list archives

From "Michael Fu (JIRA)" <>
Subject [jira] [Commented] (SPARK-22113) Dataset shows in Hive is inconsistent with JDBC
Date Tue, 26 Sep 2017 03:22:00 GMT


Michael Fu commented on SPARK-22113:

Hi [~viirya], thanks for investigating. I ran my test according to the [sql-programming-guide|].

The reason I access Hive (actually it's Impala here) via JDBC in Spark is that we integrate
Kudu with Impala. So for any delete/update operation, I have to go through JDBC.

For query operations, I will obviously initialize the session with _enableHiveSupport_,
as the documentation recommends.
I raised this query issue because I want to make sure all JDBC operations
work well in Spark, so that I can use it in production without concern.

BTW: I raised a [question on Stack Overflow|]
before I came here. There is a comment on it; I hope it helps.

> Dataset shows in Hive is inconsistent with JDBC
> -----------------------------------------------
>                 Key: SPARK-22113
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.2.0
>         Environment: version 2.2.0
>            Reporter: Michael Fu
> I am trying to query data from Hive in Spark. According to the Spark SQL documentation,
> there are two ways to do this.
> The first way is to initialize the session with _enableHiveSupport_:
> {code:java}
> SparkSession session = SparkSession.builder().enableHiveSupport().getOrCreate();
> session.sql("select dw_date from tfdw.dwd_dim_date limit 10").show();
> {code}
> The dataset shows the correct result:
> !!
> The second way is through JDBC:
> {code:java}
> Dataset<Row> ds = session.read()
>                   .format("jdbc")
>                   .option("driver", "org.apache.hive.jdbc.HiveDriver")
>                   .option("url", "jdbc:hive2://iZ11syxr6afZ:21050/;auth=noSasl")
>                   .option("dbtable", "tfdw.dwd_dim_date")
>                   .load();
> {code}
> But the dataset only shows the column name in each row of the result rather than the data in the column:
> !!
> I think the two results should be consistent. Is there anything obvious I missed? Many thanks!
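
One possible explanation, which this thread does not confirm: Spark's generic JDBC source wraps column names in double quotes when it builds its query, while HiveQL and Impala SQL read a double-quoted token as a string literal rather than a column reference. Every row would then come back as the column name itself. The stand-alone sketch below only imitates that string building to show the difference; `QuotingSketch`, `quoteIdentifier`, and `buildSelect` are hypothetical names for illustration, not Spark APIs.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper, not Spark code: imitates how a JDBC data source might
// assemble a SELECT statement from column names and an identifier quote style.
class QuotingSketch {

    // Wraps a column name in the given quote character.
    static String quoteIdentifier(String column, char quote) {
        return String.valueOf(quote) + column + quote;
    }

    // Builds "SELECT <quoted columns> FROM <table>".
    static String buildSelect(String table, char quote, String... columns) {
        String columnList = Arrays.stream(columns)
                .map(c -> quoteIdentifier(c, quote))
                .collect(Collectors.joining(","));
        return "SELECT " + columnList + " FROM " + table;
    }

    public static void main(String[] args) {
        // Double quotes: Hive and Impala treat "dw_date" as a string literal,
        // so each returned row would contain the text dw_date.
        System.out.println(buildSelect("tfdw.dwd_dim_date", '"', "dw_date"));

        // Backticks: Hive and Impala treat `dw_date` as a column reference.
        System.out.println(buildSelect("tfdw.dwd_dim_date", '`', "dw_date"));
    }
}
```

If this is indeed the cause, a commonly suggested workaround is to subclass `org.apache.spark.sql.jdbc.JdbcDialect`, override `canHandle` to match `jdbc:hive2` URLs and `quoteIdentifier` to use backticks, and register it with `JdbcDialects.registerDialect` before calling `load()`. Treat that as an assumption to verify against your Spark version rather than a confirmed fix for this issue.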
