spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (SPARK-13000) Corrupted results when using LIMIT clause via JDBC connections to ThriftServer
Date Thu, 28 Jan 2016 10:24:39 GMT

     [ https://issues.apache.org/jira/browse/SPARK-13000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Sean Owen reopened SPARK-13000:
-------------------------------

> Corrupted results when using LIMIT clause via JDBC connections to ThriftServer
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-13000
>                 URL: https://issues.apache.org/jira/browse/SPARK-13000
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.2
>         Environment: Amazon EMR AMI 4.2.0
> Spark 1.5.2
>            Reporter: Daniel Harper
>
> h2. Steps to reproduce
> # Create table in HIVE  (see below for definition)
> # Insert some data (at least 2 rows) 
> # Start thrift service
> # Connect to thrift service via {{beeline}} or custom application via JDBC
> # Run query {{select * from logs_table limit 1}}
> h2. Detailed description
> We're seeing strange results for the following query when executed via JDBC connections
to the thrift server 
> {code}
> select * from logs_table limit 1;
> {code}
> We've tried this using {{beeline}} and as you can see, the {{service}} and other columns
are blank 
> {code}
> [hadoop@ip-x ~]$ beeline
> Beeline version 1.0.0-amzn-1 by Apache Hive
> beeline> !connect jdbc:hive2://localhost:10001/default
> scan complete in 5ms
> Connecting to jdbc:hive2://localhost:10001/default
> Enter username for jdbc:hive2://localhost:10001/default:
> Enter password for jdbc:hive2://localhost:10001/default:
> Connected to: Spark SQL (version 1.5.2)
> Driver: Hive JDBC (version 1.0.0-amzn-1)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> 0: jdbc:hive2://localhost:10001/default> select * from logs_table limit 1;
> +------------------------+----------+-------+-----+-----+--+
> |           ts           | service  | yyyy  | mm  | dd  |
> +------------------------+----------+-------+-----+-----+--+
> | 2016-01-24 23:23:24.0  |   |   |   |   |
> +------------------------+----------+-------+-----+-----+--+
> 1 row selected (9.182 seconds)
> {code}
> Removing the {{LIMIT 1}} clause, we get the full dataset and all columns are present.
> {code}
> 0: jdbc:hive2://localhost:10001/default> select * from logs_table;
> +------------------------+----------+-------+-----+-----+--+
> |           ts           | service  | yyyy  | mm  | dd  |
> +------------------------+----------+-------+-----+-----+--+
> | 2016-01-24 23:23:24.0  |service_1  | 2016  | 01  | 24  |
> | 2016-01-24 23:29:24.0  |service_4  | 2016  | 01  | 24  |
> +------------------------+----------+-------+-----+-----+--+
> 2 rows selected (10.956 seconds)
> {code}
> I ran the query {{select * from logs_table limit 1}} via 
> * {{spark-sql}}
> * {{spark-shell}}
> ...and both returned the expected results, limiting the resultset to 1 row and with all
the columns populated.
> This leads me to believe this is an issue with the Thrift Server or Hive JDBC driver.
> We are starting the thrift server as follows:
> {code}
> sudo /usr/lib/spark/sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
--num-executors 1 --executor-cores 5 --executor-memory 38G --conf spark.scheduler.mode=FAIR
--conf spark.sql.thriftserver.scheduler.pool=default --driver-memory 10G
> {code}
> h2. Resources 
> The HIVE table is defined as follows: 
> {code}
> CREATE EXTERNAL TABLE IF NOT EXISTS logs_table (
>         ts STRING,
>         service STRING
> )
> COMMENT 'logs table'
> PARTITIONED BY (yyyy STRING, mm STRING, dd STRING)
> STORED AS TEXTFILE
> LOCATION 's3://data-lake/structured/';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message