spark-issues mailing list archives

From "Liang Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-7393) How to improve Spark SQL performance?
Date Wed, 06 May 2015 06:59:59 GMT

    [ https://issues.apache.org/jira/browse/SPARK-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530045#comment-14530045 ]

Liang Lee commented on SPARK-7393:
----------------------------------

We want to use Spark SQL in our project, but we found that Spark SQL performance is not as good as we expected. The details are as follows:
1. We save the data as a Parquet file on HDFS.
2. We select one or several rows from the Parquet file using Spark SQL.
3. When the total record count is 61 million, it takes about 3 seconds to get the result, which is unacceptably long for our scenario.
4. When the total record count is 2 million, it takes about 93 ms to get the result, which is still a little long for us.
5. The query statement is like: SELECT * FROM DBA WHERE COLA=? AND COLB=? (a sketch follows this list). The table is not complex: it has fewer than 10 columns, and the content of each column is less than 100 bytes.
6. Does anyone know how to improve the performance, or have any other ideas?
7. Can Spark SQL support microsecond-level response?
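
Below is a minimal sketch of the query pattern we describe, assuming the Spark 1.3 DataFrame API; the HDFS path, the registered table name, and the filter literals are placeholders rather than our actual setup:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ParquetPointQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParquetPointQuery"))
    val sqlContext = new SQLContext(sc)

    // Load the Parquet data from HDFS and register it as a temporary table named DBA.
    // The path below is hypothetical.
    val df = sqlContext.parquetFile("hdfs:///path/to/dba.parquet")
    df.registerTempTable("DBA")

    // Point lookup on two columns; the literal values are placeholders.
    val result = sqlContext.sql(
      "SELECT * FROM DBA WHERE COLA = 'valueA' AND COLB = 'valueB'")
    result.show()

    sc.stop()
  }
}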

> How to improve Spark SQL performance?
> -------------------------------------
>
>                 Key: SPARK-7393
>                 URL: https://issues.apache.org/jira/browse/SPARK-7393
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Liang Lee
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

