spark-dev mailing list archives

From sujith71955 <sujithchacko.2...@gmail.com>
Subject Re: Limit Query Performance Suggestion
Date Wed, 18 Jan 2017 02:40:03 GMT
Dear Liang,

Thanks for your valuable feedback.

There was a mistake in my previous post, which I have corrected. As you mentioned, `GlobalLimit` takes only the required number of rows from the input iterator, which pulls data from both local and remote blocks. However, when the limit value is very large (e.g. >= 10000000), a shuffle exchange happens between `GlobalLimit` and `LocalLimit` to move data from all partitions into a single partition; because the limit value is so large, the performance bottleneck remains.
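For illustration, the execution pattern above can be sketched outside Spark. This is a minimal simulation, not Spark's actual implementation: the function names, partition counts, and row sizes are hypothetical, chosen only to show why the single-partition shuffle dominates when the limit exceeds the per-partition row count.

```python
# Hypothetical sketch of LIMIT execution: a LocalLimit per partition,
# then a shuffle of all surviving rows into ONE partition for GlobalLimit.

def local_limit(partition, n):
    # Each task keeps at most n rows from its own partition.
    return partition[:n]

def global_limit(rows, n):
    # After the shuffle into a single partition, keep the first n rows.
    return rows[:n]

def run_limit(partitions, n):
    # Step 1: LocalLimit runs in parallel on every partition.
    locally_limited = [local_limit(p, n) for p in partitions]
    # Step 2: the shuffle exchange funnels all surviving rows to one partition.
    shuffled = [row for part in locally_limited for row in part]
    # Step 3: GlobalLimit trims the single partition to exactly n rows.
    return global_limit(shuffled, n), len(shuffled)

# 200 partitions of 100,000 rows each, with limit = 10,000,000:
partitions = [list(range(100_000)) for _ in range(200)]
result, shuffled_rows = run_limit(partitions, 10_000_000)
# Every partition holds fewer rows than the limit, so LocalLimit prunes
# nothing: all 20,000,000 rows pass through the single-partition shuffle
# even though only 10,000,000 are ultimately kept.
```

With a small limit, `local_limit` prunes most rows before the shuffle; with a limit larger than any partition, it prunes nothing, which is exactly the bottleneck described above.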
 
In an upcoming post I will publish a test report with sample data, along with a proposed solution to this problem.

Please let me know for any clarifications or suggestions regarding this
issue.

Regards,
Sujith



--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Limit-Query-Performance-Suggestion-tp20570p20640.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org

