spark-user mailing list archives

From "lokesh.gidra" <lokesh.gi...@gmail.com>
Subject Re: "Spilling in-memory..." messages in log even with MEMORY_ONLY
Date Sun, 27 Jul 2014 21:21:44 GMT
I have confirmed that it is not GC related. Oprofile reports the 
stop-the-world pauses separately from the regular Java methods.

However, I was wrong when I said that much more time is spent in 
writeObject0 in local[n] mode than in standalone mode.

It is instead the hashCode function. The time spent fetching the hash 
code of java.lang.Object is almost 100 times higher in local[n] than 
in standalone mode. To be precise, the exact function is 
JVM_IHashCode, which is called when the hashCode function of 
java.lang.Object is called.

So now the question is: is there any possible reason why there would be 
a much larger number of invocations of hashCode in local[n] mode than in 
standalone? Is it something related to hash tables?
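
For anyone following along, here is a minimal sketch of how plain Java 
code ends up in Object.hashCode (and hence JVM_IHashCode, as described 
above). It is not taken from Spark; the class names Plain and Keyed are 
hypothetical and only serve as an illustration. A class that does not 
override hashCode falls back to the identity hash every time it is put 
into a hash-based collection, while a class that overrides it never 
reaches that path.

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityHashDemo {
    // Hypothetical class: no hashCode() override, so it inherits Object.hashCode().
    static final class Plain {
        final int id;
        Plain(int id) { this.id = id; }
    }

    // Hypothetical class: overrides hashCode()/equals(), so hash-based
    // collections call this method instead of Object.hashCode().
    static final class Keyed {
        final int id;
        Keyed(int id) { this.id = id; }
        @Override public int hashCode() { return Integer.hashCode(id); }
        @Override public boolean equals(Object o) {
            return o instanceof Keyed && ((Keyed) o).id == id;
        }
    }

    // True when o.hashCode() is the identity hash, i.e. the default
    // Object.hashCode() is in effect for this object.
    static boolean usesIdentityHash(Object o) {
        return o.hashCode() == System.identityHashCode(o);
    }

    // Adds n distinct Plain objects to a HashSet; every add() hashes the
    // element through the inherited Object.hashCode().
    static int fillSet(int n) {
        Set<Object> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(new Plain(i));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Plain uses identity hash: " + usesIdentityHash(new Plain(1)));
        System.out.println("Keyed hashCode: " + new Keyed(7).hashCode());
        System.out.println("Set size: " + fillSet(1000));
    }
}
```

If the profile is dominated by the identity hash, one thing to check is 
which objects without their own hashCode are being used as keys in hash 
tables or sets. Whether that is what happens in local[n] mode is an open 
question here, not something this sketch demonstrates.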

On 07/27/2014 07:56 PM, Aaron Davidson [via Apache Spark User List] wrote:
> I see. There should not be a significant algorithmic difference 
> between those two cases, as far as I can think, but there is a good 
> bit of "local-mode-only" logic in Spark.
>
> One typical problem we see on large-heap, many-core JVMs, though, is 
> much more time spent in garbage collection. I'm not sure how oprofile 
> gathers its statistics, but it's possible the stop-the-world pauses 
> just appear as pausing inside regular methods. You could see if this 
> is happening by adding "-XX:+PrintGCDetails" 
> to spark.executor.extraJavaOptions (in spark-defaults.conf) 
> and --driver-java-options (as a command-line argument), and then 
> examining the stdout logs.
>
>
> On Sun, Jul 27, 2014 at 10:29 AM, lokesh.gidra <[hidden email]> wrote:
>
>     I am comparing the total time taken to finish the job. To be precise,
>     I am comparing, on a 48-core machine, the performance of local[48]
>     vs. standalone mode with 8 nodes of 6 cores each (totalling 48 cores)
>     on localhost. In this comparison, standalone mode outperforms
>     local[48] substantially. When I did some troubleshooting using
>     oprofile, I found that local[48] was spending much more time in
>     writeObject0 than standalone mode.
>
>     I am running the PageRank example provided in the package.
>
>
>
>     --
>     View this message in context:
>     http://apache-spark-user-list.1001560.n3.nabble.com/Spilling-in-memory-messages-in-log-even-with-MEMORY-ONLY-tp10723p10743.html
>     Sent from the Apache Spark User List mailing list archive at
>     Nabble.com.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spilling-in-memory-messages-in-log-even-with-MEMORY-ONLY-tp10723p10749.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.