spark-issues mailing list archives

From "Ben Moran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10914) Incorrect empty join sets when executor-memory >= 32g
Date Tue, 06 Oct 2015 17:42:27 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945441#comment-14945441 ]

Ben Moran commented on SPARK-10914:
-----------------------------------

I just ran with 
--executor-memory 100g --conf "spark.executor.extraJavaOptions=-XX:-UseCompressedOops"

but the problem persists.  In the worker log it shows:


15/10/06 18:36:36 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-7-oracle/jre/bin/java"
"-cp" "/home/spark/spark-1.5.1-bin-hadoop2.6/sbin/../conf/:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/spark-assembly-1.5.1-hadoop2.6.0.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-core-3.2.10.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-rdbms-3.2.9.jar:/home/spark/spark-1.5.1-bin-hadoop2.6/lib/datanucleus-api-jdo-3.2.6.jar"
"-Xms102400M" "-Xmx102400M" "-Dspark.driver.port=53169" "-XX:-UseCompressedOops" "-XX:MaxPermSize=256m"
"org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "akka.tcp://sparkDriver@10.122.82.99:53169/user/CoarseGrainedScheduler"
"--executor-id" "0" "--hostname" "10.122.82.99" "--cores" "20" "--app-id" "app-20151006183636-0019"
"--worker-url" "akka.tcp://sparkWorker@10.122.82.99:51402/user/Worker"


> Incorrect empty join sets when executor-memory >= 32g
> -----------------------------------------------------
>
>                 Key: SPARK-10914
>                 URL: https://issues.apache.org/jira/browse/SPARK-10914
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0, 1.5.1
>         Environment: Ubuntu 14.04 (spark-slave), 12.04 (master)
>            Reporter: Ben Moran
>
> Using an inner join to match two integer columns, I generally get no results
> when there should be matches. The results vary depending on whether the DataFrames
> come from SQL, JSON, or a cache, and on the order in which I cache things and query them.
> This minimal example reproduces it consistently for me in the spark-shell, on new installs
> of both 1.5.0 and 1.5.1 (pre-built against Hadoop 2.6 from http://spark.apache.org/downloads.html).
> /* x is {"xx":1}{"xx":2} and y is just {"yy":1}{"yy:2} */
> val x = sql("select 1 xx union all select 2") 
> val y = sql("select 1 yy union all select 2")
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 0 */
> /* If I cache both tables it works: */
> x.cache()
> y.cache()
> x.join(y, $"xx" === $"yy").count() /* expect 2, get 2 */
> /* but this still doesn't work: */
> x.join(y, $"xx" === $"yy").filter("yy=1").count() /* expect 1, get 0 */



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

