spark-issues mailing list archives

From "kai zhao (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-27854) [Spark-SQL] OOM when using unequal join sql
Date Mon, 27 May 2019 09:32:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-27854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kai zhao updated SPARK-27854:
-----------------------------
    Environment: 
Spark Version: 1.6.2

HDP Version: 2.5

JDK Version: 1.8

OS Version: Red Hat 7.3

 

Cluster Info:

8 nodes

Each node:

RAM: 256G

CPU: Intel(R) Xeon(R) Silver 4114 CPU @ 2.20GHz (40 cores)

Disk: 10 × 4TB HDD + 1TB SSD

 

Yarn Config:

NodeManager Memory: 210G

NodeManager Vcores: 70

 

Runtime Information

Java Home=/opt/jdk1.8.0_131/jre
Java Version=1.8.0_131 (Oracle Corporation)
Scala Version=2.10.5

Spark Properties

spark.app.id=application_1558686555626_0024
spark.app.name=org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
spark.driver.appUIAddress=http://172.17.3.2:4040
spark.driver.extraClassPath=/yinhai_platform/resources/spark_dep_jar/*
spark.driver.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.driver.host=172.17.3.2
spark.driver.maxResultSize=16g
spark.driver.port=44591
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.initialExecutors=0
spark.dynamicAllocation.maxExecutors=200
spark.dynamicAllocation.minExecutors=0
spark.eventLog.dir=hdfs:///spark-history
spark.eventLog.enabled=true
spark.executor.cores=5
spark.executor.extraClassPath=/yinhai_platform/resources/spark_dep_jar/*
spark.executor.extraJavaOptions=-XX:MaxPermSize=10240m
spark.executor.extraLibraryPath=/usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.executor.id=driver
spark.executor.memory=16g
spark.externalBlockStore.folderName=spark-058bff7c-f76c-4a0e-86a3-b390f2f06d1a
spark.hadoop.cacheConf=false
spark.history.fs.logDirectory=hdfs:///spark-history
spark.history.provider=org.apache.spark.deploy.history.FsHistoryProvider
spark.kryo.referenceTracking=false
spark.kryoserializer.buffer.max=1024m
spark.local.dir=/data/disk1/spark-local-dir
spark.master=yarn-client
spark.network.timeout=600s
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_HOSTS=ambari-node-2
spark.org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter.param.PROXY_URI_BASES=http://ambari-node-2:8088/proxy/application_1558686555626_0024
spark.scheduler.allocation.file=/usr/hdp/current/spark-thriftserver/conf/spark-thrift-fairscheduler.xml
spark.scheduler.mode=FAIR
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.shuffle.manager=SORT
spark.shuffle.service.enabled=true
spark.shuffle.service.port=9339
spark.submit.deployMode=client
spark.ui.filters=org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
spark.yarn.am.cores=5
spark.yarn.am.memory=16g
spark.yarn.queue=default

  was:
(Previous value: identical to the environment description above, minus the Cluster Info and Yarn Config sections added in this update.)


> [Spark-SQL] OOM when using unequal join sql 
> --------------------------------------------
>
>                 Key: SPARK-27854
>                 URL: https://issues.apache.org/jira/browse/SPARK-27854
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2
>         Environment: (same as the updated environment description above)
>            Reporter: kai zhao
>            Priority: Major
>
> I am using Spark 1.6.2 from the HDP package.
> Reproduce Steps:
> 1. Start the Spark Thrift Server, or write equivalent DataFrame code.
> 2. Execute SQL like: select * from table_a left join table_b on table_a.fieldA <= table_b.fieldB
> 3. Wait until the job finishes. (A minimal DataFrame reproduction is sketched below.)
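> For reference, a minimal sketch of the same non-equi join through the DataFrame API. Assumptions: table_a, table_b, fieldA, and fieldB are the reporter's placeholder names, and sqlContext is the HiveContext available in a Spark 1.6 shell or Thrift Server session.
>
>   // Assumes table_a and table_b are registered as Hive tables.
>   val dfA = sqlContext.table("table_a")
>   val dfB = sqlContext.table("table_b")
>   // The join condition has no equality predicate, so this is a non-equi join.
>   val joined = dfA.join(dfB, dfA("fieldA") <= dfB("fieldB"), "left_outer")
>   joined.count()  // action that triggers the failing job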
>  
> Actual Result:
> The SQL fails to execute, with multiple task errors:
> a) ExecutorLostFailure (executor 119 exited caused by one of the running tasks) Reason: Container marked as failed:
> b) java.lang.OutOfMemoryError: Java heap space
> Expected Result:
> The SQL runs successfully.
> I have tried every workaround I could find online, but it still fails. (One way to confirm the join strategy is sketched below.)
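> Editorial note, not part of the original report: in Spark 1.6, a join whose condition contains no equality predicate cannot use a hash or sort-merge join and is planned as BroadcastNestedLoopJoin or CartesianProduct, both of which can exhaust executor heap on large inputs. Printing the physical plan shows which strategy was chosen:
>
>   // Print the physical plan for the failing query; look for
>   // BroadcastNestedLoopJoin or CartesianProduct in the output.
>   sqlContext.sql(
>     "select * from table_a left join table_b on table_a.fieldA <= table_b.fieldB"
>   ).explain()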



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

