spark-issues mailing list archives

From "Josh Rosen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-14055) AssertionError may happeneds if not unlock writeLock when doing 'removeBlock' method
Date Wed, 23 Mar 2016 17:24:25 GMT

     [ https://issues.apache.org/jira/browse/SPARK-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen updated SPARK-14055:
-------------------------------
    Assignee: Ernest

> AssertionError may happeneds if not unlock writeLock when doing 'removeBlock' method
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-14055
>                 URL: https://issues.apache.org/jira/browse/SPARK-14055
>             Project: Spark
>          Issue Type: Bug
>          Components: Block Manager, Spark Core
>    Affects Versions: 2.0.0
>         Environment: Spark 2.0-SNAPSHOT
> Single Rack
> Standalone mode scheduling
> 8 node cluster
> 16 cores & 64G RAM / node
> Data Replication factor of 2
> Each node has 1 Spark executor configured with 16 cores and 40GB of RAM.
>            Reporter: Ernest
>            Assignee: Ernest
>            Priority: Critical
>
> We got the following log when running _LiveJournalPageRank_.
> {quote}
> 452823:16/03/21 19:28:47.444 TRACE BlockInfoManager: Task 1662 trying to acquire write lock for rdd_3_183
> 452825:16/03/21 19:28:47.445 TRACE BlockInfoManager: Task 1662 acquired write lock for rdd_3_183
> 456941:16/03/21 19:28:47.596 INFO BlockManager: Dropping block rdd_3_183 from memory
> 456943:16/03/21 19:28:47.597 DEBUG MemoryStore: Block rdd_3_183 of size 418784648 dropped from memory (free 3504141600)
> 457027:16/03/21 19:28:47.600 DEBUG BlockManagerMaster: Updated info of block rdd_3_183
> 457053:16/03/21 19:28:47.600 DEBUG BlockManager: Told master about block rdd_3_183
> 457082:16/03/21 19:28:47.602 TRACE BlockInfoManager: Task 1662 trying to remove block rdd_3_183
> 500373:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to put rdd_3_183
> 500374:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire read lock for rdd_3_183
> 500375:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 trying to acquire write lock for rdd_3_183
> 500376:16/03/21 19:28:49.893 TRACE BlockInfoManager: Task 1681 acquired write lock for rdd_3_183
> 517257:16/03/21 19:28:56.299 INFO BlockInfoManager: ****** taskAttemptId is: 1662, info.writerTask is: 1681, blockID is: rdd_3_183 so AssertionError happeneds here*****
> 517258-16/03/21 19:28:56.299 ERROR Executor: Exception in task 177.0 in stage 10.0 (TID 1662)
> 517259-java.lang.AssertionError: assertion failed
> 517260- at scala.Predef$.assert(Predef.scala:151)
> 517261- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:356)
> 517262- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1$$anonfun$apply$1.apply(BlockInfoManager.scala:351)
> 517263- at scala.Option.foreach(Option.scala:257)
> 517264- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:351)
> 517265- at org.apache.spark.storage.BlockInfoManager$$anonfun$releaseAllLocksForTask$1.apply(BlockInfoManager.scala:350)
> 517266- at scala.collection.mutable.HashSet.foreach(HashSet.scala:78)
> 517267- at org.apache.spark.storage.BlockInfoManager.releaseAllLocksForTask(BlockInfoManager.scala:350)
> 517268- at org.apache.spark.storage.BlockManager.releaseAllLocksForTask(BlockManager.scala:626)
> 517269- at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:238)
> {quote}
> When memory for RDD storage is insufficient and several partitions have to be evicted, this _AssertionError_ may occur.
> In the example above, _Task 1662_ needed to evict several partitions (including rdd\_3\_183). _Task 1662_ first acquired the read and write locks, then called _dropBlock_ from _MemoryStore.evictBlocksToFreeSpace_, which actually dropped rdd\_3\_183 from memory. Because _newEffectiveStorageLevel.isValid_ was false, execution went into _BlockInfoManager.removeBlock_, but _writeLocksByTask_ was not updated there, so it still listed rdd\_3\_183 as write-locked by _Task 1662_.
> Unfortunately, _Task 1681_ had already started and needed to recompute rdd\_3\_183 to produce its target RDD, so it acquired the write lock on rdd\_3\_183. When _Task 1662_ finally called _releaseAllLocksForTask_, it found that the block's writer was _Task 1681_ rather than itself, and the _AssertionError_ occurred.
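
The sequence described above can be replayed with a small toy model. This is only a sketch of the bookkeeping problem, written in Python for brevity; it is not Spark's BlockInfoManager code. The class `ToyBlockInfoManager`, its method names, the `NO_WRITER` sentinel, and the `clear_on_remove` flag are all hypothetical names introduced for illustration. The flag models the assumption that clearing the task's lock set inside the remove path would avoid the stale entry.

```python
from collections import defaultdict

NO_WRITER = -1  # hypothetical sentinel: no task holds the write lock


class BlockInfo:
    def __init__(self, writer_task):
        self.writer_task = writer_task


class ToyBlockInfoManager:
    """Toy model of write-lock bookkeeping; not Spark's actual implementation."""

    def __init__(self, clear_on_remove):
        self.infos = {}                              # block id -> BlockInfo
        self.write_locks_by_task = defaultdict(set)  # task id -> locked block ids
        self.clear_on_remove = clear_on_remove       # assumed fix, on or off

    def lock_for_writing(self, task, block_id):
        info = self.infos.get(block_id)
        if info is None:
            # Block is absent: register it with this task as the writer.
            self.infos[block_id] = BlockInfo(task)
        else:
            assert info.writer_task == NO_WRITER, "block is already write-locked"
            info.writer_task = task
        self.write_locks_by_task[task].add(block_id)

    def remove_block(self, task, block_id):
        info = self.infos[block_id]
        assert info.writer_task == task, "only the writer may remove the block"
        del self.infos[block_id]
        if self.clear_on_remove:
            # Without this step the task's lock set keeps a stale entry.
            self.write_locks_by_task[task].discard(block_id)

    def release_all_locks_for_task(self, task):
        for block_id in self.write_locks_by_task.pop(task, set()):
            info = self.infos.get(block_id)
            if info is not None:
                # Mirrors the kind of check that failed in the report.
                assert info.writer_task == task, "assertion failed"
                info.writer_task = NO_WRITER


def replay(clear_on_remove):
    """Replay the reported interleaving and return the outcome as a string."""
    mgr = ToyBlockInfoManager(clear_on_remove)
    mgr.lock_for_writing(1662, "rdd_3_183")  # Task 1662 locks the block
    mgr.remove_block(1662, "rdd_3_183")      # eviction removes the block
    mgr.lock_for_writing(1681, "rdd_3_183")  # Task 1681 recomputes the block
    try:
        mgr.release_all_locks_for_task(1662)  # Task 1662 finishes
    except AssertionError:
        return "AssertionError"
    return "ok"
```

Calling `replay(False)` follows the reported path, where the stale lock-set entry makes the final release trip the assertion. Calling `replay(True)` clears the entry during removal, so the release finds nothing left to check.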



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


