spark-reviews mailing list archives

From sitalkedia <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...
Date Tue, 17 May 2016 15:52:24 GMT
GitHub user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/13107#issuecomment-219762996
  
    I am not 100% sure of the root cause, but I suspect this happens when the JVM tries to allocate a very large buffer for the pointer array. The JVM may be unable to allocate such a large buffer in a contiguous region of the heap, and since the unsafe operations assume objects occupy contiguous memory, any unsafe operation on the large buffer results in memory corruption, which then manifests as the TimSort issue.
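    
    To make the contiguity assumption concrete, here is a small standalone Java illustration (my sketch, not the actual Spark code path) of why unsafe reads require the array to live in one contiguous block of heap memory:
    
    ```java
    import java.lang.reflect.Field;
    import sun.misc.Unsafe;
    
    // Standalone illustration (not Spark code): an unsafe read is pure
    // base-object-plus-byte-offset arithmetic, which is only valid if the
    // JVM backs the whole array with one contiguous block of heap memory.
    public class ContiguityDemo {
      public static void main(String[] args) throws Exception {
        // Grab the Unsafe singleton via reflection (standard trick).
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);
    
        long[] pointers = new long[1 << 20]; // stand-in for the sorter's pointer array
        pointers[42] = 0xCAFEL;
    
        // Read element 42 by raw offset arithmetic instead of an indexed load.
        long offset = Unsafe.ARRAY_LONG_BASE_OFFSET + 42L * Unsafe.ARRAY_LONG_INDEX_SCALE;
        System.out.println(Long.toHexString(unsafe.getLong(pointers, offset))); // cafe
      }
    }
    ```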
    
    Unfortunately, the issue is not consistently reproducible and the root cause remains unconfirmed, so I am not sure how we can write a regression test for it.
    
    Also, please note that this change itself is a no-op unless you override the default value
of `numElementsForSpillThreshold`, which is `Long.MAX_VALUE`. 
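    
    For reference, the gist of the change is just a guard in the record-insert path. The sketch below is illustrative only: the threshold name mirrors the PR, but the surrounding class, fields, and methods are made up, not the actual UnsafeExternalSorter code.
    
    ```java
    import java.io.IOException;
    
    // Illustrative sketch of the spill-on-threshold idea; everything here
    // except the threshold name is invented for the example.
    public class SpillThresholdSketch {
      // Default matches the PR: Long.MAX_VALUE, so the guard never fires
      // unless the threshold is explicitly lowered.
      private final long numElementsForSpillThreshold;
      private long numRecords = 0;
    
      public SpillThresholdSketch(long numElementsForSpillThreshold) {
        this.numElementsForSpillThreshold = numElementsForSpillThreshold;
      }
    
      public void insertRecord(long recordPointer) throws IOException {
        if (numRecords >= numElementsForSpillThreshold) {
          // Force a spill before the in-memory pointer array grows large
          // enough to hit the suspected contiguous-allocation problem.
          spill();
          numRecords = 0;
        }
        // ... append recordPointer to the in-memory pointer array ...
        numRecords++;
      }
    
      private void spill() throws IOException {
        // Stand-in for sorting the in-memory records and writing them to disk.
      }
    }
    ```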


