spark-issues mailing list archives

From "sam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1861) ArrayIndexOutOfBoundsException when reading bzip2 files
Date Thu, 26 Jun 2014 10:07:24 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044536#comment-14044536 ]

sam commented on SPARK-1861:
----------------------------

[~mengxr] Any idea when that will be?

> ArrayIndexOutOfBoundsException when reading bzip2 files
> -------------------------------------------------------
>
>                 Key: SPARK-1861
>                 URL: https://issues.apache.org/jira/browse/SPARK-1861
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.9.0, 1.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>
> Hadoop uses CBZip2InputStream to decode bzip2 files. However, the implementation is not
> threadsafe and Spark may run multiple tasks in the same JVM, which leads to this error. This
> is not a problem for Hadoop MapReduce because Hadoop runs each task in a separate JVM.
> A workaround is to set `SPARK_WORKER_CORES=1` in spark-env.sh for a standalone cluster.
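For reference, the quoted workaround might look like the following in `conf/spark-env.sh` on each worker machine. This is a sketch that assumes the default `conf/` layout of a Spark distribution. With only one core offered per worker, that worker's executor runs at most one task at a time, so two tasks should not decode bzip2 data concurrently in the same JVM. The trade-off is that the worker's parallelism drops to a single task.

```shell
# conf/spark-env.sh (sourced by the standalone scripts when a worker starts)
# Offer only one core per worker so its executor runs one task at a time,
# avoiding concurrent use of the non-threadsafe bzip2 decoder in one JVM.
export SPARK_WORKER_CORES=1
```

Workers read `spark-env.sh` when they start, so they would need to be restarted for the change to take effect.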



--
This message was sent by Atlassian JIRA
(v6.2#6252)
