spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (SPARK-16764) Recommend disabling vectorized parquet reader on OutOfMemoryError
Date Thu, 28 Jul 2016 06:21:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16764:
------------------------------------

    Assignee:     (was: Apache Spark)

> Recommend disabling vectorized parquet reader on OutOfMemoryError
> -----------------------------------------------------------------
>
>                 Key: SPARK-16764
>                 URL: https://issues.apache.org/jira/browse/SPARK-16764
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Sameer Agarwal
>
> We currently don't bound or manage the size of the data arrays used by column vectors in
> the vectorized reader (they are bounded only by Integer.MAX_VALUE), which can lead to OOMs
> while reading data. In the short term, we can probably intercept this exception and suggest
> that the user disable the vectorized Parquet reader.
> Longer term, we should probably do explicit memory management for this.
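
A minimal sketch of the suggested workaround: the vectorized Parquet reader can be switched
off through the spark.sql.parquet.enableVectorizedReader SQL conf (available since Spark 2.0),
so Parquet scans fall back to the row-based parquet-mr reader. The conf key is real; the
session setup, app name, and path below are illustrative only.

    // Disable the vectorized Parquet reader so scans avoid the large
    // on-heap column vector arrays and use the row-based reader instead.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("parquet-oom-workaround")  // illustrative app name
      .config("spark.sql.parquet.enableVectorizedReader", "false")
      .getOrCreate()

    // The same setting can also be changed at runtime:
    // spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    val df = spark.read.parquet("/path/to/data")  // placeholder path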



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

