spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-7148) Configure Parquet block size (row group size) for ML model import/export
Date Sun, 26 Apr 2015 03:19:38 GMT
Joseph K. Bradley created SPARK-7148:
----------------------------------------

             Summary: Configure Parquet block size (row group size) for ML model import/export
                 Key: SPARK-7148
                 URL: https://issues.apache.org/jira/browse/SPARK-7148
             Project: Spark
          Issue Type: Improvement
          Components: MLlib, SQL
    Affects Versions: 1.3.1, 1.3.0, 1.4.0
            Reporter: Joseph K. Bradley
            Priority: Minor


It would be nice if we could configure the Parquet block size (row group size) when using
the Parquet format for ML model import/export.  Currently, for some models (trees and
ensembles), the schema has 13+ columns.  With a default block size of 128MB (I think), that
puts the allocated buffer way over the default memory made available by run-example.  Because
of this problem, users have to use spark-submit and explicitly request a larger amount of
memory in order to run some ML examples.

Is there a simple way to specify {{parquet.block.size}}?  I'm not familiar with this part
of Spark SQL.
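
One possible approach, offered only as a sketch and not confirmed anywhere in this issue: Parquet's Hadoop output format reads {{parquet.block.size}} from the Hadoop Configuration, so setting it on the SparkContext's Hadoop configuration before saving may be enough. The object name, the 16MB value, and the assumption that the Spark SQL Parquet writer in a given Spark version honors this setting are all illustrative and would need to be verified.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the name is not from the issue.
object ParquetBlockSizeSketch {
  def main(args: Array[String]): Unit = {
    // The master URL is expected to be supplied by spark-submit.
    val conf = new SparkConf().setAppName("ParquetBlockSizeSketch")
    val sc = new SparkContext(conf)

    // Assumption: 16MB is an arbitrary illustrative value, not a recommendation.
    val blockSizeBytes = 16 * 1024 * 1024

    // Set the Parquet row group size (in bytes) on the Hadoop configuration
    // that is passed along to Hadoop output formats.
    sc.hadoopConfiguration.setInt("parquet.block.size", blockSizeBytes)

    // A model would be saved here, after the setting is in place.

    sc.stop()
  }
}
```

Whether this reduces the memory needed under run-example depends on the writer actually picking up the value, which this sketch does not check.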


