spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-13574) Improve parquet dictionary decoding for strings
Date Mon, 29 Feb 2016 20:15:18 GMT

    [ https://issues.apache.org/jira/browse/SPARK-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172518#comment-15172518 ]

Apache Spark commented on SPARK-13574:
--------------------------------------

User 'nongli' has created a pull request for this issue:
https://github.com/apache/spark/pull/11434

> Improve parquet dictionary decoding for strings
> -----------------------------------------------
>
>                 Key: SPARK-13574
>                 URL: https://issues.apache.org/jira/browse/SPARK-13574
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Nong Li
>            Priority: Minor
>
> Currently, the Parquet reader copies the dictionary value for each data value. This
> is bad for string columns because we explode the dictionary during decode. We should
> instead have the data values point to the safe backing memory.
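
To illustrate the difference the description is pointing at, here is a minimal sketch in Python. It is not Spark's reader code: the names (build_backing, decode_copy, decode_views) and the sample data are hypothetical, and the layout (one contiguous buffer plus offsets) is an assumption made for the example. decode_copy materializes a fresh byte string per data value, while decode_views hands out slices that all reference the single dictionary buffer.

```python
# Hypothetical sketch, not Spark's actual Parquet reader.
# Assumption: the dictionary is stored once as a contiguous buffer plus offsets.

def build_backing(dictionary):
    """Pack dictionary entries into one buffer and record entry boundaries."""
    buf = b"".join(dictionary)
    offsets = [0]
    for entry in dictionary:
        offsets.append(offsets[-1] + len(entry))
    return buf, offsets

def decode_copy(ids, buf, offsets):
    """Copying decode: every data value gets its own copy of the bytes."""
    # Slicing a bytes object allocates a new bytes object each time.
    return [buf[offsets[i]:offsets[i + 1]] for i in ids]

def decode_views(ids, buf, offsets):
    """Zero-copy decode: every data value is a view into the shared buffer."""
    mv = memoryview(buf)
    # Slicing a memoryview does not copy the underlying bytes.
    return [mv[offsets[i]:offsets[i + 1]] for i in ids]

dictionary = [b"apple", b"banana", b"cherry"]
ids = [0, 2, 2, 1, 0]  # dictionary-encoded column values

buf, offsets = build_backing(dictionary)
copied = decode_copy(ids, buf, offsets)
views = decode_views(ids, buf, offsets)
```

With the copying decode, memory grows with the number of rows times the string length. With the view-based decode, each row costs only a small reference into the dictionary buffer, which must then stay alive for as long as the views are in use.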



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


