spark-reviews mailing list archives

From gengliangwang <...@git.apache.org>
Subject [GitHub] spark pull request #21004: [SPARK-23896][SQL]Improve PartitioningAwareFileIn...
Date Sun, 08 Apr 2018 18:56:23 GMT
GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21004

    [SPARK-23896][SQL]Improve PartitioningAwareFileIndex

    ## What changes were proposed in this pull request?
    
    Currently `PartitioningAwareFileIndex` accepts an optional parameter `userPartitionSchema`.
    If provided, the index combines the inferred partition schema with that parameter.
    
    However,
    1. to get `userPartitionSchema`, we need to combine the inferred partition schema with `userSpecifiedSchema`;
    2. to get the inferred partition schema, we have to create a temporary file index.
    
    Only after that, a final version of `PartitioningAwareFileIndex` can be created.
    
    This can be improved by passing `userSpecifiedSchema` to `PartitioningAwareFileIndex`.
    
    With the improvement, we can reduce redundant code and avoid parsing the file partitions
    twice.
    
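The difference between the two flows can be sketched with a small toy model. This is only an illustration under stated assumptions, not the code in this pull request: `Field`, `Schema`, `IndexBefore`, `IndexAfter`, `buildBefore`, `resolve` and `parseCount` are hypothetical stand-ins, and the toy inference types every partition column as `"string"`.

```scala
object PartitionSchemaSketch {
  // Hypothetical stand-ins for a schema; these are not Spark's classes.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Seq[Field]) {
    def find(name: String): Option[Field] = fields.find(_.name == name)
  }

  // Counts how many times partition directories are parsed.
  var parseCount = 0

  // Infer partition columns from paths like "/data/year=2018/month=04".
  // Assumption: every inferred column is typed "string" in this toy model.
  def inferPartitionSchema(paths: Seq[String]): Schema = {
    parseCount += 1
    val names = paths
      .flatMap(_.split("/").toSeq)
      .filter(_.contains("="))
      .map(_.takeWhile(_ != '='))
      .distinct
    Schema(names.map(n => Field(n, "string")))
  }

  // For each inferred partition column, prefer the user's field if present.
  def resolve(inferred: Schema, user: Option[Schema]): Schema =
    Schema(inferred.fields.map(f => user.flatMap(_.find(f.name)).getOrElse(f)))

  // Before: the index expects an already-resolved partition schema.
  final class IndexBefore(paths: Seq[String], userPartitionSchema: Option[Schema]) {
    lazy val partitionSchema: Schema =
      resolve(inferPartitionSchema(paths), userPartitionSchema)
  }

  // The caller needs a temporary index just to learn the inferred columns.
  def buildBefore(paths: Seq[String], userSpecifiedSchema: Option[Schema]): IndexBefore = {
    val temp = new IndexBefore(paths, None) // first parse happens here
    val userPartitionSchema = resolve(temp.partitionSchema, userSpecifiedSchema)
    new IndexBefore(paths, Some(userPartitionSchema)) // parses again on use
  }

  // After: the index receives the full user schema and resolves internally.
  final class IndexAfter(paths: Seq[String], userSpecifiedSchema: Option[Schema]) {
    lazy val partitionSchema: Schema =
      resolve(inferPartitionSchema(paths), userSpecifiedSchema)
  }
}
```

In this model, reading `partitionSchema` from the result of `buildBefore` parses the paths twice, while `new IndexAfter(paths, userSpecifiedSchema)` yields the same schema with a single parse and no temporary index.
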
    ## How was this patch tested?
    Unit test


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark PartitioningAwareFileIndex

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21004.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21004
    
----
commit 35aff24743ff13ccd370a8e3747a3044e8a671c9
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-04-08T18:19:48Z

    improve PartitioningAwareFileIndex

----



