spark-reviews mailing list archives

From nongli <>
Subject [GitHub] spark pull request: Spark 13518
Date Fri, 26 Feb 2016 19:10:58 GMT
GitHub user nongli opened a pull request:

    Spark 13518

    ## What changes were proposed in this pull request?
    WIP: Don't merge.
    Change the default value of the flag so that the vectorized Parquet scanner is enabled by default.
    ## How was this patch tested?
    The new Parquet reader should be a drop-in replacement, so the existing tests will exercise it.

You can merge this pull request into a Git repository by running:

    $ git pull spark-13518

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11397

commit 858080b2626394b7dd975498690dcd0cfd27bf78
Author: Nong Li <>
Date:   2016-02-25T07:43:31Z

    [SPARK-13499][SQL] Performance improvements for parquet reader.
    This patch includes these performance fixes:
      - Remove unnecessary setNotNull() calls. The NULL bits are cleared already.
      - Speed up RLE group decoding.
      - Speed up dictionary decoding by decoding NULLs directly into the result.
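The first and third bullets can be pictured with a small sketch. This is not Spark's reader code: `IntColumn`, `decode`, and the `boolean[]` null flags are hypothetical stand-ins, and definition levels are simplified to 0 (NULL) or 1 (value present). It only illustrates that when the null flags start out cleared, non-null rows need no extra call, and NULLs can be written straight into the output during the dictionary lookup.

```java
import java.util.Arrays;

final class NullAwareDictionaryDecode {
    // Hypothetical stand-in for a column batch: values plus one null flag per row.
    static final class IntColumn {
        final int[] values;
        final boolean[] isNull; // a new boolean[] is all false, i.e. every row starts "not null"

        IntColumn(int capacity) {
            values = new int[capacity];
            isNull = new boolean[capacity];
        }
    }

    // Single pass over the rows: a definition level of 0 marks a NULL,
    // anything else means the next dictionary id belongs to this row.
    static IntColumn decode(int[] dictionary, int[] defLevels, int[] ids) {
        IntColumn out = new IntColumn(defLevels.length);
        int next = 0; // position in ids; only non-null rows consume an id
        for (int row = 0; row < defLevels.length; row++) {
            if (defLevels[row] == 0) {
                out.isNull[row] = true; // write the NULL directly into the result
            } else {
                out.values[row] = dictionary[ids[next++]];
                // No "set not null" call here: the flag is already cleared.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        IntColumn col = decode(new int[] {10, 20, 30},
                               new int[] {1, 0, 1, 1},
                               new int[] {2, 0, 1});
        System.out.println(Arrays.toString(col.values)); // [30, 0, 10, 20]
        System.out.println(Arrays.toString(col.isNull)); // [false, true, false, false]
    }
}
```

The saving in this simplified picture is one write per non-null row, which adds up when most rows are non-null.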
    Beyond the updated benchmarks, running TPC-DS Q55 (sf40) with these
    changes gives:
    TPCDS:                             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
    q55 (Before)                             6398 / 6616         18.0          55.5
    q55 (After)                              4983 / 5189         23.1          43.3
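As a quick sanity check on the table above, the sketch below recomputes the rate column and the speedup. It assumes that Rate(M/s) is simply 1000 divided by Per Row(ns) and that speedup is the ratio of the best times. The class and method names are made up for illustration and are not part of Spark's benchmark code.

```java
import java.util.Locale;

final class Q55Speedup {
    // Millions of rows per second implied by a per-row cost in nanoseconds
    // (1e9 ns per second / perRowNs, divided by 1e6 rows per "M").
    static double rateMillionsPerSec(double perRowNs) {
        return 1000.0 / perRowNs;
    }

    // How many times faster the "after" run is than the "before" run.
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        System.out.println(String.format(Locale.ROOT, "rate before: %.1f M/s", rateMillionsPerSec(55.5))); // 18.0
        System.out.println(String.format(Locale.ROOT, "rate after:  %.1f M/s", rateMillionsPerSec(43.3))); // 23.1
        System.out.println(String.format(Locale.ROOT, "best-time speedup: %.2fx", speedup(6398, 4983)));   // 1.28x
    }
}
```

Under those assumptions the reported figures are self-consistent: the best time drops by roughly 22%, which matches the change in per-row cost from 55.5 ns to 43.3 ns.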

commit 385e6c867c9b3a05d571837df88be0429b9e5a8c
Author: Nong Li <>
Date:   2016-02-26T18:59:42Z

    Update benchmark headings.

commit 358af8c5f31fdde77792b2263725f426c4acc3bd
Author: Nong Li <>
Date:   2016-02-26T19:04:53Z

    Update ceil.

commit e8090740fe374977fd1ea2ed5e7c369827a46e7b
Author: Nong Li <>
Date:   2016-02-26T19:07:20Z

    [SPARK-13518][SQL] Enable vectorized parquet scanner by default.


