orc-dev mailing list archives

From wgtmac <...@git.apache.org>
Subject [GitHub] orc issue #301: ORC-395: Support ZSTD in C++ writer/reader
Date Fri, 17 Aug 2018 18:52:25 GMT
GitHub user wgtmac commented on the issue:

    https://github.com/apache/orc/pull/301
  
    To provide some benchmark results, I ran tests on my laptop using the TPC-H 1 GB dataset,
with the C++ tools csv-import (writing) and orc-scan (reading) in their default configurations.
    
    **Writer CPU Time (unit: second)**
    
    name | zlib | zstd
    -- | -- | --
    customer | 1.976 | 0.777
    lineitem | 50.754 | 19.990
    nation | 0.002 | 0.003
    orders | 11.054 | 4.895
    part | 1.893 | 0.771
    partsupp | 8.791 | 3.512
    region | 0.002 | 0.002
    supplier | 0.130 | 0.056
    
    **Reader CPU Time (unit: second)**
    
    name | zlib | zstd
    -- | -- | --
    customer | 0.084 | 0.063
    lineitem | 2.263 | 2.094
    nation | 0.001 | 0.001
    orders | 0.454 | 0.340
    part | 0.071 | 0.061
    partsupp | 0.343 | 0.253
    region | 0.000 | 0.001
    supplier | 0.006 | 0.005
    
    **File Size (unit: byte)**
    
    name | zlib | zstd
    -- | -- | --
    customer | 7494965 | 7670751
    lineitem | 162544602 | 178904712
    nation | 1760 | 1882
    orders | 34599561 | 38028670
    part | 4273944 | 4676560
    partsupp | 25766380 | 29498151
    region | 1026 | 1097
    supplier | 474099 | 478017
    
    In total, ZLIB spends 148.6% more writer CPU time than ZSTD (i.e. ZSTD saves about 60%) and
14.4% more reader CPU time. In exchange, ZSTD files are about 10.2% bigger in total. These results
give a basic idea of how the two codecs compare. Because we used the default configuration (ZLIB
level -1, i.e. Z_DEFAULT_COMPRESSION, which zlib currently maps to level 6, and ZSTD level 3), the
comparison may be unfair: ZSTD offers levels up to 22, while ZLIB offers levels 1 to 9. With
different levels or different datasets the results can vary a lot, and ZSTD can beat ZLIB on file
size. Overall, ZSTD looks like a good compression option.
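    The aggregate percentages can be rechecked by summing each column of the three tables. The short Python sketch below does exactly that, using only the per-table numbers copied from the tables; the variable and function names are hypothetical and not part of ORC. Because the CPU times in the tables are rounded to milliseconds, a figure derived from them (for example the reader comparison) can differ slightly from one computed on the raw measurements.

```python
# Per-table values copied from the three tables above, in the order:
# customer, lineitem, nation, orders, part, partsupp, region, supplier.
# (Hypothetical names; this only re-adds the published numbers.)
WRITER_CPU_S = {
    "zlib": [1.976, 50.754, 0.002, 11.054, 1.893, 8.791, 0.002, 0.130],
    "zstd": [0.777, 19.990, 0.003, 4.895, 0.771, 3.512, 0.002, 0.056],
}
READER_CPU_S = {
    "zlib": [0.084, 2.263, 0.001, 0.454, 0.071, 0.343, 0.000, 0.006],
    "zstd": [0.063, 2.094, 0.001, 0.340, 0.061, 0.253, 0.001, 0.005],
}
FILE_SIZE_BYTES = {
    "zlib": [7494965, 162544602, 1760, 34599561, 4273944, 25766380, 1026, 474099],
    "zstd": [7670751, 178904712, 1882, 38028670, 4676560, 29498151, 1097, 478017],
}


def percent_more(a: float, b: float) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100.0


def compare(table: dict) -> tuple:
    """Sum both columns and return
    (zlib_total, zstd_total, zlib_vs_zstd_pct, zstd_vs_zlib_pct)."""
    zlib, zstd = sum(table["zlib"]), sum(table["zstd"])
    return zlib, zstd, percent_more(zlib, zstd), percent_more(zstd, zlib)


if __name__ == "__main__":
    for name, table in [("writer CPU (s)", WRITER_CPU_S),
                        ("reader CPU (s)", READER_CPU_S),
                        ("file size (bytes)", FILE_SIZE_BYTES)]:
        zlib, zstd, zlib_more, zstd_more = compare(table)
        print(f"{name}: zlib={zlib:,.3f} zstd={zstd:,.3f} "
              f"zlib vs zstd {zlib_more:+.1f}%, zstd vs zlib {zstd_more:+.1f}%")
```

    Run as a script, it prints one line per table: the writer line reports ZLIB at +148.6% relative to ZSTD (ZSTD at -59.8% relative to ZLIB), the reader line reports ZLIB at +14.3%, and the file-size line reports ZSTD at +10.2% relative to ZLIB. Expressing each difference in both directions makes it clear that a "148.6%" figure means ZLIB takes 2.486 times as long, not a saving of more than 100%.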


