From: "Xu, Cheng A"
To: "dev@orc.apache.org", "user@orc.apache.org"
Subject: ORC double encoding optimization proposal
Date: Mon, 19 Mar 2018 07:57:49 +0000

Hi folks,

Based on our evaluation and analysis, which combines the existing work [1] from Teddy Choi and Owen O'Malley with some newer compression codecs (e.g. ZSTD and Brotli), we propose to promote FLIP to the default encoding for the ORC double type, to move this feature forward.

Currently we have five supported double encoding optimizations: Plain V2, FPC V1, FPC V2, FLIP, and Split [1].
Having several encodings lets us handle different usage scenarios, but it also puts the burden on end users to make a "smart" choice, so it is necessary to have a preferred default encoding. To choose the "best" among those encodings, two major factors need to be considered: throughput and space efficiency. In real-world usage, compression is enabled by default. Throughput can be bottlenecked by either the compression step or the encoding step. As for space efficiency, encodings like FPC V1, FPC V2 and Split also reduce the data size themselves, similar to compression. Our evaluation is based on a few artificial and non-artificial data sets. The artificial data sets have low cardinality and should go directly to dictionary encoding, so to keep the analysis simple we look at HEPMASS first. Now let's go through the evaluation data for those encodings one by one:

* Split

Benefiting from the underlying run-length encoding, Split compresses the original data to some extent on its own (compression=NONE, compression ratio=47.28%). If SPLIT is chosen as the encoding, the compression ratio does not get much better even with a codec like ZLIB (44.36%), so compression has a negative impact on throughput for only a limited gain in space efficiency.

In summary, Split reaches 309 MB/s read and 82 MB/s write with a 47.28% compression ratio.

Data Set  Encoding  Compression  Read Throughput (MB/s)  Write Throughput (MB/s)  Compression Ratio
HEPMASS   SPLIT     NONE         309.5525998             82.928411                47.28%
HEPMASS   SPLIT     LZO          326.1146497             62.21142279              47.02%
HEPMASS   SPLIT     ZLIB         223.1909329             32.31915222              44.36%
HEPMASS   SPLIT     SNAPPY       340.4255319             82.71405647              46.71%
HEPMASS   SPLIT     ZSTD         295.2710496             76.87687831              43.15%
HEPMASS   SPLIT     BROTLI       174.1496599             77.06201227              43.12%

* FLIP

Since FLIP itself doesn't have compression functionality, we need to combine it with a compression codec to achieve compression similar to Split. On the HEPMASS data set, FLIP reaches 640 MB/s read and 247 MB/s write with a 61.59% compression ratio using LZO, and 266 MB/s read and 145 MB/s write with a 52.39% compression ratio using ZSTD. In summary, FLIP is a good balance between space efficiency and throughput, and users can pick the compression codec that matches their goal (high compression or high throughput).

Data Set  Encoding  Compression  Read Throughput (MB/s)  Write Throughput (MB/s)  Compression Ratio
HEPMASS   FLIP      NONE         775.7575758             587.1559742              100.00%
HEPMASS   FLIP      LZO          640                     247.1042517              61.59%
HEPMASS   FLIP      ZLIB         272.0510096             19.21056617              52.95%
HEPMASS   FLIP      SNAPPY       595.3488372             261.4913225              59.49%
HEPMASS   FLIP      ZSTD         266.3891779             145.7858797              52.39%
HEPMASS   FLIP      BROTLI       64.84295846             122.3709392              53.11%

* FPC V1

FPC V1 is similar to FPC V2, differing slightly in endianness, so we use FPC V1 in our analysis. FPC_V1 also compresses on its own (75.76% compression ratio when compression=NONE). As with Split, adding a compression codec does not improve the compression ratio much (66.19%-75.76%) while it bottlenecks the throughput. In summary, FPC is not compression friendly (roughly 66%-76%), while its throughput is close to that of FLIP and worse than FLIP's once a compression codec is applied.

Data Set  Encoding  Compression  Read Throughput (MB/s)  Write Throughput (MB/s)  Compression Ratio
HEPMASS   FPC_V1    NONE         469.7247706             324.8731025              75.76%
HEPMASS   FPC_V1    LZO          474.0740741             310.6796174              75.76%
HEPMASS   FPC_V1    ZLIB         189.2091648             20.84011761              66.19%
HEPMASS   FPC_V1    SNAPPY       456.3279857             298.7164583              75.51%
HEPMASS   FPC_V1    ZSTD         238.1395349             183.7760264              66.35%
HEPMASS   FPC_V1    BROTLI       53.50052247             165.695796               66.40%

* Plain V2

To achieve a good balance of compression and throughput with Plain V2, we need to consider ZSTD or ZLIB as the compression codec. Its throughput is then about 200 ~ 600 MB/s for read and 100 ~ 1000 MB/s for write, which is still not as good as the FLIP encoding.

Data Set  Encoding  Compression  Read Throughput (MB/s)  Write Throughput (MB/s)  Compression Ratio
HEPMASS   PLAIN_V2  NONE         349.2496589             1153.153175              100.00%
HEPMASS   PLAIN_V2  LZO          571.4285714             236.162366               65.07%
HEPMASS   PLAIN_V2  ZLIB         304.3995244             10.14544465              54.15%
HEPMASS   PLAIN_V2  SNAPPY       442.1416235             278.5636613              76.60%
HEPMASS   PLAIN_V2  ZSTD         242.6540284             144.0630303              56.25%
HEPMASS   PLAIN_V2  BROTLI       43.99381337             96.67673896              59.33%

On the other data sets we observe results similar to those on the HEPMASS data set. As a result, FLIP is a good choice for balancing compression and throughput. It is compression friendly, unlike Split and FPC, and it delivers better throughput than Plain V2 at a similar compression ratio. So we suggest using FLIP as the default encoding for the double type in ORC.

[1] https://github.com/apache/orc/pull/189

Appendix I - Details about our micro-benchmark:
Data Set & Compression codec: See Appendix II
Data Scale: 1GB
Method: The measured time includes both compression and encoding on the write path, and likewise decompression and decoding on the read path.
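
For readers who want to reproduce this kind of measurement, here is a minimal Python sketch of the method described in Appendix I. It is not the benchmark code from [1]: struct.pack stands in for a double encoding and zlib for the compression codec, and the function name benchmark, the 1 MB = 2^20 bytes convention, and the definition of compression ratio as stored size divided by raw size (so an encoding that does not shrink the data gives 100% with compression=NONE) are all assumptions made for illustration.

```python
import struct
import time
import zlib


def benchmark(values, compress=zlib.compress, decompress=zlib.decompress):
    """Return (read MB/s, write MB/s, compression ratio) for a list of doubles.

    Hypothetical helper: struct.pack is a stand-in for a real ORC double
    encoding, and compress/decompress are a stand-in for the codec.
    """
    count = len(values)
    raw_size = 8 * count  # a double occupies 8 bytes

    # Write path: encoding and compression are timed together.
    start = time.perf_counter()
    encoded = struct.pack("<%dd" % count, *values)
    stored = compress(encoded)
    write_seconds = time.perf_counter() - start

    # Read path: decompression and decoding are timed together.
    start = time.perf_counter()
    decoded = struct.unpack("<%dd" % count, decompress(stored))
    read_seconds = time.perf_counter() - start

    # The round trip must be lossless (input is assumed to contain no NaN).
    assert list(decoded) == list(values)

    megabytes = raw_size / (1024 * 1024)  # assumption: 1 MB = 2**20 bytes
    ratio = len(stored) / raw_size        # assumption: stored size / raw size
    return megabytes / read_seconds, megabytes / write_seconds, ratio


if __name__ == "__main__":
    data = [float(i % 1000) / 8 for i in range(200_000)]
    read_mbps, write_mbps, ratio = benchmark(data)
    print("read %.1f MB/s, write %.1f MB/s, ratio %.2f%%"
          % (read_mbps, write_mbps, 100 * ratio))
```

Passing identity functions for compress and decompress corresponds to compression=NONE; with the stand-in encoding used here that yields a ratio of exactly 100%.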

Appendix II - Benchmark full result

Data Set            Encoding  Compression  Read Throughput (MB/s)  Write Throughput (MB/s)  Compression Ratio
LIST_PRICE          PLAIN_V2  NONE         641.60401               927.5362492              100.00%
LIST_PRICE          PLAIN_V2  LZO          513.0260521             267.5026173              47.89%
LIST_PRICE          PLAIN_V2  ZLIB         318.4079602             12.67515001              37.22%
LIST_PRICE          PLAIN_V2  SNAPPY       428.8107203             320.4005066              59.19%
LIST_PRICE          PLAIN_V2  ZSTD         241.9659735             175.1026024              9.82%
LIST_PRICE          PLAIN_V2  BROTLI       94.39528024             195.5691404              10.68%
LIST_PRICE          FPC_V1    NONE         538.9473684             343.6241675              57.02%
LIST_PRICE          FPC_V1    LZO          437.6068376             292.571434               53.83%
LIST_PRICE          FPC_V1    ZLIB         226.7493357             18.83322333              41.17%
LIST_PRICE          FPC_V1    SNAPPY       527.8350515             296.6396347              53.70%
LIST_PRICE          FPC_V1    ZSTD         360.5633803             257.286437               6.33%
LIST_PRICE          FPC_V1    BROTLI       199.6879875             247.3429998              6.51%
LIST_PRICE          FPC_V2    NONE         440.6196213             328.6264503              93.41%
LIST_PRICE          FPC_V2    LZO          330.3225806             158.1223007              64.60%
LIST_PRICE          FPC_V2    ZLIB         211.745244              12.61332306              46.61%
LIST_PRICE          FPC_V2    SNAPPY       293.5779817             171.0086872              72.93%
LIST_PRICE          FPC_V2    ZSTD         171.5817694             118.2994477              19.19%
LIST_PRICE          FPC_V2    BROTLI       59.631959               94.43010135              24.07%
LIST_PRICE          FLIP      NONE         766.4670659             638.4040019              100.00%
LIST_PRICE          FLIP      LZO          727.2727273             186.1818216              93.93%
LIST_PRICE          FLIP      ZLIB         178.1489214             32.39686216              73.66%
LIST_PRICE          FLIP      SNAPPY       695.6521739             241.5094385              91.63%
LIST_PRICE          FLIP      ZSTD         233.151184              134.5244376              40.08%
LIST_PRICE          FLIP      BROTLI       52.13849287             104.9180347              59.79%
LIST_PRICE          SPLIT     NONE         307.3229292             57.65765873              95.39%
LIST_PRICE          SPLIT     LZO          238.583411              53.34444775              39.24%
LIST_PRICE          SPLIT     ZLIB         189.7702001             10.57326965              30.87%
LIST_PRICE          SPLIT     SNAPPY       214.4053601             54.75935931              48.44%
LIST_PRICE          SPLIT     ZSTD         169.4242224             50.82390406              7.43%
LIST_PRICE          SPLIT     BROTLI       91.62491052             47.95803761              10.12%
DISCOUNT_AMT        PLAIN_V2  NONE         1080.168776             1286.432185              100.00%
DISCOUNT_AMT        PLAIN_V2  LZO          782.8746177             504.9309759              18.51%
DISCOUNT_AMT        PLAIN_V2  ZLIB         527.8350515             46.10120741              13.85%
DISCOUNT_AMT        PLAIN_V2  SNAPPY       711.1111111             588.5057581              20.41%
DISCOUNT_AMT        PLAIN_V2  ZSTD         488.5496183             199.8438757              3.50%
DISCOUNT_AMT        PLAIN_V2  BROTLI       183.6441894             342.7041563              4.29%
DISCOUNT_AMT        FPC_V1    NONE         625.9168704             335.9580115              21.21%
DISCOUNT_AMT        FPC_V1    LZO          671.9160105             331.1772377              9.76%
DISCOUNT_AMT        FPC_V1    ZLIB         487.6190476             84.46057564              8.94%
DISCOUNT_AMT        FPC_V1    SNAPPY       654.7314578             352.6170865              9.78%
DISCOUNT_AMT        FPC_V1    ZSTD         561.4035088             331.1772377              1.05%
DISCOUNT_AMT        FPC_V1    BROTLI       489.4837476             343.6241675              1.21%
DISCOUNT_AMT        FPC_V2    NONE         641.60401               306.5868321              34.35%
DISCOUNT_AMT        FPC_V2    LZO          501.9607843             220.4995734              23.46%
DISCOUNT_AMT        FPC_V2    ZLIB         354.0802213             33.64879137              17.24%
DISCOUNT_AMT        FPC_V2    SNAPPY       488.5496183             230.8385976              23.48%
DISCOUNT_AMT        FPC_V2    ZSTD         493.256262              238.8059746              2.57%
DISCOUNT_AMT        FPC_V2    BROTLI       289.5927602             229.5964168              3.00%
DISCOUNT_AMT        FLIP      NONE         775.7575758             713.0919353              100.00%
DISCOUNT_AMT        FLIP      LZO          439.862543              246.3907649              35.60%
DISCOUNT_AMT        FLIP      ZLIB         296.2962963             15.07478535              24.79%
DISCOUNT_AMT        FLIP      SNAPPY       415.5844156             282.2491784              41.03%
DISCOUNT_AMT        FLIP      ZSTD         201.7336485             184.4380438              9.05%
DISCOUNT_AMT        FLIP      BROTLI       98.23484267             153.3852635              14.78%
DISCOUNT_AMT        SPLIT     NONE         243.1149098             125.860376               42.25%
DISCOUNT_AMT        SPLIT     LZO          250.2443793             112.2807038              1.41%
DISCOUNT_AMT        SPLIT     ZLIB         208.2994304             29.54073445              11.14%
DISCOUNT_AMT        SPLIT     SNAPPY       237.2567192             116.1524523              12.13%
DISCOUNT_AMT        SPLIT     ZSTD         222.4152911             116.0471463              1.02%
DISCOUNT_AMT        SPLIT     BROTLI       172.2745626             107.3825523              1.10%
IOT_METER           PLAIN_V2  NONE         833.8762215             1024.000019              100.00%
IOT_METER           PLAIN_V2  LZO          653.0612245             429.5302093              22.77%
IOT_METER           PLAIN_V2  ZLIB         605.2009456             38.18046305              10.73%
IOT_METER           PLAIN_V2  SNAPPY       675.4617414             506.9307025              23.20%
IOT_METER           PLAIN_V2  ZSTD         358.5434174             235.9447049              4.12%
IOT_METER           PLAIN_V2  BROTLI       161.6161616             227.3534678              6.08%
IOT_METER           FPC_V1    NONE         563.876652              379.8219655              32.06%
IOT_METER           FPC_V1    LZO          507.9365079             290.9090963              13.63%
IOT_METER           FPC_V1    ZLIB         360.5633803             27.16756922              13.35%
IOT_METER           FPC_V1    SNAPPY       453.0973451             273.5042786              16.99%
IOT_METER           FPC_V1    ZSTD         388.4673748             263.1038076              1.68%
IOT_METER           FPC_V1    BROTLI       355.0624133             279.781426               1.78%
IOT_METER           FPC_V2    NONE         514.0562249             314.8831547              60.06%
IOT_METER           FPC_V2    LZO          420.3612479             203.6595106              24.12%
IOT_METER           FPC_V2    ZLIB         339.0728477             28.20315135              16.98%
IOT_METER           FPC_V2    SNAPPY       425.9567388             225.7495633              25.88%
IOT_METER           FPC_V2    ZSTD         280.7017544             176.7955834              3.92%
IOT_METER           FPC_V2    BROTLI       225.7495591             190.7600632              4.24%
IOT_METER           FLIP      NONE         636.8159204             580.498877               100.00%
IOT_METER           FLIP      LZO          325.2858958             223.7762279              66.07%
IOT_METER           FLIP      ZLIB         251.9685039             30.47619104              42.93%
IOT_METER           FLIP      SNAPPY       480.3001876             247.3429998              67.40%
IOT_METER           FLIP      ZSTD         214.2259414             148.1481509              21.73%
IOT_METER           FLIP      BROTLI       72.4801812              126.9841293              25.84%
IOT_METER           SPLIT     NONE         256.5130261             68.26666794              84.95%
IOT_METER           SPLIT     LZO          227.3534636             63.6499266               18.64%
IOT_METER           SPLIT     ZLIB         213.3333333             26.7223387               9.66%
IOT_METER           SPLIT     SNAPPY       230.2158273             62.21142279              20.08%
IOT_METER           SPLIT     ZSTD         184.1726619             58.79651005              2.91%
IOT_METER           SPLIT     BROTLI       125.061065              61.03958149              3.41%
NYC_TAXI_DROP_LAT   PLAIN_V2  NONE         980.8429119             1122.807038              100.00%
NYC_TAXI_DROP_LAT   PLAIN_V2  LZO          573.9910314             262.8336805              46.76%
NYC_TAXI_DROP_LAT   PLAIN_V2  ZLIB         324.0506329             14.90885824              34.20%
NYC_TAXI_DROP_LAT   PLAIN_V2  SNAPPY       448.3362522             310.3030361              56.42%
NYC_TAXI_DROP_LAT   PLAIN_V2  ZSTD         236.1623616             141.0468346              34.99%
NYC_TAXI_DROP_LAT   PLAIN_V2  BROTLI       51.13863364             109.7770175              39.10%
NYC_TAXI_DROP_LAT   FPC_V1    NONE         445.2173913             169.536427               75.73%
NYC_TAXI_DROP_LAT   FPC_V1    LZO          418.3006536             250.0000047              75.73%
NYC_TAXI_DROP_LAT   FPC_V1    ZLIB         189.0694239             13.17075705              51.13%
NYC_TAXI_DROP_LAT   FPC_V1    SNAPPY       300.1172333             167.6489881              70.49%
NYC_TAXI_DROP_LAT   FPC_V1    ZSTD         253.968254              188.2352976              53.59%
NYC_TAXI_DROP_LAT   FPC_V1    BROTLI       58.64833906             167.7588498              54.37%
NYC_TAXI_DROP_LAT   FPC_V2    NONE         556.5217391             354.0802279              79.94%
NYC_TAXI_DROP_LAT   FPC_V2    LZO          338.6243386             164.6302281              52.82%
NYC_TAXI_DROP_LAT   FPC_V2    ZLIB         223.580786              8.611120615              38.58%
NYC_TAXI_DROP_LAT   FPC_V2    SNAPPY       306.9544365             187.683288               61.75%
NYC_TAXI_DROP_LAT   FPC_V2    ZSTD         243.3460076             122.6053663              38.80%
NYC_TAXI_DROP_LAT   FPC_V2    BROTLI       54.66581251             97.37542973              41.73%
NYC_TAXI_DROP_LAT   FLIP      NONE         917.562724              677.2486899              100.00%
NYC_TAXI_DROP_LAT   FLIP      LZO          697.5476839             305.854247               45.04%
NYC_TAXI_DROP_LAT   FLIP      ZLIB         304.0380048             23.90512697              33.75%
NYC_TAXI_DROP_LAT   FLIP      SNAPPY       608.0760095             313.725496               43.76%
NYC_TAXI_DROP_LAT   FLIP      ZSTD         294.2528736             177.4081807              33.23%
NYC_TAXI_DROP_LAT   FLIP      BROTLI       89.01251739             153.7537566              34.35%
NYC_TAXI_DROP_LAT   SPLIT     NONE         391.4373089             131.2147641              38.89%
NYC_TAXI_DROP_LAT   SPLIT     LZO          375.3665689             133.2639275              37.96%
NYC_TAXI_DROP_LAT   SPLIT     ZLIB         235.0780533             32.05208523              29.98%
NYC_TAXI_DROP_LAT   SPLIT     SNAPPY       373.7226277             122.1374069              37.96%
NYC_TAXI_DROP_LAT   SPLIT     ZSTD         280.0875274             115.784715               30.04%
NYC_TAXI_DROP_LAT   SPLIT     BROTLI       99.26328034             112.6760584              30.06%
NYC_TAXI_DROP_LONG  PLAIN_V2  NONE         1024                    1319.587653              100.00%
NYC_TAXI_DROP_LONG  PLAIN_V2  LZO          536.687631              264.1898914              44.78%
NYC_TAXI_DROP_LONG  PLAIN_V2  ZLIB         339.9734396             17.12718303              31.48%
NYC_TAXI_DROP_LONG  PLAIN_V2  SNAPPY       457.960644              315.6596853              52.53%
NYC_TAXI_DROP_LONG  PLAIN_V2  ZSTD         248.7852284             146.2021729              32.49%
NYC_TAXI_DROP_LONG  PLAIN_V2  BROTLI       58.62147928             115.9945649              35.83%
NYC_TAXI_DROP_LONG  FPC_V1    NONE         428.0936455             270.0421991              75.73%
NYC_TAXI_DROP_LONG  FPC_V1    LZO          383.808096              209.6642136              75.39%
NYC_TAXI_DROP_LONG  FPC_V1    ZLIB         191.0447761             13.72948647              49.16%
NYC_TAXI_DROP_LONG  FPC_V1    SNAPPY       276.7567568             162.6429509              69.89%
NYC_TAXI_DROP_LONG  FPC_V1    ZSTD         213.6894825             177.65441                51.44%
NYC_TAXI_DROP_LONG  FPC_V1    BROTLI       57.78781038             151.2108711              52.48%
NYC_TAXI_DROP_LONG  FPC_V2    NONE         514.0562249             115.4192988              77.99%
NYC_TAXI_DROP_LONG  FPC_V2    LZO          329.8969072             158.9075138              50.99%
NYC_TAXI_DROP_LONG  FPC_V2    ZLIB         211.9205298             8.547008706              37.01%
NYC_TAXI_DROP_LONG  FPC_V2    SNAPPY       296.2962963             180.1548239              59.17%
NYC_TAXI_DROP_LONG  FPC_V2    ZSTD         242.4242424             124.5742116              37.32%
NYC_TAXI_DROP_LONG  FPC_V2    BROTLI       57.72266065             97.85932904              40.10%
NYC_TAXI_DROP_LONG  FLIP      NONE         930.9090909             668.4073232              100.00%
NYC_TAXI_DROP_LONG  FLIP      LZO          673.6842105             303.3175412              42.93%
NYC_TAXI_DROP_LONG  FLIP      ZLIB         330.749354              23.51428353              31.84%
NYC_TAXI_DROP_LONG  FLIP      SNAPPY       576.5765766             308.0625809              42.19%
NYC_TAXI_DROP_LONG  FLIP      ZSTD         306.9544365             165.3746801              32.09%
NYC_TAXI_DROP_LONG  FLIP      BROTLI       86.98606864             158.4158445              32.77%
NYC_TAXI_DROP_LONG  SPLIT     NONE         411.5755627             128.9672568              39.43%
NYC_TAXI_DROP_LONG  SPLIT     LZO          390.2439024             127.5535649              38.00%
NYC_TAXI_DROP_LONG  SPLIT     ZLIB         245.681382              25.81165606              28.33%
NYC_TAXI_DROP_LONG  SPLIT     SNAPPY       362.6062323             112.8250352              37.70%
NYC_TAXI_DROP_LONG  SPLIT     ZSTD         298.0209546             112.2314795              28.31%
NYC_TAXI_DROP_LONG  SPLIT     BROTLI       105.2631579             110.3448296              28.36%
HIGGS               PLAIN_V2  NONE         1015.873016             1312.820537              100.00%
HIGGS               PLAIN_V2  LZO          517.1717172             211.9205337              57.54%
HIGGS               PLAIN_V2  ZLIB         285.077951              15.71709263              45.34%
HIGGS               PLAIN_V2  SNAPPY       456.3279857             295.2710551              62.93%
HIGGS               PLAIN_V2  ZSTD         245.9173871             134.5951655              32.67%
HIGGS               PLAIN_V2  BROTLI       70.581748               117.9180123              36.06%
HIGGS               FPC_V1    NONE         442.9065744             320.802011               75.74%
HIGGS               FPC_V1    LZO          396.8992248             308.8057959              75.75%
HIGGS               FPC_V1    ZLIB         184.3052556             20.93898289              66.06%
HIGGS               FPC_V1    SNAPPY       416.9381107             292.9061839              75.50%
HIGGS               FPC_V1    ZSTD         219.1780822             178.2729838              66.19%
HIGGS               FPC_V1    BROTLI       53.10101639             159.4022446              66.24%
HIGGS               FPC_V2    NONE         400                     299.7658135              97.31%
HIGGS               FPC_V2    LZO          277.9587405             140.3508798              74.91%
HIGGS               FPC_V2    ZLIB         99.30178433             8.00050018               60.80%
HIGGS               FPC_V2    SNAPPY       252.7147088             147.3805439              82.49%
HIGGS               FPC_V2    ZSTD         207.6236821             112.1822983              59.45%
HIGGS               FPC_V2    BROTLI       42.82368685             82.63395893              62.70%
HIGGS               FLIP      NONE         917.562724              661.4987203              100.00%
HIGGS               FLIP      LZO          705.2341598             255.2343019              60.98%
HIGGS               FLIP      ZLIB         274.9731472             19.60333906              51.98%
HIGGS               FLIP      SNAPPY       630.5418719             276.1596599              58.84%
HIGGS               FLIP      ZSTD         216.0337553             150.4997089              51.35%
HIGGS               FLIP      BROTLI       68.04891015             124.3322024              52.23%
HIGGS               SPLIT     NONE         334.2036554             82.66064087              47.40%
HIGGS               SPLIT     LZO          325.6997455             79.28151278              46.83%
HIGGS               SPLIT     ZLIB         202.5316456             28.92655421              41.78%
HIGGS               SPLIT     SNAPPY       309.9273608             78.76923224              46.49%
HIGGS               SPLIT     ZSTD         300.8225617             74.54863272              42.79%
HIGGS               SPLIT     BROTLI       100.6685018             72.45966736              42.61%
HEPMASS             PLAIN_V2  NONE         349.2496589             1153.153175              100.00%
HEPMASS             PLAIN_V2  LZO          571.4285714             236.162366               65.07%
HEPMASS             PLAIN_V2  ZLIB         304.3995244             10.14544465              54.15%
HEPMASS             PLAIN_V2  SNAPPY       442.1416235             278.5636613              76.60%
HEPMASS             PLAIN_V2  ZSTD         242.6540284             144.0630303              56.25%
HEPMASS             PLAIN_V2  BROTLI       43.99381337             96.67673896              59.33%
HEPMASS             FPC_V1    NONE         469.7247706             324.8731025              75.76%
HEPMASS             FPC_V1    LZO          474.0740741             310.6796174              75.76%
HEPMASS             FPC_V1    ZLIB         189.2091648             20.84011761              66.19%
HEPMASS             FPC_V1    SNAPPY       456.3279857             298.7164583              75.51%
HEPMASS             FPC_V1    ZSTD         238.1395349             183.7760264              66.35%
HEPMASS             FPC_V1    BROTLI       53.50052247             165.695796               66.40%
HEPMASS             FPC_V2    NONE         439.862543              294.252879               97.84%
HEPMASS             FPC_V2    LZO          277.9587405             138.5281411              76.20%
HEPMASS             FPC_V2    ZLIB         171.6968478             8.017538515              61.68%
HEPMASS             FPC_V2    SNAPPY       253.4653465             144.1441468              83.12%
HEPMASS             FPC_V2    ZSTD         197.5308642             116.5225329              60.38%
HEPMASS             FPC_V2    BROTLI       40.77731762             78.19181575              63.65%
HEPMASS             FLIP      NONE         775.7575758             587.1559742              100.00%
HEPMASS             FLIP      LZO          640                     247.1042517              61.59%
HEPMASS             FLIP      ZLIB         272.0510096             19.21056617              52.95%
HEPMASS             FLIP      SNAPPY       595.3488372             261.4913225              59.49%
HEPMASS             FLIP      ZSTD         266.3891779             145.7858797              52.39%
HEPMASS             FLIP      BROTLI       64.84295846             122.3709392              53.11%
HEPMASS             SPLIT     NONE         309.5525998             82.928411                47.28%
HEPMASS             SPLIT     LZO          326.1146497             62.21142279              47.02%
HEPMASS             SPLIT     ZLIB         223.1909329             32.31915222              44.36%
HEPMASS             SPLIT     SNAPPY       340.4255319             82.71405647              46.71%
HEPMASS             SPLIT     ZSTD         295.2710496             76.87687831              43.15%
HEPMASS             SPLIT     BROTLI       174.1496599             77.06201227              43.12%
PHONE               PLAIN_V2  NONE         1333.333333             1471.264395              100.00%
PHONE               PLAIN_V2  LZO          536.687631              280.7017596              49.24%
PHONE               PLAIN_V2  ZLIB         369.4083694             44.32900515              30.68%
PHONE               PLAIN_V2  SNAPPY       613.9088729             366.2374889              48.37%
PHONE               PLAIN_V2  ZSTD         283.8137472             166.7752474              26.98%
PHONE               PLAIN_V2  BROTLI       77.1549126              156.76669                31.42%
PHONE               FPC_V1    NONE         449.9121265             314.8831547              102.56%
PHONE               FPC_V1    LZO          361.0719323             223.3856935              92.48%
PHONE               FPC_V1    ZLIB         157.8298397             18.87627229              77.92%
PHONE               FPC_V1    SNAPPY       378.1388479             222.6086998              91.26%
PHONE               FPC_V1    ZSTD         199.5323461             127.2998532              74.16%
PHONE               FPC_V1    BROTLI       43.82811162             105.610563               76.80%
PHONE               FPC_V2    NONE         440.6196213             318.0124283              84.75%
PHONE               FPC_V2    LZO          426.6666667             176.6735713              78.76%
PHONE               FPC_V2    ZLIB         193.9393939             33.35939598              70.06%
PHONE               FPC_V2    SNAPPY       414.2394822             237.91822                77.82%
PHONE               FPC_V2    ZSTD         231.6742081             150.3229623              68.29%
PHONE               FPC_V2    BROTLI       54.65414176             135.3065564              71.26%
PHONE               FLIP      NONE         920.8633094             659.7938267              100.00%
PHONE               FLIP      LZO          690.0269542             213.5112634              90.96%
PHONE               FLIP      ZLIB         211.0469909             28.5491251               81.50%
PHONE               FLIP      SNAPPY       618.3574879             239.9250279              89.70%
PHONE               FLIP      ZSTD         268.0628272             130.879348               82.24%
PHONE               FLIP      BROTLI       50.06845296             110.2497867              84.18%
PHONE               SPLIT     NONE         384.3843844             72.87219037              89.04%
PHONE               SPLIT     LZO          272.0510096             61.17084941              45.21%
PHONE               SPLIT     ZLIB         213.6894825             26.77264221              29.37%
PHONE               SPLIT     SNAPPY       276.4578834             64.32160924              45.31%
PHONE               SPLIT     ZSTD         186.8613139             52.54515697              30.37%
PHONE               SPLIT     BROTLI       65.64102564             50.39370173              30.75%

Best Regards
Ferdinand Xu

Hi folks,

According to our evaluation and analysis by combinin= g the existing work [1] from Teddy Choi and Owen O'Malley with some new com= pression codec (e.g. ZSTD and Brotli), we proposed to prompt FLIP as the de= fault encoding for ORC double type to move this feature forwards.

Currently we have five kinds of supported double enc= oding optimizations: plain v2, FPC_v1, FPC_v2, FLIP, Split [1]. Equipping w= ith various encoding can handle different use scenarios but it will increas= e the burdens from end point users to make a “smart” choice. It’s very necessary to have a = preference as the default encoding type. To choose the “best” e= ncoding among those encodings, two major factors needed to be considered: t= hroughput and space efficiency. In real world usage, compression is enabled by default. For throughput, it can be bottled at either compres= sion part or encoding part. As for the space efficiency, , encodings like F= PC V1, FPC V2 and Split can also serve for the goal of space efficiency sim= ilar to compression. Our evaluation is based on a few artificial and non-artificial data set. For the artifici= al data set which has low cardinality, it should go directly into dictionar= y encoding. So we will choose HEPMASS for the data set to reduce the comple= xity for analysis first. Now let’s go through the evaluating data of those encoding one by one:

·         Split

Benefiting from underlyi= ng run length encoding, split can compress the original data to some extend= s (Compression=3DNONE, compression ratio=3D47.28%). If encoding is chose as= SPLIT, the compression ratio will not be too much better even using codec like ZLIB (44.36%). If SPLIT is chosen as= the underlying encoding, compression will have negative impacts on the thr= oughput with limited space efficiency.

In summary, Split= has 309MB/s in read and 82MB/s in write with 47.28% compression ratio.  

 

Data Se= t

Encoding

Compression

Read Throughput (MB/s)

Write Throughput (MB/s)

Compression Ratio

HEPMASS=

SPLIT

NONE

309.552599= 8

82.928411<= o:p>

47.28%

HEPMASS=

SPLIT

LZO

326.114649= 7

62.2114227= 9

47.02%

HEPMASS=

SPLIT

ZLIB

223.190932= 9

32.3191522= 2

44.36%

HEPMASS=

SPLIT

SNAPPY

340.425531= 9

82.7140564= 7

46.71%

HEPMASS=

SPLIT

ZSTD

295.271049= 6

76.8768783= 1

43.15%

HEPMASS=

SPLIT

BROTLI

174.149659= 9

77.0620122= 7

43.12%

 

·         FLIP

Since FLIP itself doesn&= #8217;t have compression functionality, we need to combine some compression= codec to archive the similar compression functionalities as Split. On HEPM= ASS data set, FLIP has 640MB/s in read and 247MB/s in write with 61.59% compr= ession ratio using LZO. And also it has 266MB/s in READ and 145MB/s in write with 52.39%. In s= ummary, FLIP is a good balance for space efficiency and throughput and user= can choose different compression codec to choose different goal (high comp= ression or high throughput).

 

Data Se= t

Encoding

Compression

Read Throughput (MB/s)

Write Throughput (MB/s)

Compression Ratio

HEPMASS=

FLIP

NONE

775.757575= 8

587.155974= 2

100.00%

HEPMASS=

FLIP

LZO

640

247.104251= 7

61.59%

HEPMASS=

FLIP

ZLIB

272.051009= 6

19.2105661= 7

52.95%

HEPMASS=

FLIP

SNAPPY

595.348837= 2

261.491322= 5

59.49%

HEPMASS=

FLIP

ZSTD

266.389177= 9

145.785879= 7

52.39%

HEPMASS=

FLIP

BROTLI

64.8429584= 6

122.370939= 2

53.11%

 

·         FPC V1

FPC V1 is similar to FPC= V2 with little difference in endian mode. We choose FPC V1 in our analysis= . FPC_V1 can also serve for the compression (75.76% compression ratio when = compression=3DNONE). Like Split, extended compression codec does not contribute too much higher compression ratio (6= 6.40%-75%) while bottleneck the throughput. In summary, FPC is not compress= ion friendly (66% - 75%) while throughput is close to FLIP and worse than i= t with compression codec applied.

 

Data Se= t

Encoding

Compression

Read Throughput (MB/s)

Write Throughput (MB/s)

Compression Ratio

HEPMASS=

FPC_V1

NONE

469.724770= 6

324.873102= 5

75.76%

HEPMASS=

FPC_V1

LZO

474.074074= 1

310.679617= 4

75.76%

HEPMASS=

FPC_V1

ZLIB

189.209164= 8

20.8401176= 1

66.19%

HEPMASS=

FPC_V1

SNAPPY

456.327985= 7

298.716458= 3

75.51%

HEPMASS=

FPC_V1

ZSTD

238.139534= 9

183.776026= 4

66.35%

HEPMASS=

FPC_V1

BROTLI

53.5005224= 7

165.695796=

66.40%

 

·         Plain V2

To archive a good balanc= e of compression and throughput, we need to consider ZSTD or ZLIB for the c= ompression codec. Then its throughput will be 200 ~ 600 MB/s in READ and 10= 0 ~ 1000MB/s in write. It’s also not good enough as FLIP encoding.

 

Data Se= t

Encoding

Compression

Read Throughput (MB/s)

Write Throughput (MB/s)

Compression Ratio

HEPMASS=

PLAIN_V2

NONE

349.249658= 9

1153.15317= 5

100.00%

HEPMASS=

PLAIN_V2

LZO

571.428571= 4

236.162366=

65.07%

HEPMASS=

PLAIN_V2

ZLIB

304.399524= 4

10.1454446= 5

54.15%

HEPMASS=

PLAIN_V2

SNAPPY

442.141623= 5

278.563661= 3

76.60%

HEPMASS=

PLAIN_V2

ZSTD

242.654028= 4

144.063030= 3

56.25%

HEPMASS=

PLAIN_V2

BROTLI

43.9938133= 7

96.6767389= 6

59.33%

 

Other data set, we can o= bserve a similar conclusion as that in HEPMAS data set. As the result, FLIP= is a good choice for compression and throughput balance. It’s compre= ssion friendly unlike Split and FPC. And it brings good throughput than plain V2 with similar compression ratio. So we= suggest to use FLIP as the default encoding for double type in ORC.

 

[1] https://github.com/apache/orc/pull/189

 

Appendix I - Details about our micro-benchmark:=

Date Set & Compression codec: See Appendix II

Data Scale: 1GB

Method: Time measured including compression and enco= ding for write and similar for read path.

 

Appendix II – Benchmark full result=

Data Set

Encoding

Compression

Read Throughput (MB/s)

Write Throughput (MB/s)

Compression Ratio

LIST_PRICE

PLAIN_V2

NONE

641.60401

927.5362492

100.00%

LIST_PRICE

PLAIN_V2

LZO

513.026052= 1

267.502617= 3

47.89%

LIST_PRICE

PLAIN_V2

ZLIB

318.407960= 2

12.6751500= 1

37.22%

LIST_PRICE

PLAIN_V2

SNAPPY

428.810720= 3

320.4005066

59.19%

LIST_PRICE

PLAIN_V2

ZSTD

241.965973= 5

175.102602= 4

9.82%

LIST_PRICE

PLAIN_V2

BROTLI

94.3952802= 4

195.569140= 4

10.68%

LIST_PRICE

FPC_V1

NONE

538.9473684

343.6241675

57.02%

LIST_PRICE

FPC_V1

LZO

437.606837= 6

292.571434=

53.83%

LIST_PRICE

FPC_V1

ZLIB

226.749335= 7

18.8332233= 3

41.17%

LIST_PRICE

FPC_V1

SNAPPY

527.8350515

296.639634= 7

53.70%

LIST_PRICE

FPC_V1

ZSTD

360.563380= 3

257.286437=

6.33%

LIST_PRICE

FPC_V1

BROTLI

199.687987= 5

247.342999= 8

6.51%

LIST_PRICE

FPC_V2

NONE

440.619621= 3

328.6264503

93.41%

LIST_PRICE

FPC_V2

LZO

330.322580= 6

158.122300= 7

64.60%

LIST_PRICE

FPC_V2

ZLIB

211.745244=

12.6133230= 6

46.61%

LIST_PRICE

FPC_V2

SNAPPY

293.577981= 7

171.008687= 2

72.93%

LIST_PRICE

FPC_V2

ZSTD

171.581769= 4

118.299447= 7

19.19%

LIST_PRICE

FPC_V2

BROTLI

59.631959<= o:p>

94.4301013= 5

24.07%

LIST_PRICE

FLIP

NONE

766.4670659

638.4040019

100.00%

LIST_PRICE

FLIP

LZO

727.2727273

186.181821= 6

93.93%

LIST_PRICE

FLIP

ZLIB

178.148921= 4

32.3968621= 6

73.66%

LIST_PRICE

FLIP

SNAPPY

695.6521739

241.509438= 5

91.63%

LIST_PRICE

FLIP

ZSTD

233.151184=

134.524437= 6

40.08%

LIST_PRICE

FLIP

BROTLI

52.1384928= 7

104.918034= 7

59.79%

LIST_PRICE

SPLIT

NONE

307.322929= 2

57.6576587= 3

95.39%

LIST_PRICE

SPLIT

LZO

238.583411=

53.3444477= 5

39.24%

LIST_PRICE

SPLIT

ZLIB

189.770200= 1

10.5732696= 5

30.87%

LIST_PRICE

SPLIT

SNAPPY

214.405360= 1

54.7593593= 1

48.44%

LIST_PRICE

SPLIT

ZSTD

169.424222= 4

50.8239040= 6

7.43%

LIST_PRICE

SPLIT

BROTLI

91.6249105= 2

47.9580376= 1

10.12%

DISCOUNT_AMT        PLAIN_V2  NONE    1080.168776  1286.432185  100.00%
DISCOUNT_AMT        PLAIN_V2  LZO     782.8746177  504.9309759  18.51%
DISCOUNT_AMT        PLAIN_V2  ZLIB    527.8350515  46.10120741  13.85%
DISCOUNT_AMT        PLAIN_V2  SNAPPY  711.1111111  588.5057581  20.41%
DISCOUNT_AMT        PLAIN_V2  ZSTD    488.5496183  199.8438757  3.50%
DISCOUNT_AMT        PLAIN_V2  BROTLI  183.6441894  342.7041563  4.29%
DISCOUNT_AMT        FPC_V1    NONE    625.9168704  335.9580115  21.21%
DISCOUNT_AMT        FPC_V1    LZO     671.9160105  331.1772377  9.76%
DISCOUNT_AMT        FPC_V1    ZLIB    487.6190476  84.46057564  8.94%
DISCOUNT_AMT        FPC_V1    SNAPPY  654.7314578  352.6170865  9.78%
DISCOUNT_AMT        FPC_V1    ZSTD    561.4035088  331.1772377  1.05%
DISCOUNT_AMT        FPC_V1    BROTLI  489.4837476  343.6241675  1.21%
DISCOUNT_AMT        FPC_V2    NONE    641.60401    306.5868321  34.35%
DISCOUNT_AMT        FPC_V2    LZO     501.9607843  220.4995734  23.46%
DISCOUNT_AMT        FPC_V2    ZLIB    354.0802213  33.64879137  17.24%
DISCOUNT_AMT        FPC_V2    SNAPPY  488.5496183  230.8385976  23.48%
DISCOUNT_AMT        FPC_V2    ZSTD    493.256262   238.8059746  2.57%
DISCOUNT_AMT        FPC_V2    BROTLI  289.5927602  229.5964168  3.00%
DISCOUNT_AMT        FLIP      NONE    775.7575758  713.0919353  100.00%
DISCOUNT_AMT        FLIP      LZO     439.862543   246.3907649  35.60%
DISCOUNT_AMT        FLIP      ZLIB    296.2962963  15.07478535  24.79%
DISCOUNT_AMT        FLIP      SNAPPY  415.5844156  282.2491784  41.03%
DISCOUNT_AMT        FLIP      ZSTD    201.7336485  184.4380438  9.05%
DISCOUNT_AMT        FLIP      BROTLI  98.23484267  153.3852635  14.78%
DISCOUNT_AMT        SPLIT     NONE    243.1149098  125.860376   42.25%
DISCOUNT_AMT        SPLIT     LZO     250.2443793  112.2807038  1.41%
DISCOUNT_AMT        SPLIT     ZLIB    208.2994304  29.54073445  11.14%
DISCOUNT_AMT        SPLIT     SNAPPY  237.2567192  116.1524523  12.13%
DISCOUNT_AMT        SPLIT     ZSTD    222.4152911  116.0471463  1.02%
DISCOUNT_AMT        SPLIT     BROTLI  172.2745626  107.3825523  1.10%

IOT_METER           PLAIN_V2  NONE    833.8762215  1024.000019  100.00%
IOT_METER           PLAIN_V2  LZO     653.0612245  429.5302093  22.77%
IOT_METER           PLAIN_V2  ZLIB    605.2009456  38.18046305  10.73%
IOT_METER           PLAIN_V2  SNAPPY  675.4617414  506.9307025  23.20%
IOT_METER           PLAIN_V2  ZSTD    358.5434174  235.9447049  4.12%
IOT_METER           PLAIN_V2  BROTLI  161.6161616  227.3534678  6.08%
IOT_METER           FPC_V1    NONE    563.876652   379.8219655  32.06%
IOT_METER           FPC_V1    LZO     507.9365079  290.9090963  13.63%
IOT_METER           FPC_V1    ZLIB    360.5633803  27.16756922  13.35%
IOT_METER           FPC_V1    SNAPPY  453.0973451  273.5042786  16.99%
IOT_METER           FPC_V1    ZSTD    388.4673748  263.1038076  1.68%
IOT_METER           FPC_V1    BROTLI  355.0624133  279.781426   1.78%
IOT_METER           FPC_V2    NONE    514.0562249  314.8831547  60.06%
IOT_METER           FPC_V2    LZO     420.3612479  203.6595106  24.12%
IOT_METER           FPC_V2    ZLIB    339.0728477  28.20315135  16.98%
IOT_METER           FPC_V2    SNAPPY  425.9567388  225.7495633  25.88%
IOT_METER           FPC_V2    ZSTD    280.7017544  176.7955834  3.92%
IOT_METER           FPC_V2    BROTLI  225.7495591  190.7600632  4.24%
IOT_METER           FLIP      NONE    636.8159204  580.498877   100.00%
IOT_METER           FLIP      LZO     325.2858958  223.7762279  66.07%
IOT_METER           FLIP      ZLIB    251.9685039  30.47619104  42.93%
IOT_METER           FLIP      SNAPPY  480.3001876  247.3429998  67.40%
IOT_METER           FLIP      ZSTD    214.2259414  148.1481509  21.73%
IOT_METER           FLIP      BROTLI  72.4801812   126.9841293  25.84%
IOT_METER           SPLIT     NONE    256.5130261  68.26666794  84.95%
IOT_METER           SPLIT     LZO     227.3534636  63.6499266   18.64%
IOT_METER           SPLIT     ZLIB    213.3333333  26.7223387   9.66%
IOT_METER           SPLIT     SNAPPY  230.2158273  62.21142279  20.08%
IOT_METER           SPLIT     ZSTD    184.1726619  58.79651005  2.91%
IOT_METER           SPLIT     BROTLI  125.061065   61.03958149  3.41%

NYC_TAXI_DROP_LAT   PLAIN_V2  NONE    980.8429119  1122.807038  100.00%
NYC_TAXI_DROP_LAT   PLAIN_V2  LZO     573.9910314  262.8336805  46.76%
NYC_TAXI_DROP_LAT   PLAIN_V2  ZLIB    324.0506329  14.90885824  34.20%
NYC_TAXI_DROP_LAT   PLAIN_V2  SNAPPY  448.3362522  310.3030361  56.42%
NYC_TAXI_DROP_LAT   PLAIN_V2  ZSTD    236.1623616  141.0468346  34.99%
NYC_TAXI_DROP_LAT   PLAIN_V2  BROTLI  51.13863364  109.7770175  39.10%
NYC_TAXI_DROP_LAT   FPC_V1    NONE    445.2173913  169.536427   75.73%
NYC_TAXI_DROP_LAT   FPC_V1    LZO     418.3006536  250.0000047  75.73%
NYC_TAXI_DROP_LAT   FPC_V1    ZLIB    189.0694239  13.17075705  51.13%
NYC_TAXI_DROP_LAT   FPC_V1    SNAPPY  300.1172333  167.6489881  70.49%
NYC_TAXI_DROP_LAT   FPC_V1    ZSTD    253.968254   188.2352976  53.59%
NYC_TAXI_DROP_LAT   FPC_V1    BROTLI  58.64833906  167.7588498  54.37%
NYC_TAXI_DROP_LAT   FPC_V2    NONE    556.5217391  354.0802279  79.94%
NYC_TAXI_DROP_LAT   FPC_V2    LZO     338.6243386  164.6302281  52.82%
NYC_TAXI_DROP_LAT   FPC_V2    ZLIB    223.580786   8.611120615  38.58%
NYC_TAXI_DROP_LAT   FPC_V2    SNAPPY  306.9544365  187.683288   61.75%
NYC_TAXI_DROP_LAT   FPC_V2    ZSTD    243.3460076  122.6053663  38.80%
NYC_TAXI_DROP_LAT   FPC_V2    BROTLI  54.66581251  97.37542973  41.73%
NYC_TAXI_DROP_LAT   FLIP      NONE    917.562724   677.2486899  100.00%
NYC_TAXI_DROP_LAT   FLIP      LZO     697.5476839  305.854247   45.04%
NYC_TAXI_DROP_LAT   FLIP      ZLIB    304.0380048  23.90512697  33.75%
NYC_TAXI_DROP_LAT   FLIP      SNAPPY  608.0760095  313.725496   43.76%
NYC_TAXI_DROP_LAT   FLIP      ZSTD    294.2528736  177.4081807  33.23%
NYC_TAXI_DROP_LAT   FLIP      BROTLI  89.01251739  153.7537566  34.35%
NYC_TAXI_DROP_LAT   SPLIT     NONE    391.4373089  131.2147641  38.89%
NYC_TAXI_DROP_LAT   SPLIT     LZO     375.3665689  133.2639275  37.96%
NYC_TAXI_DROP_LAT   SPLIT     ZLIB    235.0780533  32.05208523  29.98%
NYC_TAXI_DROP_LAT   SPLIT     SNAPPY  373.7226277  122.1374069  37.96%
NYC_TAXI_DROP_LAT   SPLIT     ZSTD    280.0875274  115.784715   30.04%
NYC_TAXI_DROP_LAT   SPLIT     BROTLI  99.26328034  112.6760584  30.06%

NYC_TAXI_DROP_LONG  PLAIN_V2  NONE    1024         1319.587653  100.00%
NYC_TAXI_DROP_LONG  PLAIN_V2  LZO     536.687631   264.1898914  44.78%
NYC_TAXI_DROP_LONG  PLAIN_V2  ZLIB    339.9734396  17.12718303  31.48%
NYC_TAXI_DROP_LONG  PLAIN_V2  SNAPPY  457.960644   315.6596853  52.53%
NYC_TAXI_DROP_LONG  PLAIN_V2  ZSTD    248.7852284  146.2021729  32.49%
NYC_TAXI_DROP_LONG  PLAIN_V2  BROTLI  58.62147928  115.9945649  35.83%
NYC_TAXI_DROP_LONG  FPC_V1    NONE    428.0936455  270.0421991  75.73%
NYC_TAXI_DROP_LONG  FPC_V1    LZO     383.808096   209.6642136  75.39%
NYC_TAXI_DROP_LONG  FPC_V1    ZLIB    191.0447761  13.72948647  49.16%
NYC_TAXI_DROP_LONG  FPC_V1    SNAPPY  276.7567568  162.6429509  69.89%
NYC_TAXI_DROP_LONG  FPC_V1    ZSTD    213.6894825  177.65441    51.44%
NYC_TAXI_DROP_LONG  FPC_V1    BROTLI  57.78781038  151.2108711  52.48%
NYC_TAXI_DROP_LONG  FPC_V2    NONE    514.0562249  115.4192988  77.99%
NYC_TAXI_DROP_LONG  FPC_V2    LZO     329.8969072  158.9075138  50.99%
NYC_TAXI_DROP_LONG  FPC_V2    ZLIB    211.9205298  8.547008706  37.01%
NYC_TAXI_DROP_LONG  FPC_V2    SNAPPY  296.2962963  180.1548239  59.17%
NYC_TAXI_DROP_LONG  FPC_V2    ZSTD    242.4242424  124.5742116  37.32%
NYC_TAXI_DROP_LONG  FPC_V2    BROTLI  57.72266065  97.85932904  40.10%
NYC_TAXI_DROP_LONG  FLIP      NONE    930.9090909  668.4073232  100.00%
NYC_TAXI_DROP_LONG  FLIP      LZO     673.6842105  303.3175412  42.93%
NYC_TAXI_DROP_LONG  FLIP      ZLIB    330.749354   23.51428353  31.84%
NYC_TAXI_DROP_LONG  FLIP      SNAPPY  576.5765766  308.0625809  42.19%
NYC_TAXI_DROP_LONG  FLIP      ZSTD    306.9544365  165.3746801  32.09%
NYC_TAXI_DROP_LONG  FLIP      BROTLI  86.98606864  158.4158445  32.77%
NYC_TAXI_DROP_LONG  SPLIT     NONE    411.5755627  128.9672568  39.43%
NYC_TAXI_DROP_LONG  SPLIT     LZO     390.2439024  127.5535649  38.00%
NYC_TAXI_DROP_LONG  SPLIT     ZLIB    245.681382   25.81165606  28.33%
NYC_TAXI_DROP_LONG  SPLIT     SNAPPY  362.6062323  112.8250352  37.70%
NYC_TAXI_DROP_LONG  SPLIT     ZSTD    298.0209546  112.2314795  28.31%
NYC_TAXI_DROP_LONG  SPLIT     BROTLI  105.2631579  110.3448296  28.36%

HIGGS               PLAIN_V2  NONE    1015.873016  1312.820537  100.00%
HIGGS               PLAIN_V2  LZO     517.1717172  211.9205337  57.54%
HIGGS               PLAIN_V2  ZLIB    285.077951   15.71709263  45.34%
HIGGS               PLAIN_V2  SNAPPY  456.3279857  295.2710551  62.93%
HIGGS               PLAIN_V2  ZSTD    245.9173871  134.5951655  32.67%
HIGGS               PLAIN_V2  BROTLI  70.581748    117.9180123  36.06%
HIGGS               FPC_V1    NONE    442.9065744  320.802011   75.74%
HIGGS               FPC_V1    LZO     396.8992248  308.8057959  75.75%
HIGGS               FPC_V1    ZLIB    184.3052556  20.93898289  66.06%
HIGGS               FPC_V1    SNAPPY  416.9381107  292.9061839  75.50%
HIGGS               FPC_V1    ZSTD    219.1780822  178.2729838  66.19%
HIGGS               FPC_V1    BROTLI  53.10101639  159.4022446  66.24%
HIGGS               FPC_V2    NONE    400          299.7658135  97.31%
HIGGS               FPC_V2    LZO     277.9587405  140.3508798  74.91%
HIGGS               FPC_V2    ZLIB    99.30178433  8.00050018   60.80%
HIGGS               FPC_V2    SNAPPY  252.7147088  147.3805439  82.49%
HIGGS               FPC_V2    ZSTD    207.6236821  112.1822983  59.45%
HIGGS               FPC_V2    BROTLI  42.82368685  82.63395893  62.70%
HIGGS               FLIP      NONE    917.562724   661.4987203  100.00%
HIGGS               FLIP      LZO     705.2341598  255.2343019  60.98%
HIGGS               FLIP      ZLIB    274.9731472  19.60333906  51.98%
HIGGS               FLIP      SNAPPY  630.5418719  276.1596599  58.84%
HIGGS               FLIP      ZSTD    216.0337553  150.4997089  51.35%
HIGGS               FLIP      BROTLI  68.04891015  124.3322024  52.23%
HIGGS               SPLIT     NONE    334.2036554  82.66064087  47.40%
HIGGS               SPLIT     LZO     325.6997455  79.28151278  46.83%
HIGGS               SPLIT     ZLIB    202.5316456  28.92655421  41.78%
HIGGS               SPLIT     SNAPPY  309.9273608  78.76923224  46.49%
HIGGS               SPLIT     ZSTD    300.8225617  74.54863272  42.79%
HIGGS               SPLIT     BROTLI  100.6685018  72.45966736  42.61%

HEPMASS             PLAIN_V2  NONE    349.2496589  1153.153175  100.00%
HEPMASS             PLAIN_V2  LZO     571.4285714  236.162366   65.07%
HEPMASS             PLAIN_V2  ZLIB    304.3995244  10.14544465  54.15%
HEPMASS             PLAIN_V2  SNAPPY  442.1416235  278.5636613  76.60%
HEPMASS             PLAIN_V2  ZSTD    242.6540284  144.0630303  56.25%
HEPMASS             PLAIN_V2  BROTLI  43.99381337  96.67673896  59.33%
HEPMASS             FPC_V1    NONE    469.7247706  324.8731025  75.76%
HEPMASS             FPC_V1    LZO     474.0740741  310.6796174  75.76%
HEPMASS             FPC_V1    ZLIB    189.2091648  20.84011761  66.19%
HEPMASS             FPC_V1    SNAPPY  456.3279857  298.7164583  75.51%
HEPMASS             FPC_V1    ZSTD    238.1395349  183.7760264  66.35%
HEPMASS             FPC_V1    BROTLI  53.50052247  165.695796   66.40%
HEPMASS             FPC_V2    NONE    439.862543   294.252879   97.84%
HEPMASS             FPC_V2    LZO     277.9587405  138.5281411  76.20%
HEPMASS             FPC_V2    ZLIB    171.6968478  8.017538515  61.68%
HEPMASS             FPC_V2    SNAPPY  253.4653465  144.1441468  83.12%
HEPMASS             FPC_V2    ZSTD    197.5308642  116.5225329  60.38%
HEPMASS             FPC_V2    BROTLI  40.77731762  78.19181575  63.65%
HEPMASS             FLIP      NONE    775.7575758  587.1559742  100.00%
HEPMASS             FLIP      LZO     640          247.1042517  61.59%
HEPMASS             FLIP      ZLIB    272.0510096  19.21056617  52.95%
HEPMASS             FLIP      SNAPPY  595.3488372  261.4913225  59.49%
HEPMASS             FLIP      ZSTD    266.3891779  145.7858797  52.39%
HEPMASS             FLIP      BROTLI  64.84295846  122.3709392  53.11%
HEPMASS             SPLIT     NONE    309.5525998  82.928411    47.28%
HEPMASS             SPLIT     LZO     326.1146497  62.21142279  47.02%
HEPMASS             SPLIT     ZLIB    223.1909329  32.31915222  44.36%
HEPMASS             SPLIT     SNAPPY  340.4255319  82.71405647  46.71%
HEPMASS             SPLIT     ZSTD    295.2710496  76.87687831  43.15%
HEPMASS             SPLIT     BROTLI  174.1496599  77.06201227  43.12%

PHONE               PLAIN_V2  NONE    1333.333333  1471.264395  100.00%
PHONE               PLAIN_V2  LZO     536.687631   280.7017596  49.24%
PHONE               PLAIN_V2  ZLIB    369.4083694  44.32900515  30.68%
PHONE               PLAIN_V2  SNAPPY  613.9088729  366.2374889  48.37%
PHONE               PLAIN_V2  ZSTD    283.8137472  166.7752474  26.98%
PHONE               PLAIN_V2  BROTLI  77.1549126   156.76669    31.42%
PHONE               FPC_V1    NONE    449.9121265  314.8831547  102.56%
PHONE               FPC_V1    LZO     361.0719323  223.3856935  92.48%
PHONE               FPC_V1    ZLIB    157.8298397  18.87627229  77.92%
PHONE               FPC_V1    SNAPPY  378.1388479  222.6086998  91.26%
PHONE               FPC_V1    ZSTD    199.5323461  127.2998532  74.16%
PHONE               FPC_V1    BROTLI  43.82811162  105.610563   76.80%
PHONE               FPC_V2    NONE    440.6196213  318.0124283  84.75%
PHONE               FPC_V2    LZO     426.6666667  176.6735713  78.76%
PHONE               FPC_V2    ZLIB    193.9393939  33.35939598  70.06%
PHONE               FPC_V2    SNAPPY  414.2394822  237.91822    77.82%
PHONE               FPC_V2    ZSTD    231.6742081  150.3229623  68.29%
PHONE               FPC_V2    BROTLI  54.65414176  135.3065564  71.26%
PHONE               FLIP      NONE    920.8633094  659.7938267  100.00%
PHONE               FLIP      LZO     690.0269542  213.5112634  90.96%
PHONE               FLIP      ZLIB    211.0469909  28.5491251   81.50%
PHONE               FLIP      SNAPPY  618.3574879  239.9250279  89.70%
PHONE               FLIP      ZSTD    268.0628272  130.879348   82.24%
PHONE               FLIP      BROTLI  50.06845296  110.2497867  84.18%
PHONE               SPLIT     NONE    384.3843844  72.87219037  89.04%
PHONE               SPLIT     LZO     272.0510096  61.17084941  45.21%
PHONE               SPLIT     ZLIB    213.6894825  26.77264221  29.37%
PHONE               SPLIT     SNAPPY  276.4578834  64.32160924  45.31%
PHONE               SPLIT     ZSTD    186.8613139  52.54515697  30.37%
PHONE               SPLIT     BROTLI  65.64102564  50.39370173  30.75%

 

 

Best Regards

Ferdinand Xu

 
