From: GitBox
To: commits@hudi.apache.org
Subject: [GitHub] [incubator-hudi] lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet
Date: Fri, 28 Feb 2020 14:12:44 -0000

lamber-ken edited a comment on issue #1253: [HUDI-558] Introduce ability to compress bloom filters while storing in parquet
URL: https://github.com/apache/incubator-hudi/pull/1253#issuecomment-592397157

Hi @bvaradar, the idea of compressing the strings is great. One thought: with this approach the call sequence will be `byte[]` -> `base64 String` -> `gzip stream` -> `base64 String`:

![image](https://user-images.githubusercontent.com/20113411/75521469-cc4a8a80-5a42-11ea-8d92-59b9f845d2d6.png)

IMO, we can gzip-compress the `byte[]` data directly, like:

![image](https://user-images.githubusercontent.com/20113411/75521938-bb4e4900-5a43-11ea-9399-8a8eeae72692.png)
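Since the screenshots above may not render in a mail archive, here is a minimal sketch of the two call sequences being compared, using only the JDK (`java.util.zip` and `java.util.Base64`). The class name `BloomFilterCompressionSketch` and its method names are hypothetical and are not taken from the PR or from Hudi; the final base64 step is kept on the assumption that the value still has to be stored as a string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; not part of Hudi.
final class BloomFilterCompressionSketch {

    // Gzip a byte array entirely in memory.
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed here, so the trailer has been written.
        return bos.toByteArray();
    }

    // Inverse of gzip(): inflate a gzip-compressed byte array.
    public static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Sequence described first: byte[] -> base64 String -> gzip -> base64 String.
    public static String encodeViaString(byte[] filterBytes) {
        String b64 = Base64.getEncoder().encodeToString(filterBytes);
        byte[] zipped = gzip(b64.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(zipped);
    }

    // Suggested sequence: byte[] -> gzip -> base64 String.
    public static String encodeDirect(byte[] filterBytes) {
        return Base64.getEncoder().encodeToString(gzip(filterBytes));
    }

    // Reads back a value produced by encodeDirect().
    public static byte[] decodeDirect(String stored) {
        return gunzip(Base64.getDecoder().decode(stored));
    }
}
```

Calling `encodeDirect(bytes)` and then `decodeDirect(...)` on the result should return the original bytes. The direct variant skips one base64 encode on write and one decode on read; whether it also produces a smaller stored string depends on the filter contents and would need to be measured.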