Date: Tue, 29 Nov 2016 07:23:58 +0000 (UTC)
From: "suo tong (JIRA)"
To: issues@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Subject: [jira] [Updated] (CARBONDATA-464) Too many times GC occurs in query if we increase the blocklet size

     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-464:
--------------------------------
    Description:
Parquet might fetch 1 million at one time from I/O, but its data is divided into column chunks, which can be independently uncompressed and processed.
In the current Carbon format, a larger blocklet also requires more processing memory, because Carbon decompresses the required columns of the complete blocklet and keeps them in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires changes at the Carbon format level.

  was:
Parquet might fetch 1 million at one time from I/O, but its data is divided into column chunks, which can be independently uncompressed and processed.
In the current Carbon format, a larger blocklet also requires more processing memory, because Carbon decompresses the required columns of the complete blocklet and keeps them in memory.
Maybe we should consider a similar approach to balance I/O and processing.

> Too many times GC occurs in query if we increase the blocklet size
> ------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Parquet might fetch 1 million at one time from I/O, but its data is divided into column chunks, which can be independently uncompressed and processed.
> In the current Carbon format, a larger blocklet also requires more processing memory, because Carbon decompresses the required columns of the complete blocklet and keeps them in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires changes at the Carbon format level.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
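
The memory trade-off described in the issue can be illustrated with a minimal sketch. It is not CarbonData or Parquet code: the toy "blocklet" layout, the function names, the 8-bytes-per-value assumption, and the use of zlib are all hypothetical and chosen only to compare how much decompressed data is held at once under the two strategies.

```python
import zlib


def make_blocklet(num_columns, rows_per_column, chunk_rows):
    """Build a toy blocklet: each column is a list of independently compressed chunks.

    Hypothetical layout for illustration; assumes 8 bytes per value.
    """
    blocklet = []
    for col in range(num_columns):
        chunks = []
        for start in range(0, rows_per_column, chunk_rows):
            n = min(chunk_rows, rows_per_column - start)
            raw = bytes([col % 256]) * (n * 8)
            chunks.append(zlib.compress(raw))
        blocklet.append(chunks)
    return blocklet


def peak_whole_blocklet(blocklet):
    """Decompress all required columns up front and hold them together."""
    held = [zlib.decompress(chunk) for column in blocklet for chunk in column]
    return sum(len(buf) for buf in held)


def peak_chunk_at_a_time(blocklet):
    """Decompress, process, and release one chunk before touching the next."""
    peak = 0
    for column in blocklet:
        for chunk in column:
            buf = zlib.decompress(chunk)
            peak = max(peak, len(buf))  # only one decompressed chunk is live
    return peak


blocklet = make_blocklet(num_columns=4, rows_per_column=100_000, chunk_rows=10_000)
whole = peak_whole_blocklet(blocklet)
chunked = peak_chunk_at_a_time(blocklet)
print(f"whole blocklet: {whole} bytes held, chunk at a time: {chunked} bytes held")
```

In this sketch the whole-blocklet strategy holds memory proportional to the blocklet size, while the chunked strategy holds memory proportional to the chunk size, which is why growing the blocklet increases GC pressure only in the first case.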