Date: Thu, 15 Dec 2016 19:23:58 +0000 (UTC)
From: "Jihong MA (JIRA)"
To: issues@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Subject: [jira] [Updated] (CARBONDATA-464) Big GC occurs frequently when Carbon's blocklet size is enlarged from the default

     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-464:
---------------------------------
    Description: 
Parquet might fetch 1 million (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be decompressed and processed independently.
In the current Carbon, since we use a larger blocklet, it requires more processing memory as well, because it decompresses all projected column chunks within a blocklet, which consumes a large amount of memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

  was:
Parquet might fetch 1 million (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be decompressed and processed independently.
In the current Carbon, if we use a larger blocklet, it also requires more processing memory, because it decompresses the required columns of the complete blocklet and keeps them in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

        Summary: Big GC occurs frequently when Carbon's blocklet size is enlarged from the default  (was: Too many times GC occurs in query if we increase the blocklet size)

> Big GC occurs frequently when Carbon's blocklet size is enlarged from the default
> ---------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Parquet might fetch 1 million (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be decompressed and processed independently.
> In the current Carbon, since we use a larger blocklet, it requires more processing memory as well, because it decompresses all projected column chunks within a blocklet, which consumes a large amount of memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
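
To illustrate the memory trade-off described in the issue, here is a minimal Python sketch. It is not CarbonData or Parquet code. It assumes each column chunk is a list of independently zlib-compressed pages, and it compares how many decompressed bytes are held at once when every projected page of a blocklet is decompressed up front versus when only one page per projected column is decompressed at a time. The function names, the zlib codec, and the synthetic page contents are all hypothetical, chosen only for illustration.

```python
import zlib

PAGE_SIZE = 1024 * 1024  # 1 MB, the default page size mentioned in the issue


def make_column_chunk(num_pages, page_size=PAGE_SIZE):
    """Build a hypothetical column chunk: a list of independently compressed pages.

    page_size is assumed to be a multiple of 256 so the synthetic page
    decompresses to exactly page_size bytes.
    """
    page = bytes(range(256)) * (page_size // 256)
    return [zlib.compress(page) for _ in range(num_pages)]


def peak_bytes_whole_blocklet(chunks):
    """Decompress every page of every projected chunk before processing anything.

    All decompressed pages are alive at the same time, so the peak is their total size.
    """
    decompressed = [zlib.decompress(p) for chunk in chunks for p in chunk]
    return sum(len(page) for page in decompressed)


def peak_bytes_page_at_a_time(chunks):
    """Decompress one page from each projected column, process it, then move on.

    Only one page per column is alive at any moment.
    """
    peak = 0
    for pages in zip(*chunks):  # the i-th page of every projected column
        live = [zlib.decompress(p) for p in pages]
        peak = max(peak, sum(len(page) for page in live))
        # Row-level processing of `live` would happen here (omitted).
    return peak


if __name__ == "__main__":
    # Three projected columns with four small pages each (assumed sizes).
    blocklet = [make_column_chunk(4, page_size=4096) for _ in range(3)]
    print("whole blocklet:", peak_bytes_whole_blocklet(blocklet))
    print("page at a time:", peak_bytes_page_at_a_time(blocklet))
```

In this sketch the whole-blocklet peak grows with the number of pages per chunk, while the page-at-a-time peak depends only on the number of projected columns and the page size, which is the balance between I/O and processing that the issue suggests exploring.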