Mailing-List: contact issues-help@carbondata.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@carbondata.incubator.apache.org
Date: Thu, 15 Dec 2016 19:23:58 +0000 (UTC)
From: "Jihong MA (JIRA)" <jira@apache.org>
To: issues@carbondata.incubator.apache.org
Message-ID: <JIRA.13023992.1480403663000.532171.1481829838566@Atlassian.JIRA>
In-Reply-To: <JIRA.13023992.1480403663000@Atlassian.JIRA>
References: <JIRA.13023992.1480403663000@Atlassian.JIRA> <JIRA.13023992.1480403663751@arcas>
Subject: [jira] [Updated] (CARBONDATA-464) Big GC occurs frequently when
 Carbon's blocklet size is enlarged from the default
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Thu, 15 Dec 2016 19:24:30 -0000


     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-464:
---------------------------------
    Description: 
Parquet might fetch 1 million rows (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be independently decompressed and processed.
In current Carbon, since we use a larger blocklet, more processing memory is required as well, because it decompresses all projected column chunks within a blocklet, which consumes a large amount of memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

  was:
Parquet might fetch 1 million rows (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be independently decompressed and processed.
In current Carbon, if we use a larger blocklet, it also requires more processing memory, as it decompresses all required columns of the complete blocklet and keeps them in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

        Summary: Big GC occurs frequently when Carbon's blocklet size is enlarged from the default  (was: Too many times GC occurs in query if we increase the blocklet size)

> Big GC occurs frequently when Carbon's blocklet size is enlarged from the default
> ---------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Parquet might fetch 1 million rows (a row group) from I/O at one time. Its data is divided into column chunks in columnar format, and each column chunk consists of many pages; each page (default size 1 MB) can be independently decompressed and processed.
> In current Carbon, since we use a larger blocklet, more processing memory is required as well, because it decompresses all projected column chunks within a blocklet, which consumes a large amount of memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.
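
To illustrate the memory trade-off described above, here is a minimal sketch. It is not Carbon or Parquet code: zlib stands in for the real compression codec, the page size is a small hypothetical row count rather than 1 MB, and all function names are made up for this example. It compares the peak amount of decompressed data held at once when a whole column chunk is decompressed up front versus one page at a time.

```python
import zlib

PAGE_ROWS = 1000  # hypothetical page size in rows, kept small for the demo


def make_column_chunk(values, page_rows=PAGE_ROWS):
    """Split one column's values into pages and compress each page on its own."""
    pages = []
    for start in range(0, len(values), page_rows):
        page_values = values[start:start + page_rows]
        raw = b"".join(v.to_bytes(8, "little") for v in page_values)
        pages.append(zlib.compress(raw))
    return pages


def decode(raw):
    """Turn decompressed bytes back into a list of 8-byte unsigned integers."""
    return [int.from_bytes(raw[i:i + 8], "little") for i in range(0, len(raw), 8)]


def sum_whole_chunk(pages):
    """Decompress every page first, then process (whole-chunk style).

    Returns (sum of values, peak decompressed bytes held at once).
    """
    buffers = [zlib.decompress(p) for p in pages]
    peak = sum(len(b) for b in buffers)
    total = sum(sum(decode(b)) for b in buffers)
    return total, peak


def sum_page_by_page(pages):
    """Decompress, process, and drop one page at a time (page style)."""
    total, peak = 0, 0
    for p in pages:
        raw = zlib.decompress(p)
        peak = max(peak, len(raw))
        total += sum(decode(raw))
    return total, peak


if __name__ == "__main__":
    chunk = make_column_chunk(list(range(10_000)))
    print(sum_whole_chunk(chunk))   # (49995000, 80000)
    print(sum_page_by_page(chunk))  # (49995000, 8000)
```

Both functions compute the same result, but in this toy setup the page-by-page variant holds only one page of decompressed data at a time. With larger blocklets and several projected columns, the gap in the whole-chunk variant grows accordingly, which is consistent with the GC pressure reported in this issue.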

