Mailing-List: contact issues-help@carbondata.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@carbondata.incubator.apache.org
Date: Tue, 29 Nov 2016 07:23:58 +0000 (UTC)
From: "suo tong (JIRA)" <jira@apache.org>
To: issues@carbondata.incubator.apache.org
Message-ID: <JIRA.13023992.1480403663000.386596.1480404238394@Atlassian.JIRA>
In-Reply-To: <JIRA.13023992.1480403663000@Atlassian.JIRA>
References: <JIRA.13023992.1480403663000@Atlassian.JIRA> <JIRA.13023992.1480403663751@arcas>
Subject: [jira] [Updated] (CARBONDATA-464) Too many times GC occurs in query
 if we increase the blocklet size
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 29 Nov 2016 07:24:03 -0000


     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-464:
--------------------------------
    Description: 
Parquet might fetch 1 million from I/O at one time, but its data is divided into column chunks, which can be uncompressed and processed independently.
With the current Carbon format, a larger blocklet also requires more processing memory, because the required columns of the complete blocklet are decompressed and kept in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.

  was:
Parquet might fetch 1 million from I/O at one time, but its data is divided into column chunks, which can be uncompressed and processed independently.
With the current Carbon format, a larger blocklet also requires more processing memory, because the required columns of the complete blocklet are decompressed and kept in memory. Maybe we should consider a similar approach to balance I/O and processing.


> Too many times GC occurs in query if we increase the blocklet size
> ------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Parquet might fetch 1 million from I/O at one time, but its data is divided into column chunks, which can be uncompressed and processed independently.
> With the current Carbon format, a larger blocklet also requires more processing memory, because the required columns of the complete blocklet are decompressed and kept in memory. Maybe we should consider a similar approach to balance I/O and processing, but such a change requires Carbon format-level changes.
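
The memory trade-off described above can be illustrated with a small, self-contained sketch. It is not CarbonData or Parquet code: the function names (compress_pages, scan_sum), the 8-byte integer column, the page sizes, and the use of zlib as the codec are all assumptions made for illustration. It compares the largest decompressed buffer held at once when a column is stored as one compressed unit versus as independently compressed pages.

```python
import struct
import zlib


def compress_pages(values, page_rows):
    """Split a column into pages of `page_rows` values and compress each page
    independently (hypothetical layout, zlib used as a stand-in codec)."""
    pages = []
    for start in range(0, len(values), page_rows):
        chunk = values[start:start + page_rows]
        raw = struct.pack(f"<{len(chunk)}q", *chunk)  # 8 bytes per value
        pages.append(zlib.compress(raw))
    return pages


def scan_sum(pages):
    """Sum the column while decompressing one page at a time.
    Returns (total, peak_bytes), where peak_bytes is the largest
    decompressed buffer that had to be held at once."""
    total = 0
    peak_bytes = 0
    for page in pages:
        raw = zlib.decompress(page)
        peak_bytes = max(peak_bytes, len(raw))
        count = len(raw) // 8
        total += sum(struct.unpack(f"<{count}q", raw))
    return total, peak_bytes


values = list(range(100_000))

# One compressed unit for the whole column: everything is decompressed at once.
whole_total, whole_peak = scan_sum(compress_pages(values, page_rows=len(values)))

# Ten independently compressed pages: only one page is decompressed at a time.
paged_total, paged_peak = scan_sum(compress_pages(values, page_rows=10_000))

print(whole_total == paged_total, whole_peak, paged_peak)
```

In this sketch both scans read the same compressed data and produce the same result, but the paged layout holds a tenth of the decompressed bytes at any moment, which is the kind of reduction in short-lived allocations that would ease GC pressure.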


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)