carbondata-issues mailing list archives

From "Jihong MA (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-464) Frequent GC incurs when Carbon's blocklet size is enlarged from the default
Date Fri, 16 Dec 2016 02:31:58 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-464:
---------------------------------
    Description: 
Other columnar file formats fetch 1 million rows (a row group) at a time. The data is divided
into column chunks, and each column chunk consists of many pages. Each page (default size 1 MB)
can be decompressed and processed independently.
In current Carbon, a larger blocklet requires more processing memory, because all projected
column chunks within a blocklet are decompressed at once, which consumes a large amount of
memory in total. We should consider an alternative approach that balances I/O and processing
in order to reduce GC pressure.

  was:
Parquet might fetch 1 million rows (a row group) from I/O at one time. The data is divided
into column chunks, and each column chunk consists of many pages. Each page (default size 1 MB)
can be decompressed and processed independently.
In current Carbon, a larger blocklet requires more processing memory as well, because all
projected column chunks within a blocklet are decompressed, which consumes a large amount of
memory. We should consider a similar approach to balance I/O and processing, but such a change
requires Carbon format-level changes.

        Summary: Frequent GC incurs when Carbon's blocklet size is enlarged from the default
 (was: Big GC occurs frequently when Carbon's blocklet size is enlarged from the default)

> Frequent GC incurs when Carbon's blocklet size is enlarged from the default
> ---------------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Other columnar file formats fetch 1 million rows (a row group) at a time. The data is divided
into column chunks, and each column chunk consists of many pages. Each page (default size 1 MB)
can be decompressed and processed independently.
> In current Carbon, a larger blocklet requires more processing memory, because all projected
column chunks within a blocklet are decompressed at once, which consumes a large amount of
memory in total. We should consider an alternative approach that balances I/O and processing
in order to reduce GC pressure.
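
The memory difference described above can be illustrated with a small sketch. This is not
CarbonData or Parquet code: the chunk layout, the function names, and the use of zlib are all
assumptions made for illustration. It only counts how many decompressed bytes are live at the
same time under each strategy. It does not measure real JVM heap usage or GC activity.

```python
import zlib

PAGE_SIZE = 1 << 20  # 1 MB, the default page size quoted in the description


def make_column_chunk(num_pages):
    """Build a hypothetical column chunk: a list of independently compressed pages."""
    return [zlib.compress(bytes([i % 251]) * PAGE_SIZE) for i in range(num_pages)]


def peak_whole_blocklet(chunks):
    """Decompress every page of every projected chunk before processing any of them."""
    buffers = [zlib.decompress(page) for chunk in chunks for page in chunk]
    peak = sum(len(b) for b in buffers)    # all decompressed buffers are live at once
    checksum = sum(b[0] for b in buffers)  # stand-in for real query processing
    return peak, checksum


def peak_page_at_a_time(chunks):
    """Decompress, process, and release one page at a time."""
    peak = 0
    checksum = 0
    for chunk in chunks:
        for page in chunk:
            buf = zlib.decompress(page)  # only this page is live
            peak = max(peak, len(buf))
            checksum += buf[0]           # same stand-in processing as above
    return peak, checksum


# Three projected columns, four pages each (arbitrary illustrative sizes).
chunks = [make_column_chunk(4) for _ in range(3)]
whole_peak, whole_sum = peak_whole_blocklet(chunks)
paged_peak, paged_sum = peak_page_at_a_time(chunks)
print("whole blocklet peak bytes:", whole_peak)
print("page at a time peak bytes:", paged_peak)
```

Both strategies compute the same result, but the whole-blocklet version holds all twelve
decompressed pages at once, while the page-at-a-time version holds one. In the first strategy
the live size grows with the blocklet size, which matches the pressure the issue describes.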



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
