carbondata-issues mailing list archives

From "suo tong (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-464) Too many times GC occurs in query if we increase the blocklet size
Date Tue, 29 Nov 2016 07:31:58 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

suo tong updated CARBONDATA-464:
--------------------------------
    Description: 
Parquet might fetch 1 million rows (a row group) from I/O at one time. Its data is divided into
column chunks in columnar format, and each column chunk consists of many pages; a page (default
size 1 MB) can be decompressed and processed independently.
In current Carbon, a larger blocklet also requires more processing memory, because Carbon
decompresses the required columns of the complete blocklet and keeps them in memory. Maybe
we should consider a similar approach to balance I/O and processing, but such a change
requires Carbon format-level changes.

  was:
Parquet might fetch 1 million rows from I/O at one time, but its data is divided into column chunks,
and each column chunk consists of many pages; a page (default size 1 MB) can be decompressed
and processed independently.
In current Carbon, a larger blocklet also requires more processing memory, because Carbon
decompresses the required columns of the complete blocklet and keeps them in memory. Maybe
we should consider a similar approach to balance I/O and processing, but such a change
requires Carbon format-level changes.


> Too many times GC occurs in query if we increase the blocklet size
> ------------------------------------------------------------------
>
>                 Key: CARBONDATA-464
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-464
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: suo tong
>
> Parquet might fetch 1 million rows (a row group) from I/O at one time. Its data is divided
> into column chunks in columnar format, and each column chunk consists of many pages; a page
> (default size 1 MB) can be decompressed and processed independently.
> In current Carbon, a larger blocklet also requires more processing memory, because Carbon
> decompresses the required columns of the complete blocklet and keeps them in memory. Maybe
> we should consider a similar approach to balance I/O and processing, but such a change
> requires Carbon format-level changes.
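
The memory trade-off described above can be illustrated with a minimal Python sketch. This is
not CarbonData or Parquet code: zlib stands in for the real compressor, and the helper names
(make_pages, scan_whole_chunk, scan_page_by_page) are hypothetical. It only compares how many
decompressed bytes are held at once when all pages are decompressed up front versus one page
at a time.

```python
import zlib

PAGE_SIZE = 1024 * 1024  # 1 MB, matching the default page size mentioned above


def make_pages(num_pages, page_size=PAGE_SIZE):
    """Build a toy column chunk: each page is compressed independently."""
    return [zlib.compress(bytes([i % 251]) * page_size) for i in range(num_pages)]


def scan_whole_chunk(pages):
    """Decompress every page first and keep them all in memory (whole-blocklet style).

    Returns (peak_decompressed_bytes, processed_bytes).
    """
    buffers = [zlib.decompress(p) for p in pages]
    peak = sum(len(b) for b in buffers)       # everything is resident at once
    processed = sum(len(b) for b in buffers)  # placeholder for real processing
    return peak, processed


def scan_page_by_page(pages):
    """Decompress, process, and release one page at a time (page style).

    Returns (peak_decompressed_bytes, processed_bytes).
    """
    peak = 0
    processed = 0
    for p in pages:
        buf = zlib.decompress(p)
        peak = max(peak, len(buf))  # only one decompressed page is held at a time
        processed += len(buf)
    return peak, processed


if __name__ == "__main__":
    pages = make_pages(8)
    print("whole chunk :", scan_whole_chunk(pages))
    print("page by page:", scan_page_by_page(pages))
```

In this sketch both scans process the same number of bytes, but the page-by-page scan never
holds more than one decompressed page, which is the property that would reduce memory and GC
pressure for larger blocklets.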



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
