carbondata-issues mailing list archives

From "Liang Chen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-844) Avoid to get useless splits
Date Fri, 07 Jul 2017 00:06:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Liang Chen updated CARBONDATA-844:
----------------------------------
    Fix Version/s:     (was: 1.1.1)
                   NONE

> Avoid to get useless splits
> ---------------------------
>
>                 Key: CARBONDATA-844
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-844
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.1.0
>            Reporter: Yadong Qi
>            Assignee: Yadong Qi
>             Fix For: NONE
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> In the current implementation of CarbonInputFormat.getDataBlocksOfSegment, we:
> 1. Get all of the carbondata splits in the segment directory.
> 2. Read the carbonindex and construct the B-tree.
> 3. Apply filter and get matching splits.
> I think this produces useless splits (splits for blocks that the filter later discards),
> and getSplits is expensive. So it is better to call getSplits after filtering:
> 1. List the segment directory and keep only the carbonindex file paths.
> 2. Read the carbonindex and construct the B-tree.
> 3. Apply filter and get matching blocks.
> 4. Get carbondata splits from filtered blocks.
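To make the difference between the two orderings concrete, here is a minimal Python sketch under stated assumptions. It is not CarbonData code, and every name in it is hypothetical. A segment is modeled as a dict mapping file name to size in bytes. Each .carbonindex file is assumed to be already parsed into per-block min/max entries (BlockIndex). A flat list scan stands in for the B-tree. make_splits is a stand-in for the expensive getSplits work that counts how many splits it materializes. Both flows return the same splits for a range filter, and only the proposed one skips building splits for blocks that the filter discards.

```python
from dataclasses import dataclass
from typing import Dict, List

SPLIT_SIZE = 64  # hypothetical split size in bytes, tiny to keep the demo small


@dataclass(frozen=True)
class BlockIndex:
    """Hypothetical .carbonindex entry: one block's min/max for the filtered column."""
    data_file: str
    min_value: int
    max_value: int


@dataclass(frozen=True)
class Split:
    path: str
    start: int
    length: int


@dataclass
class Cost:
    splits_built: int = 0  # how many Split objects were materialized


def make_splits(path: str, size: int, cost: Cost) -> List[Split]:
    """Hypothetical stand-in for the expensive getSplits work on one file."""
    splits = [Split(path, off, min(SPLIT_SIZE, size - off))
              for off in range(0, size, SPLIT_SIZE)]
    cost.splits_built += len(splits)
    return splits


def matches(entry: BlockIndex, lo: int, hi: int) -> bool:
    """Min/max pruning: keep the block if [min, max] overlaps [lo, hi]."""
    return entry.max_value >= lo and entry.min_value <= hi


def read_index_entries(segment: Dict[str, int],
                       indexes: Dict[str, List[BlockIndex]]) -> List[BlockIndex]:
    """Read every .carbonindex file; a flat list stands in for the B-tree."""
    return [e for name in segment if name.endswith(".carbonindex")
            for e in indexes[name]]


def current_flow(segment, indexes, lo, hi, cost):
    # 1. Build splits for every .carbondata file in the segment up front.
    splits = [s for name, size in segment.items() if name.endswith(".carbondata")
              for s in make_splits(name, size, cost)]
    # 2-3. Read the index, apply the filter, keep splits of matching blocks.
    kept = {e.data_file for e in read_index_entries(segment, indexes)
            if matches(e, lo, hi)}
    return [s for s in splits if s.path in kept]


def proposed_flow(segment, indexes, lo, hi, cost):
    # 1-2. Only the .carbonindex paths are read at first.
    entries = read_index_entries(segment, indexes)
    # 3. Apply the filter to get the matching blocks (deduplicated, order kept).
    blocks = list(dict.fromkeys(e.data_file for e in entries if matches(e, lo, hi)))
    # 4. Build splits only for the blocks that survived the filter.
    return [s for b in blocks for s in make_splits(b, segment[b], cost)]


# Hypothetical segment: file name -> size in bytes, plus its parsed index.
segment = {"part-0.carbondata": 128, "part-1.carbondata": 128,
           "part-2.carbondata": 64, "0.carbonindex": 4}
indexes = {"0.carbonindex": [BlockIndex("part-0.carbondata", 0, 99),
                             BlockIndex("part-1.carbondata", 100, 199),
                             BlockIndex("part-2.carbondata", 200, 299)]}

if __name__ == "__main__":
    old, new = Cost(), Cost()
    same = (current_flow(segment, indexes, 150, 180, old)
            == proposed_flow(segment, indexes, 150, 180, new))
    print(same, old.splits_built, new.splits_built)  # True 5 2
```

With a selective filter the result is identical but fewer splits are built (2 instead of 5 here). When the filter keeps every block, the two flows do the same amount of split work, so the saving depends on how much the index pruning removes.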



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
