carbondata-issues mailing list archives

From "Yadong Qi (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CARBONDATA-844) Avoid to get useless splits
Date Sat, 01 Apr 2017 09:00:48 GMT
Yadong Qi created CARBONDATA-844:
------------------------------------

             Summary: Avoid to get useless splits
                 Key: CARBONDATA-844
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-844
             Project: CarbonData
          Issue Type: Improvement
          Components: core
    Affects Versions: 1.1.0-incubating
            Reporter: Yadong Qi
            Assignee: Yadong Qi


The current implementation of CarbonInputFormat.getDataBlocksOfSegment works as follows:
1. Get all of the carbondata splits in the segment directory.
2. Read the carbonindex and construct the B-tree.
3. Apply the filter and get the matching splits.

I think this fetches some useless splits, because splits are computed for every carbondata file before the filter
is applied, and the getSplits operation is expensive. So we had better call getSplits after the filter:
1. List the segment directory and keep only the carbonindex file paths.
2. Read the carbonindex and construct the B-tree.
3. Apply filter and get matching blocks.
4. Get carbondata splits from filtered blocks.
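
The proposed order can be sketched roughly as below. This is only an illustration, not CarbonData code: SplitPruningSketch, BlockEntry, and all method names are hypothetical, the index contents are passed in as an in-memory map instead of being read from disk, a plain list scan over min/max values stands in for the B-tree, and the ".carbonindex" suffix check is an assumption about file naming.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "filter first, then create splits".
// None of these types or methods are CarbonData APIs.
final class SplitPruningSketch {

    /** Min/max statistics for one data block, as an index file might record them (assumed shape). */
    static final class BlockEntry {
        final String dataFile;
        final int minValue;
        final int maxValue;

        BlockEntry(String dataFile, int minValue, int maxValue) {
            this.dataFile = dataFile;
            this.minValue = minValue;
            this.maxValue = maxValue;
        }
    }

    /** Step 1: from a segment directory listing, keep only the index file names. */
    static List<String> listIndexFiles(List<String> segmentListing) {
        List<String> indexFiles = new ArrayList<>();
        for (String name : segmentListing) {
            if (name.endsWith(".carbonindex")) { // assumed file suffix
                indexFiles.add(name);
            }
        }
        return indexFiles;
    }

    /**
     * Steps 2 and 3: look up the entries of each index file and keep the blocks
     * whose [min, max] range can contain the filter value.
     * A linear scan replaces the B-tree here to keep the sketch short.
     */
    static List<BlockEntry> pruneBlocks(List<String> indexFiles,
                                        Map<String, List<BlockEntry>> indexContents,
                                        int filterValue) {
        List<BlockEntry> matching = new ArrayList<>();
        for (String indexFile : indexFiles) {
            for (BlockEntry entry : indexContents.getOrDefault(indexFile, List.of())) {
                if (entry.minValue <= filterValue && filterValue <= entry.maxValue) {
                    matching.add(entry);
                }
            }
        }
        return matching;
    }

    /** Step 4: create splits only for the blocks that survived the filter. */
    static List<String> toSplits(List<BlockEntry> blocks) {
        List<String> splits = new ArrayList<>();
        for (BlockEntry block : blocks) {
            splits.add("split:" + block.dataFile); // placeholder for the expensive split creation
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> listing = List.of("part-0.carbondata", "part-1.carbondata", "0.carbonindex");
        Map<String, List<BlockEntry>> indexContents = Map.of(
                "0.carbonindex",
                List.of(new BlockEntry("part-0.carbondata", 0, 9),
                        new BlockEntry("part-1.carbondata", 10, 19)));

        List<String> indexFiles = listIndexFiles(listing);
        List<BlockEntry> blocks = pruneBlocks(indexFiles, indexContents, 12);
        System.out.println(toSplits(blocks));
    }
}
```

The point of the ordering is that the expensive step (toSplits here) only sees blocks that already passed the filter, so its cost scales with the number of matching blocks and not with the number of files in the segment.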



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
