carbondata-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface
Date Mon, 03 Oct 2016 16:31:20 GMT


ASF GitHub Bot commented on CARBONDATA-284:

GitHub user jackylk opened a pull request:

    [CARBONDATA-284][WIP] Abstracting index and segment interface

    This PR adds a new User API and Dev API to the carbon-hadoop module:
    ### User API
    - `CarbonColumnarInputFormat/OutputFormat`: uses the current `CarbonInputFormat` internally
    - `CarbonRowInputFormat/OutputFormat`: still needs to be implemented
    - `CarbonOutputCommitter`: manages segment commit
    They are based on `CarbonInputFormatBase/OutputFormatBase`.
    ### Dev API
    - Segment: an abstract class that represents a single load of data. It is used by CarbonInputFormatBase
to get all InputSplits matching the QueryModel, and by CarbonOutputCommitter to prepare
for reading. Example implementations are `IndexedSegment` and `StreamingSegment`.
    - SegmentManager: an interface for managing segments. The current implementation is `ZkSegmentManager`,
which needs to be mapped to the existing logic.
    - Index: an interface used by `IndexedSegment` to filter InputSplits. The current
implementation is `InMemoryBTreeIndex`, which loads the index into the driver's memory.
    `CarbonInputFormatUtil` is modified so that it can also be used by `CarbonColumnarInputFormat`.
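
The relationship between Segment and Index described above can be pictured with a minimal Java sketch. It is not the code in this PR: the method signatures, `BlockSplit` (standing in for a Hadoop InputSplit), `KeyRange` (standing in for the QueryModel filter), and `MinMaxListIndex` (a simple min/max pruner, not a B-tree) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Hadoop InputSplit: one data block plus the
// smallest and largest key it contains.
final class BlockSplit {
    final String path;
    final int minKey;
    final int maxKey;

    BlockSplit(String path, int minKey, int maxKey) {
        this.path = path;
        this.minKey = minKey;
        this.maxKey = maxKey;
    }
}

// Hypothetical stand-in for the QueryModel filter: an inclusive key range.
final class KeyRange {
    final int low;
    final int high;

    KeyRange(int low, int high) {
        this.low = low;
        this.high = high;
    }
}

// Assumed shape of the Index contract: return only the splits that may match.
interface Index {
    List<BlockSplit> filter(KeyRange range);
}

// Simple min/max pruning index held in memory (an assumption, not a B-tree).
final class MinMaxListIndex implements Index {
    private final List<BlockSplit> blocks;

    MinMaxListIndex(List<BlockSplit> blocks) {
        this.blocks = new ArrayList<>(blocks);
    }

    @Override
    public List<BlockSplit> filter(KeyRange range) {
        List<BlockSplit> hits = new ArrayList<>();
        for (BlockSplit b : blocks) {
            // Keep a block when its [minKey, maxKey] overlaps the query range.
            if (b.minKey <= range.high && b.maxKey >= range.low) {
                hits.add(b);
            }
        }
        return hits;
    }
}

// Assumed shape of Segment: one load of data that can list its splits.
abstract class Segment {
    final String id;

    Segment(String id) {
        this.id = id;
    }

    abstract List<BlockSplit> getSplits(KeyRange range);
}

// A segment that delegates split pruning to an Index.
final class IndexedSegment extends Segment {
    private final Index index;

    IndexedSegment(String id, Index index) {
        super(id);
        this.index = index;
    }

    @Override
    List<BlockSplit> getSplits(KeyRange range) {
        return index.filter(range);
    }
}

// Small demo: prints the paths of the blocks that survive pruning.
final class SegmentSketchDemo {
    public static void main(String[] args) {
        Segment segment = new IndexedSegment("0", new MinMaxListIndex(Arrays.asList(
                new BlockSplit("part-0", 0, 9),
                new BlockSplit("part-1", 10, 19),
                new BlockSplit("part-2", 20, 29))));
        for (BlockSplit split : segment.getSplits(new KeyRange(8, 12))) {
            System.out.println(split.path);
        }
    }
}
```

In this sketch the input format would only ever talk to `Segment`, so a segment type without an index (such as a streaming one) could return all of its splits unfiltered without the caller changing.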

You can merge this pull request into a Git repository by running:

    $ git pull index-interface

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #208
commit 398d2ec3e6706c615918a734a90f9dc4111067d8
Author: jackylk <>
Date:   2016-10-03T16:01:48Z

    add User API

commit 1d92a00403faeebc09bf595ba11b3e55d4c997f2
Author: jackylk <>
Date:   2016-10-03T16:02:04Z

    add Developer API

commit 1812a0a68b53ba5d48fc030e2a59329b0e827b05
Author: jackylk <>
Date:   2016-10-03T16:02:49Z

    refactory existing code

commit 430e7710b88725b587c1f3542d4d66ab02958cbc
Author: jackylk <>
Date:   2016-10-03T16:27:10Z

    change Index interface


> Abstracting Index and Segment interface
> ---------------------------------------
>                 Key: CARBONDATA-284
>                 URL:
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: hadoop-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
> This issue is intended to abstract the developer API and user API to achieve the following goals:
> Goal 1: Users can choose where to store index data: in the
> processing framework's memory space (for example, in Spark driver memory) or in
> another service outside the processing framework (for example, an
> independent database service that can be shared across clients).
> Goal 2: Developers can add indices of their choice to CarbonData files.
> Besides the B+ tree on a multi-dimensional key, which CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
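
One way to picture the "pluggable" requirement in Goal 2 is a registry that maps an index type name to a factory. The Java sketch below is only an illustration under that assumption: `PluggableIndex`, `IndexRegistry`, and the `mightContain` method are hypothetical names, not part of CarbonData. A factory registered this way could just as well build an index backed by an external service, which is the choice Goal 1 describes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical minimal index contract: can the given block contain the key?
interface PluggableIndex {
    boolean mightContain(String blockId, int key);
}

// Hypothetical registry: index implementations are added by name, so new
// index types need no change to the code that looks them up.
final class IndexRegistry {
    private final Map<String, Supplier<PluggableIndex>> factories = new LinkedHashMap<>();

    void register(String name, Supplier<PluggableIndex> factory) {
        if (factories.putIfAbsent(name, factory) != null) {
            throw new IllegalArgumentException("duplicate index type: " + name);
        }
    }

    PluggableIndex create(String name) {
        Supplier<PluggableIndex> factory = factories.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("unknown index type: " + name);
        }
        return factory.get();
    }
}

// Small demo: registers two toy index types and queries one of them.
final class IndexRegistryDemo {
    public static void main(String[] args) {
        IndexRegistry registry = new IndexRegistry();
        registry.register("all", () -> (blockId, key) -> true);
        registry.register("even", () -> (blockId, key) -> key % 2 == 0);
        System.out.println(registry.create("even").mightContain("block-0", 4));
    }
}
```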
