carbondata-issues mailing list archives

From "Jihong MA (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-322) Integration with spark 2.x
Date Thu, 15 Dec 2016 18:51:58 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated CARBONDATA-322:
---------------------------------
    Description: 
Spark 2.0 has been released with many useful features, such as a more efficient parser,
vectorized execution, and adaptive execution.
It would be good to integrate with Spark 2.x.

The current integration, which supports up to Spark v1.6, is tightly coupled with Spark. We
would like to clean up the interface with the following design points in mind:

1. Decouple from Spark, basing the integration on Spark's v2 datasource API.
2. Enable the vectorized carbon reader.
3. Support saving a DataFrame to a CarbonData file through CarbonData's output format.
...


  was:
Now that Spark 2.0 has been released, there are many nice features such as a more efficient
parser, vectorized execution, and adaptive execution.
It would be good to integrate with Spark 2.x.

On the other hand, the Spark integration in CarbonData is currently heavily coupled with Spark
code and the code needs cleanup. We should redesign the Spark integration so that it satisfies
the following requirements:

1. Decoupled from Spark, integrating according to the Spark datasource API (V2)
2. The integration should support the vectorized carbon reader
3. Support writing to CarbonData from a DataFrame
...


     Issue Type: Improvement  (was: Bug)
        Summary: Integration with  spark 2.x   (was: integrate spark 2.x )

> Integration with  spark 2.x 
> ----------------------------
>
>                 Key: CARBONDATA-322
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-322
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 0.2.0-incubating
>            Reporter: Fei Wang
>            Assignee: Fei Wang
>             Fix For: 1.0.0-incubating
>
>
> Spark 2.0 has been released with many useful features, such as a more efficient parser,
> vectorized execution, and adaptive execution.
> It would be good to integrate with Spark 2.x.
> The current integration, which supports up to Spark v1.6, is tightly coupled with Spark. We
> would like to clean up the interface with the following design points in mind:
> 1. Decouple from Spark, basing the integration on Spark's v2 datasource API.
> 2. Enable the vectorized carbon reader.
> 3. Support saving a DataFrame to a CarbonData file through CarbonData's output format.
> ...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
