carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-307) Support executor side scan using CarbonInputFormat
Date Thu, 13 Oct 2016 08:14:20 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-307:
--------------------------------
    Description: 
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
   In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses
   QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
   In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and,
   through CarbonRecordReader, to scan.

Both paths get splits from CarbonInputFormat and scan through QueryExecutor, but wrap them
in different relations and RDDs. This leaves unnecessary duplicate code, so the two paths
need to be unified. The target approach should be:
sqlContext/carbonContext => CarbonDatasourceHadoopRelation => CarbonScanRDD => QueryExecutor

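As an illustration only, the target design can be modeled in a few lines of Java. This is a minimal sketch under stated assumptions: Split, SplitPlanner, SplitScanner, ScanRdd and relationFor are hypothetical names standing in for the roles of CarbonInputFormat (split planning), QueryExecutor (executor-side scan) and CarbonScanRDD; they are not CarbonData or Spark APIs. What it shows is that both entry contexts build the same scan object, so getting splits and scanning are each implemented once.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the unified read path. Every type here is HYPOTHETICAL:
// these are not the CarbonData classes, only stand-ins mirroring their roles.
public class UnifiedReadPath {

    // Stand-in for an input split produced during planning.
    record Split(String blockId) {}

    // Stand-in for CarbonInputFormat's role in the target design: split planning.
    interface SplitPlanner {
        List<Split> getSplits(String tablePath);
    }

    // Stand-in for QueryExecutor: scans one split.
    interface SplitScanner {
        List<String> scan(Split split);
    }

    // Stand-in for CarbonScanRDD: the single scan path shared by both contexts.
    static final class ScanRdd {
        private final SplitPlanner planner;
        private final SplitScanner scanner;

        ScanRdd(SplitPlanner planner, SplitScanner scanner) {
            this.planner = planner;
            this.scanner = scanner;
        }

        // Planning step: one partition per split.
        List<Split> partitions(String tablePath) {
            return planner.getSplits(tablePath);
        }

        // Scan step: each partition is scanned independently.
        List<String> compute(Split split) {
            return scanner.scan(split);
        }

        // Demo helper: runs every partition sequentially in this process.
        List<String> collect(String tablePath) {
            List<String> rows = new ArrayList<>();
            for (Split s : partitions(tablePath)) {
                rows.addAll(compute(s));
            }
            return rows;
        }
    }

    // entryPoint is deliberately ignored: sqlContext and carbonContext
    // both map to the same ScanRdd, which is the point of the unification.
    static ScanRdd relationFor(String entryPoint, SplitPlanner p, SplitScanner s) {
        return new ScanRdd(p, s);
    }

    public static void main(String[] args) {
        SplitPlanner planner = path -> List.of(new Split(path + "/block-0"), new Split(path + "/block-1"));
        SplitScanner scanner = split -> List.of("row from " + split.blockId());

        List<String> viaSql = relationFor("sqlContext", planner, scanner).collect("/tmp/t1");
        List<String> viaCarbon = relationFor("carbonContext", planner, scanner).collect("/tmp/t1");

        System.out.println(viaSql);
        System.out.println(viaSql.equals(viaCarbon));
    }
}
```

Running main prints the two rows, then true, because both entry points go through the same path. In real Spark, partitions(...) and compute(...) would roughly correspond to an RDD's getPartitions and compute methods, with compute running inside executor tasks; the sequential collect loop here only stands in for that distribution.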

  was:
Currently, there are two read paths in the carbon-spark module:
1. CarbonContext => CarbonDatasourceRelation => CarbonScanRDD => QueryExecutor
   In this case, CarbonScanRDD uses CarbonInputFormat to get the splits and uses
   QueryExecutor for the scan.

2. SqlContext => CarbonDatasourceHadoopRelation => CarbonHadoopFSRDD => CarbonRecordReader => QueryExecutor
   In this case, CarbonHadoopFSRDD uses CarbonInputFormat both to get the splits and,
   through CarbonRecordReader, to scan.

Because of this, there is unnecessary duplicate code, and the two paths need to be unified.



> Support executor side scan using CarbonInputFormat
> --------------------------------------------------
>
>                 Key: CARBONDATA-307
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-307
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 0.1.0-incubating
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
