carbondata-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute
Date Wed, 02 Nov 2016 01:57:58 GMT

    [ https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15627357#comment-15627357 ]

ASF GitHub Bot commented on CARBONDATA-308:
-------------------------------------------

Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058166
  
    --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
    @@ -22,28 +22,44 @@
     import java.io.DataOutput;
     import java.io.IOException;
     import java.io.Serializable;
    +import java.util.ArrayList;
    +import java.util.List;
    +
    +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
    +import org.apache.carbondata.core.carbon.datastore.block.Distributable;
    +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
    +import org.apache.carbondata.core.carbon.path.CarbonTablePath;
     
     import org.apache.hadoop.fs.Path;
     import org.apache.hadoop.io.Writable;
     import org.apache.hadoop.mapreduce.lib.input.FileSplit;
     
    +
     /**
      * Carbon input split to allow distributed read of CarbonInputFormat.
      */
    -public class CarbonInputSplit extends FileSplit implements Serializable, Writable {
    +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable {
     
       private static final long serialVersionUID = 3520344046772190207L;
       private String segmentId;
    -  /**
    +  public String taskId = "0";
    +
    +  /*
        * Number of BlockLets in a block
        */
       private int numberOfBlocklets = 0;
     
    -  public CarbonInputSplit() {
    -    super(null, 0, 0, new String[0]);
    +  public  CarbonInputSplit() {
       }
     
    -  public CarbonInputSplit(String segmentId, Path path, long start, long length,
    +  private void parserPath(Path path) {
    --- End diff --
    
    Please use CarbonTablePath.DataFileUtil.getTaskNo here.


> Use CarbonInputFormat in CarbonScanRDD compute
> ----------------------------------------------
>
>                 Key: CARBONDATA-308
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-308
>             Project: CarbonData
>          Issue Type: Sub-task
>          Components: spark-integration
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD and modify it as follows:
> 1. On the driver side, only getSplit is required, so only the filter condition is needed. There is no need to create a full QueryModel object there, so creation of the QueryModel can move from the driver side to the executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of using QueryExecutor directly.
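
The driver/executor split described in the issue can be sketched roughly as follows. This is a minimal, self-contained illustration and not CarbonData code: ScanSketch, RangeFilter, Split, QueryModel, getSplits and compute are all hypothetical stand-ins, and the min/max pruning is only an assumed example of what a filter-only split planner might do. The point it tries to show is that the driver needs nothing but the filter to choose splits, while the heavier query model is built inside the per-split compute step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: none of these types are CarbonData classes. */
public class ScanSketch {

  /** Hypothetical filter condition: keep values in [low, high]. */
  static final class RangeFilter {
    final int low;
    final int high;
    RangeFilter(int low, int high) { this.low = low; this.high = high; }
    boolean accepts(int value) { return value >= low && value <= high; }
  }

  /** Hypothetical split: one non-empty block of values plus its min/max statistics. */
  static final class Split {
    final int[] values;
    final int min;
    final int max;
    Split(int... values) {
      this.values = values;
      this.min = Arrays.stream(values).min().getAsInt();
      this.max = Arrays.stream(values).max().getAsInt();
    }
  }

  /** Hypothetical query model: what an executor needs to scan one split. */
  static final class QueryModel {
    final Split split;
    final RangeFilter filter;
    QueryModel(Split split, RangeFilter filter) { this.split = split; this.filter = filter; }
  }

  /** Driver side: only the filter is used, to prune splits that cannot match. */
  static List<Split> getSplits(List<Split> all, RangeFilter filter) {
    List<Split> kept = new ArrayList<>();
    for (Split s : all) {
      if (s.max >= filter.low && s.min <= filter.high) {
        kept.add(s);
      }
    }
    return kept;
  }

  /** Executor side: the query model is created here, once per split. */
  static List<Integer> compute(Split split, RangeFilter filter) {
    QueryModel model = new QueryModel(split, filter);
    List<Integer> rows = new ArrayList<>();
    for (int value : model.split.values) {
      if (model.filter.accepts(value)) {
        rows.add(value);
      }
    }
    return rows;
  }

  public static void main(String[] args) {
    List<Split> all = Arrays.asList(
        new Split(1, 2, 3), new Split(10, 20, 30), new Split(100, 200));
    RangeFilter filter = new RangeFilter(15, 150);
    for (Split s : getSplits(all, filter)) {
      System.out.println(compute(s, filter));
    }
  }
}
```

In this sketch the first block is dropped on the "driver" without any query model being built, and only the two remaining blocks reach compute.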



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
