Date: Thu, 20 Oct 2016 08:49:59 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: issues@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Subject: [jira] [Commented] (CARBONDATA-298) 3. Add InputProcessorStep which should iterate recordreader and parse the data as per the data type.
    [ https://issues.apache.org/jira/browse/CARBONDATA-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591245#comment-15591245 ]

ASF GitHub Bot commented on CARBONDATA-298:
-------------------------------------------

Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/240#discussion_r84236268

--- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java ---
@@ -0,0 +1,171 @@
+package org.apache.carbondata.processing.newflow.steps.input;
+
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.parser.CarbonParserFactory;
+import org.apache.carbondata.processing.newflow.parser.GenericParser;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+
+/**
+ * It reads data from record reader and sends data to next step.
+ */
+public class InputProcessorStepImpl extends AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(InputProcessorStepImpl.class.getName());
+
+  private GenericParser[] genericParsers;
+
+  private List<CarbonIterator<Object[]>> inputIterators;
+
+  public InputProcessorStepImpl(CarbonDataLoadConfiguration configuration,
+      AbstractDataLoadProcessorStep child, List<CarbonIterator<Object[]>> inputIterators) {
+    super(configuration, child);
+    this.inputIterators = inputIterators;
+  }
+
+  @Override public DataField[] getOutput() {
+    DataField[] fields = configuration.getDataFields();
+    String[] header = configuration.getHeader();
+    DataField[] output = new DataField[fields.length];
+    int k = 0;
+    for (int i = 0; i < header.length; i++) {
+      for (int j = 0; j < fields.length; j++) {
+        if (header[i].equalsIgnoreCase(fields[j].getColumn().getColName())) {
+          output[k++] = fields[j];
+          break;
+        }
+      }
+    }
+    return output;
+  }
+
+  @Override public void initialize() throws CarbonDataLoadingException {
+    DataField[] output = getOutput();
+    genericParsers = new GenericParser[output.length];
+    for (int i = 0; i < genericParsers.length; i++) {
+      genericParsers[i] = CarbonParserFactory.createParser(output[i].getColumn(),
+          (String[]) configuration
+              .getDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS));
+    }
+  }
+
+  private int getNumberOfCores() {
+    int numberOfCores;
+    try {
+      numberOfCores = Integer.parseInt(CarbonProperties.getInstance()
+          .getProperty(CarbonCommonConstants.NUM_CORES_LOADING,
+              CarbonCommonConstants.NUM_CORES_DEFAULT_VAL));
+    } catch (NumberFormatException exc) {
+      numberOfCores = Integer.parseInt(CarbonCommonConstants.NUM_CORES_DEFAULT_VAL);
+    }
+    return numberOfCores;
+  }
+
+  private int getBatchSize() {
+    int batchSize;
+    try {
+      batchSize = Integer.parseInt(configuration
+          .getDataLoadProperty(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE,
+              DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT).toString());
+    } catch (NumberFormatException exc) {
+      batchSize = Integer.parseInt(DataLoadProcessorConstants.DATA_LOAD_BATCH_SIZE_DEFAULT);
+    }
+    return batchSize;
+  }
+
+  @Override public Iterator<CarbonRowBatch>[] execute() {
+    int batchSize = getBatchSize();
+    List<CarbonIterator<Object[]>>[] readerIterators = partitionInputReaderIterators();
+    Iterator<CarbonRowBatch>[] outIterators = new Iterator[readerIterators.length];
+    for (int i = 0; i < outIterators.length; i++) {
+      outIterators[i] = new InputProcessorIterator(readerIterators[i], genericParsers, batchSize);
+    }
+    return outIterators;
+  }
+
+  private List<CarbonIterator<Object[]>>[] partitionInputReaderIterators() {
+    int numberOfCores = getNumberOfCores();
+    if (inputIterators.size() < numberOfCores) {
+      numberOfCores = inputIterators.size();
+    }
+    List<CarbonIterator<Object[]>>[] iterators = new List[numberOfCores];
+    for (int i = 0; i < numberOfCores; i++) {
+      iterators[i] = new ArrayList<>();
+    }
+
+    for (int i = 0; i < inputIterators.size(); i++) {
+      iterators[i % numberOfCores].add(inputIterators.get(i));
+    }
+    return iterators;
+  }
+
+  @Override protected CarbonRow processRow(CarbonRow row) {
+    return null;
+  }
+
+  private static class InputProcessorIterator extends CarbonIterator<CarbonRowBatch> {
+
+    private List<CarbonIterator<Object[]>> inputIterators;
+
+    private GenericParser[] genericParsers;
+
+    private Iterator<Object[]> currentIterator;
+
+    private int counter;
+
+    private int batchSize;
+
+    public InputProcessorIterator(List<CarbonIterator<Object[]>> inputIterators,
+        GenericParser[] genericParsers, int batchSize) {
--- End diff --

How do we ensure that the `genericParsers` array passed in complies with the `DataField[]` returned by `getOutput()` of this class? (The schema should match, and the `genericParsers` array should be consistent with that schema.)
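
One way to make that contract explicit is a defensive check after the parsers are built. The following is a minimal sketch only, not part of this PR: `validateParsers` is an invented helper that would live inside `InputProcessorStepImpl` and be called at the end of `initialize()`, and it only checks what is visible in the diff above, i.e. that the parser array and the output schema have the same length and that every output column got a parser.

// Hypothetical helper (not in the PR): call at the end of initialize(),
// right after the genericParsers array is filled from getOutput().
private static void validateParsers(DataField[] output, GenericParser[] parsers)
    throws CarbonDataLoadingException {
  // The parser array must mirror the output schema: one parser per DataField.
  if (parsers.length != output.length) {
    throw new CarbonDataLoadingException("Parser count " + parsers.length
        + " does not match output field count " + output.length);
  }
  for (int i = 0; i < parsers.length; i++) {
    // A null entry (for example a gap left by the header matching in getOutput())
    // would otherwise only surface later, deep inside InputProcessorIterator.
    if (parsers[i] == null) {
      throw new CarbonDataLoadingException(
          "No parser created for column " + output[i].getColumn().getColName());
    }
  }
}

Running such a check in `initialize()` keeps the parser array and the output schema from drifting apart silently before `execute()` hands both to `InputProcessorIterator`.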
> 3. Add InputProcessorStep which should iterate recordreader and parse the data as per the data type.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-298
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-298
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Ravindra Pesala
>             Fix For: 0.2.0-incubating
>
>
> Add InputProcessorStep which should iterate recordreader/RecordBufferedWriter and parse the data as per the data types.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
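
The diff quoted above is cut off at the `InputProcessorIterator` constructor, so the behaviour the issue actually asks for, iterating the record reader and parsing each value per its data type, is not visible in this message. The sketch below shows what that iterator's body might look like. It is an illustration only, not the PR's implementation, and it assumes that `CarbonIterator` implements `java.util.Iterator`, that `CarbonRowBatch` has a no-arg constructor and `addRow(CarbonRow)`, that `CarbonRow` wraps an `Object[]`, and that `GenericParser.parse(Object)` returns the typed value; none of those signatures appear in the truncated diff.

// Hypothetical iteration methods for InputProcessorIterator (see assumptions above).

@Override public boolean hasNext() {
  // Advance to the next reader iterator once the current one is drained.
  while ((currentIterator == null || !currentIterator.hasNext())
      && counter < inputIterators.size()) {
    currentIterator = inputIterators.get(counter++);
  }
  return currentIterator != null && currentIterator.hasNext();
}

@Override public CarbonRowBatch next() {
  CarbonRowBatch batch = new CarbonRowBatch();
  int rows = 0;
  while (rows < batchSize && hasNext()) {
    Object[] raw = currentIterator.next();
    Object[] parsed = new Object[raw.length];
    // Each column is parsed by the parser built for the matching output field,
    // which is where the per-data-type handling from the issue title happens.
    for (int i = 0; i < raw.length && i < genericParsers.length; i++) {
      parsed[i] = genericParsers[i].parse(raw[i]);
    }
    batch.addRow(new CarbonRow(parsed));
    rows++;
  }
  return batch;
}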