From: ppadma
To: dev@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [GitHub] drill pull request #611: Drill-4800: Improve parquet reader performance
Message-Id: <20161026160123.C7A1CE0FC4@git1-us-west.apache.org>
Date: Wed, 26 Oct 2016 16:01:23 +0000 (UTC)

Github user ppadma commented on a diff in the pull request:

    https://github.com/apache/drill/pull/611#discussion_r84951720

--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/columnreaders/VarLenBinaryReader.java ---
@@ -41,43 +51,149 @@ public VarLenBinaryReader(ParquetRecordReader parentReader, List firstColumnStatus) throws
IOException {
     long recordsReadInCurrentPass = 0;
-    int lengthVarFieldsInCurrentRecord;
-    long totalVariableLengthData = 0;
-    boolean exitLengthDeterminingLoop = false;
+
     // write the first 0 offset
     for (VarLengthColumn columnReader : columns) {
       columnReader.reset();
     }
+    recordsReadInCurrentPass = determineSizesSerial(recordsToReadInThisPass);
+    if(useAsyncTasks){
+      readRecordsParallel(recordsReadInCurrentPass);
+    }else{
+      readRecordsSerial(recordsReadInCurrentPass);
+    }
+    return recordsReadInCurrentPass;
+  }
+
+
+  private long determineSizesSerial(long recordsToReadInThisPass) throws IOException {
+    int lengthVarFieldsInCurrentRecord = 0;
+    boolean exitLengthDeterminingLoop = false;
+    long totalVariableLengthData = 0;
+    long recordsReadInCurrentPass = 0;
     do {
-      lengthVarFieldsInCurrentRecord = 0;
       for (VarLengthColumn columnReader : columns) {
-        if ( !exitLengthDeterminingLoop ) {
-          exitLengthDeterminingLoop = columnReader.determineSize(recordsReadInCurrentPass, lengthVarFieldsInCurrentRecord);
+        if (!exitLengthDeterminingLoop) {
+          exitLengthDeterminingLoop =
+              columnReader.determineSize(recordsReadInCurrentPass, lengthVarFieldsInCurrentRecord);
         } else {
           break;
         }
       }
       // check that the next record will fit in the batch
-      if (exitLengthDeterminingLoop || (recordsReadInCurrentPass + 1) * parentReader.getBitWidthAllFixedFields() + totalVariableLengthData
-          + lengthVarFieldsInCurrentRecord > parentReader.getBatchSize()) {
+      if (exitLengthDeterminingLoop ||
+          (recordsReadInCurrentPass + 1) * parentReader.getBitWidthAllFixedFields()
+              + totalVariableLengthData + lengthVarFieldsInCurrentRecord > parentReader.getBatchSize()) {
         break;
       }
-      for (VarLengthColumn columnReader : columns ) {
+      for (VarLengthColumn columnReader : columns) {
         columnReader.updateReadyToReadPosition();
         columnReader.currDefLevel = -1;
       }
       recordsReadInCurrentPass++;
       totalVariableLengthData += lengthVarFieldsInCurrentRecord;
     } while (recordsReadInCurrentPass < recordsToReadInThisPass);
+    return recordsReadInCurrentPass;
+  }
+
+
+  public long determineSizesParallel(long recordsToReadInThisPass ) throws IOException {
--- End diff --

Seems like this function (determineSizesParallel) is not used anywhere. Do you still want to keep it?