Content-Type: multipart/alternative; boundary="===============1097961620067925402=="
MIME-Version: 1.0
Subject: Re: Review Request 17899: HIVE-5998 Add vectorized reader for Parquet files
From: "Brock Noland"
To: "Jitendra Pandey", "Eric Hanson", "Brock Noland"
Cc: "Remus Rusanu", "hive"
Date: Thu, 13 Feb 2014 17:14:40 -0000
Message-ID: <20140213171440.22558.22918@reviews.apache.org>
Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
X-ReviewGroup: hive
X-ReviewRequest-URL: https://reviews.apache.org/r/17899/
X-ReviewRequest-Repository: hive-git
References: <20140213083808.22558.88197@reviews.apache.org>
In-Reply-To: <20140213083808.22558.88197@reviews.apache.org>

--===============1097961620067925402==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17899/#review34382
-----------------------------------------------------------

Nice work!! I noticed that parts of the patch are indented with 4 spaces and other parts with 2.


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java

    trailing whitespace


ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java

    trailing whitespace


- Brock Noland


On Feb. 13, 2014, 8:38 a.m., Remus Rusanu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17899/
> -----------------------------------------------------------
>
> (Updated Feb. 13, 2014, 8:38 a.m.)
>
>
> Review request for hive, Brock Noland, Eric Hanson, and Jitendra Pandey.
>
>
> Bugs: HIVE-5998
>     https://issues.apache.org/jira/browse/HIVE-5998
>
>
> Repository: hive-git
>
>
> Description
> -------
>
> The implementation is straightforward and very simple, but offers all the benefits of vectorization possible with a 'shallow' vectorized reader (i.e. one that does not require changes in the parquet-mr project). The only complication arose from discrepancies between the object inspector seen by the inputformat and the actual output provided by the Parquet readers (e.g. the OI declares 'byte' primitives but the Parquet reader outputs IntWritable). I had to create a just-in-time VectorColumnAssigner collection based on whatever writers the Parquet record reader provides. It is assumed the reader does not change its output during the iteration.
>
>
> Diffs
> -----
>
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java d1a75df
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java 0b504de
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatchCtx.java d409d44
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/MapredParquetInputFormat.java d3412df
>   ql/src/java/org/apache/hadoop/hive/ql/io/parquet/VectorizedParquetInputFormat.java PRE-CREATION
>   ql/src/test/queries/clientpositive/vectorized_parquet.q PRE-CREATION
>   ql/src/test/results/clientpositive/vectorized_parquet.q.out PRE-CREATION
>
> Diff: https://reviews.apache.org/r/17899/diff/
>
>
> Testing
> -------
>
> Manually tested. New query .q added.
>
>
> Thanks,
>
> Remus Rusanu
>
>

--===============1097961620067925402==--
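
For readers of the archive, the "just-in-time" assigner idea from the Description can be illustrated roughly as below. This is only a sketch under stated assumptions, not the code in the patch: IntValue, LongValue, Assigner, assignerFor and fillBatch are hypothetical stand-ins (no Hive or Hadoop classes are used), and every column is stored as a long[] for simplicity. The point it shows is that the assigner for each column is chosen from the runtime class of the first row's values rather than from the declared schema, and is then reused for the rest of the iteration.

```java
import java.util.ArrayList;
import java.util.List;

class JitAssignSketch {
    // Hypothetical stand-ins for the value wrappers a record reader might emit.
    static final class IntValue { final int v; IntValue(int v) { this.v = v; } }
    static final class LongValue { final long v; LongValue(long v) { this.v = v; } }

    /** Copies one row value into one slot of a long-typed column. */
    interface Assigner { void assign(long[] column, int rowIndex, Object value); }

    /** Picks an assigner from the runtime class of a sample value, not from the declared type. */
    static Assigner assignerFor(Object sample) {
        if (sample == null) {
            // Assumption of this sketch: the first row has no nulls to inspect.
            throw new IllegalArgumentException("Cannot infer a type from a null value");
        }
        if (sample instanceof IntValue) {
            return (col, i, val) -> col[i] = ((IntValue) val).v;
        }
        if (sample instanceof LongValue) {
            return (col, i, val) -> col[i] = ((LongValue) val).v;
        }
        throw new IllegalArgumentException("Unsupported value type: " + sample.getClass().getName());
    }

    /** Fills one long[] per column; assigners are built lazily from the first row only. */
    static long[][] fillBatch(List<Object[]> rows, int columnCount) {
        long[][] columns = new long[columnCount][rows.size()];
        Assigner[] assigners = null;
        for (int r = 0; r < rows.size(); r++) {
            Object[] row = rows.get(r);
            if (assigners == null) {
                // Built once; assumes the reader keeps emitting the same types afterwards.
                assigners = new Assigner[columnCount];
                for (int c = 0; c < columnCount; c++) {
                    assigners[c] = assignerFor(row[c]);
                }
            }
            for (int c = 0; c < columnCount; c++) {
                assigners[c].assign(columns[c], r, row[c]);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { new IntValue(7), new LongValue(100L) });
        rows.add(new Object[] { new IntValue(8), new LongValue(200L) });
        long[][] batch = fillBatch(rows, 2);
        System.out.println(batch[0][1] + " " + batch[1][0]); // prints "8 100"
    }
}
```

If a later row carried a different value class than the first row, the cast inside the cached assigner would fail with a ClassCastException, which is the practical meaning of the assumption that the reader does not change its output during the iteration.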