Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 6FEF8200C46 for ; Wed, 29 Mar 2017 18:01:08 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 6E4D8160B8A; Wed, 29 Mar 2017 16:01:08 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 8E0BC160B7C for ; Wed, 29 Mar 2017 18:01:07 +0200 (CEST) Received: (qmail 67430 invoked by uid 500); 29 Mar 2017 16:01:06 -0000 Mailing-List: contact dev-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list dev@hive.apache.org Received: (qmail 67419 invoked by uid 99); 29 Mar 2017 16:01:06 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 29 Mar 2017 16:01:06 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id BC911C0CE0; Wed, 29 Mar 2017 16:01:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 4.201 X-Spam-Level: **** X-Spam-Status: No, score=4.201 tagged_above=-999 required=6.31 tests=[DKIM_ADSP_CUSTOM_MED=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=2, KAM_LAZY_DOMAIN_SECURITY=1, NML_ADSP_CUSTOM_MED=1.2, RP_MATCHES_RCVD=-0.001] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id h0xg60dhTO4f; Wed, 29 Mar 2017 16:01:04 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 241E05F613; Wed, 29 Mar 2017 16:01:04 +0000 (UTC) Received: from reviews.apache.org (unknown [10.41.0.12]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id C83ADE002F; Wed, 29 Mar 2017 16:01:03 +0000 (UTC) Received: from reviews-vm2.apache.org (localhost [IPv6:::1]) by reviews.apache.org (ASF Mail Server at reviews-vm2.apache.org) with ESMTP id BB013C400A1; Wed, 29 Mar 2017 16:01:03 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============5818245721932656253==" MIME-Version: 1.0 Subject: Re: Review Request 56290: HIVE-15795: Add Accumulo Index Table Support From: Josh Elser To: Josh Elser Cc: Mike Fagan , hive Date: Wed, 29 Mar 2017 16:01:03 -0000 Message-ID: <20170329160103.10620.49149@reviews-vm2.apache.org> X-ReviewBoard-URL: https://reviews.apache.org/ Auto-Submitted: auto-generated Sender: Josh Elser X-ReviewGroup: hive X-Auto-Response-Suppress: DR, RN, OOF, AutoReply X-ReviewRequest-URL: https://reviews.apache.org/r/56290/ X-Sender: Josh Elser References: <20170306162450.40992.72720@reviews-vm2.apache.org> In-Reply-To: <20170306162450.40992.72720@reviews-vm2.apache.org> X-ReviewBoard-Diff-For: accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexParameters.java X-ReviewBoard-Diff-For: accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloDefaultIndexScanner.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/AccumuloIndexParameters.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/package-info.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexDefinition.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScannerException.java X-ReviewBoard-Diff-For: accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexLexicoder.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/package-info.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScanner.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexLexicoder.java X-ReviewBoard-Diff-For: accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/IndexOutputConfigurator.java Reply-To: Josh Elser X-ReviewRequest-Repository: hive-git archived-at: Wed, 29 Mar 2017 16:01:08 -0000 --===============5818245721932656253== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/56290/#review170404 ----------------------------------------------------------- This is looking very nice, Mike! I'm going to pull down what you have now and test it locally. There are a couple of minor things (byte<->string encoding, text/byte[] performance, etc) to look at. I think the only major thing that stood out at me is the `NO_MATCH` `Range`. Your unit test coverage is quite nice, too! Good work. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java Lines 135 (patched) I think this is still an issue here. 1. Semantically, a lack of an index record just means that there are no records in the data table. As such, an empty `List` would make sense to me. 2. I don't see use of `NO_MATCH` as the "special" match in the query execution path to short-circuit out and return no results (maybe I missed it?) Best as I can see, this is only used by tests. I think checking for an empty list returned by `getIndexRowRanges(..)` would be better. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java Lines 165 (patched) A null check on `indexColumns` would be good (here or in the constructor). A quick glance looks like we pull it out a `Properties` instance which could be `null` (intentionally or not). accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexLexicoder.java Lines 65 (patched) Change the `getBytes()` calls to `getBytes(StandardCharsets.UTF_8)`, please. `java.nio.charset.StandardCharsets` was a nice addition in 1.7. Treating those bytes as utf-8 is reasonable enough. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 90 (patched) Why `var6` and not `e`? :) accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 164 (patched) Use the Logger to print the stack trace instead, please. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 223 (patched) Nit: the escaped single-quote is unnecessary. You just strike the '\' accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 225 (patched) The catch on `AccumuloException` and `AccumuloSecurityException` can just be dropped since this method can throw them. Don't need to catch+rethrow. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 268 (patched) You can make this a bit better (avoiding String manipulation) by doing the following: ``` Text cf = new Text(); Text cq = new Text(); for (ColumnUpdate cu : ..) { cu.getColumnFamily(cf); cu.getColumnQualifier(cq); ... byte[] colType = indexDef.getColType(cf, cq); ``` For every column update in the mutation, this would save you a couple of String creations and byte-array copies. accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java Lines 332 (patched) I think you're missing a `throw new IOException(var7)` here. Users wouldn't know if data wasn't actually written unless they read the log (which they wouldn't do if the job succeeded). accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloRangeGenerator.java Lines 346 (patched) `getBytes(UTF_8)` here too, please. accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexParameters.java Lines 70 (patched) I'd just drop this `catch` and have the method `throw AccumuloIndexScannerException`. You'll get the same effect out of junit. - Josh Elser On March 6, 2017, 4:24 p.m., Mike Fagan wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/56290/ > ----------------------------------------------------------- > > (Updated March 6, 2017, 4:24 p.m.) > > > Review request for hive and Josh Elser. > > > Repository: hive-git > > > Description > ------- > > HIVE-15795: Add Accumulo Index Table Support > > > Diffs > ----- > > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexLexicoder.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScanner.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScannerException.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java cdbc7f2 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexDefinition.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java 3ae5431 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/IndexOutputConfigurator.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/package-info.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java a7ec7c5 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloRangeGenerator.java 21392d1 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/PrimitiveComparisonFilter.java 17d5529 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/AccumuloIndexParameters.java PRE-CREATION > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/AccumuloSerDeParameters.java 09c5f24 > accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/package-info.java PRE-CREATION > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloDefaultIndexScanner.java PRE-CREATION > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexLexicoder.java PRE-CREATION > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexParameters.java PRE-CREATION > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java 0aaa782 > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloPredicateHandler.java 88e4530 > accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloRangeGenerator.java 339da07 > accumulo-handler/src/test/queries/positive/accumulo_queries.q 279b661 > > > Diff: https://reviews.apache.org/r/56290/diff/2/ > > > Testing > ------- > > > Thanks, > > Mike Fagan > > --===============5818245721932656253==--