hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <josh.el...@gmail.com>
Subject Re: Review Request 56290: HIVE-15795: Add Accumulo Index Table Support
Date Wed, 29 Mar 2017 16:01:03 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56290/#review170404
-----------------------------------------------------------



This is looking very nice, Mike! I'm going to pull down what you have now and test it locally.

There are a couple of minor things (byte<->string encoding, text/byte[] performance,
etc) to look at. I think the only major thing that stood out at me is the `NO_MATCH` `Range`.

Your unit test coverage is quite nice, too! Good work.


accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java
Lines 135 (patched)
<https://reviews.apache.org/r/56290/#comment243230>

    I think this is still an issue here.
    
    1. Semantically, a lack of an index record just means that there are no records in the
data table. As such, an empty `List<Range>` would make sense to me.
    2. I don't see use of `NO_MATCH` as the "special" match in the query execution path to
short-circuit out and return no results (maybe I missed it?)
    
    Best as I can see, this is only used by tests. I think checking for an empty list returned
by `getIndexRowRanges(..)` would be better.



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java
Lines 165 (patched)
<https://reviews.apache.org/r/56290/#comment243231>

    A null check on `indexColumns` would be good (here or in the constructor). A quick glance
looks like we pull it out a `Properties` instance which could be `null` (intentionally or
not).



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexLexicoder.java
Lines 65 (patched)
<https://reviews.apache.org/r/56290/#comment243232>

    Change the `getBytes()` calls to `getBytes(StandardCharsets.UTF_8)`, please.
    
    `java.nio.charset.StandardCharsets` was a nice addition in 1.7. Treating those bytes as
utf-8 is reasonable enough.



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 90 (patched)
<https://reviews.apache.org/r/56290/#comment243235>

    Why `var6` and not `e`? :)



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 164 (patched)
<https://reviews.apache.org/r/56290/#comment243236>

    Use the Logger to print the stack trace instead, please.



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 223 (patched)
<https://reviews.apache.org/r/56290/#comment243237>

    Nit: the escaped single-quote is unnecessary. You just strike the '\'



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 225 (patched)
<https://reviews.apache.org/r/56290/#comment243238>

    The catch on `AccumuloException` and `AccumuloSecurityException` can just be dropped since
this method can throw them. Don't need to catch+rethrow.



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 268 (patched)
<https://reviews.apache.org/r/56290/#comment243243>

    You can make this a bit better (avoiding String manipulation) by doing the following:
    
    ```
    Text cf = new Text();
    Text cq = new Text();
    for (ColumnUpdate cu : ..) {
      cu.getColumnFamily(cf);
      cu.getColumnQualifier(cq);
      
      ...
      
      byte[] colType = indexDef.getColType(cf, cq);
    ```
    
    For every column update in the mutation, this would save you a couple of String creations
and byte-array copies.



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
Lines 332 (patched)
<https://reviews.apache.org/r/56290/#comment243244>

    I think you're missing a `throw new IOException(var7)` here. Users wouldn't know if data
wasn't actually written unless they read the log (which they wouldn't do if the job succeeded).



accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloRangeGenerator.java
Lines 346 (patched)
<https://reviews.apache.org/r/56290/#comment243246>

    `getBytes(UTF_8)` here too, please.



accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexParameters.java
Lines 70 (patched)
<https://reviews.apache.org/r/56290/#comment243248>

    I'd just drop this `catch` and have the method `throw AccumuloIndexScannerException`.
You'll get the same effect out of junit.


- Josh Elser


On March 6, 2017, 4:24 p.m., Mike Fagan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56290/
> -----------------------------------------------------------
> 
> (Updated March 6, 2017, 4:24 p.m.)
> 
> 
> Review request for hive and Josh Elser.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15795: Add Accumulo Index Table Support
> 
> 
> Diffs
> -----
> 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloDefaultIndexScanner.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexLexicoder.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScanner.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloIndexScannerException.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/AccumuloStorageHandler.java
cdbc7f2 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexDefinition.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/AccumuloIndexedOutputFormat.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/HiveAccumuloTableOutputFormat.java
3ae5431 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/IndexOutputConfigurator.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/mr/package-info.java PRE-CREATION

>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloPredicateHandler.java
a7ec7c5 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/AccumuloRangeGenerator.java
21392d1 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/predicate/PrimitiveComparisonFilter.java
17d5529 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/AccumuloIndexParameters.java
PRE-CREATION 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/AccumuloSerDeParameters.java
09c5f24 
>   accumulo-handler/src/java/org/apache/hadoop/hive/accumulo/serde/package-info.java PRE-CREATION

>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloDefaultIndexScanner.java
PRE-CREATION 
>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexLexicoder.java
PRE-CREATION 
>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloIndexParameters.java
PRE-CREATION 
>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/TestAccumuloStorageHandler.java
0aaa782 
>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloPredicateHandler.java
88e4530 
>   accumulo-handler/src/test/org/apache/hadoop/hive/accumulo/predicate/TestAccumuloRangeGenerator.java
339da07 
>   accumulo-handler/src/test/queries/positive/accumulo_queries.q 279b661 
> 
> 
> Diff: https://reviews.apache.org/r/56290/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Mike Fagan
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message