hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From j.prasant...@gmail.com
Subject Re: Review Request 45238: HIVE-9660 store end offset of compressed data for RG in RowIndex in ORC
Date Wed, 06 Apr 2016 23:27:57 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45238/#review126974
-----------------------------------------------------------




common/src/java/org/apache/hadoop/hive/conf/HiveConf.java (lines 1205 - 1206)
<https://reviews.apache.org/r/45238/#comment190098>

    IMO, this should not be made an option for the user. Should be enabled as long as index
is enabled. If for some reason, this is wrong, then disk range computation should fallback
to old code path using WORST_CASE_SLOP (2 x buffer size). Will be better to have config for
fallback path as opposed to tracking lengths.



orc/src/java/org/apache/orc/OrcConf.java (lines 100 - 102)
<https://reviews.apache.org/r/45238/#comment190102>

    same as my above comment.



orc/src/java/org/apache/orc/TypeDescription.java (line 384)
<https://reviews.apache.org/r/45238/#comment190103>

    Is this related?



orc/src/java/org/apache/orc/impl/OutStream.java (line 323)
<https://reviews.apache.org/r/45238/#comment190393>

    Is this related? doesn't seem to be used anywhere



orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java (line 191)
<https://reviews.apache.org/r/45238/#comment190830>

    By flush does this mean writing to OutStream? What if numLiterals straddles compression
buffers (first CB is flushed and closed and second CB contains some literals)? Is that still
considered to be flushed?



orc/src/java/org/apache/orc/impl/SerializationUtils.java (line 1138)
<https://reviews.apache.org/r/45238/#comment190831>

    Can you reuse SerializationUtils.readFully() method here? You have to make that method
public.



orc/src/java/org/apache/orc/impl/WriterImpl.java (line 592)
<https://reviews.apache.org/r/45238/#comment190835>

    I think it will be easier if we maintain a Map<String, LengthsBuilder> that maps
stream name and proto builder of lengths. The callback interface can be reportCompressionBuffer(String
name, int cbSize). When this callback is called, length information can be appended to the
lengths builder for the corresponding stream. When flushing the stripe, using the same stream
name we can check if the stream is suppressed or not. With that we can also avoid the index
adjustment. 
    
    I am just seeing if there are ways to simplify the state maintenance here. Thoughts?



orc/src/protobuf/orc_proto.proto (line 85)
<https://reviews.apache.org/r/45238/#comment190096>

    If this is a required field, it will break backward compatibility. Isn't it?



ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java (line 279)
<https://reviews.apache.org/r/45238/#comment190840>

    I don't see any changes that bypasses this when lengths are available. Is that going into
separate jira?


- Prasanth_J


On March 23, 2016, 7:08 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45238/
> -----------------------------------------------------------
> 
> (Updated March 23, 2016, 7:08 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c14df20 
>   llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapOptionsProcessor.java c292b37

>   llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
eb251a8 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
82187bd 
>   orc/src/java/org/apache/orc/OrcConf.java 6fcbb72 
>   orc/src/java/org/apache/orc/OrcFile.java 3945a5d 
>   orc/src/java/org/apache/orc/TypeDescription.java bd900ac 
>   orc/src/java/org/apache/orc/impl/BitFieldWriter.java aa5f886 
>   orc/src/java/org/apache/orc/impl/IntegerWriter.java 419054f 
>   orc/src/java/org/apache/orc/impl/OutStream.java 81662cc 
>   orc/src/java/org/apache/orc/impl/RunLengthByteWriter.java 09108b2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriter.java 3e5f2e2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java fab2801 
>   orc/src/java/org/apache/orc/impl/SerializationUtils.java c1162e4 
>   orc/src/java/org/apache/orc/impl/WriterImpl.java 6497ecf 
>   orc/src/protobuf/orc_proto.proto f4935b4 
>   orc/src/test/org/apache/orc/impl/TestBitFieldReader.java e4c6f6b 
>   orc/src/test/org/apache/orc/impl/TestBitPack.java f2d3d64 
>   orc/src/test/org/apache/orc/impl/TestInStream.java 9e65345 
>   orc/src/test/org/apache/orc/impl/TestIntegerCompressionReader.java 399f35e 
>   orc/src/test/org/apache/orc/impl/TestOutStream.java e9614d5 
>   orc/src/test/org/apache/orc/impl/TestRunLengthByteReader.java a14bef1 
>   orc/src/test/org/apache/orc/impl/TestRunLengthIntegerReader.java 28239ba 
>   ql/src/java/org/apache/hadoop/hive/llap/DebugUtils.java ea626d7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezJobMonitor.java 67f9da8 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecOrcFileDump.java d5d1370 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9c2f88f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/JsonFileDump.java 00de545 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java 8a73948 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 96af96a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 29b51ec

>   ql/src/test/queries/clientpositive/orc_lengths.q PRE-CREATION 
>   ql/src/test/results/clientpositive/orc_lengths.q.out PRE-CREATION 
>   storage-api/src/java/org/apache/hadoop/hive/common/io/encoded/EncodedColumnBatch.java
ddba889 
> 
> Diff: https://reviews.apache.org/r/45238/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message