hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sergey Shelukhin <ser...@hortonworks.com>
Subject Re: Review Request 45238: HIVE-9660 store end offset of compressed data for RG in RowIndex in ORC
Date Thu, 07 Apr 2016 04:20:50 GMT


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, lines 1205-1206
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312350#file1312350line1205>
> >
> >     IMO, this should not be made an option for the user. Should be enabled as long
as index is enabled. If for some reason, this is wrong, then disk range computation should
fallback to old code path using WORST_CASE_SLOP (2 x buffer size). Will be better to have
config for fallback path as opposed to tracking lengths.

this is needed for the write path - what if the write path is problematic because of the length
tracking?


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > orc/src/java/org/apache/orc/TypeDescription.java, line 384
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312356#file1312356line384>
> >
> >     Is this related?

yes


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java, line 192
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312362#file1312362line192>
> >
> >     By flush does this mean writing to OutStream? What if numLiterals straddles
compression buffers (first CB is flushed and closed and second CB contains some literals)?
Is that still considered to be flushed?

yes, just writing to outstream. It doesn't matter for this class what happens there for the
most part (see big comment in WriterImpl); the flush count is always updated after writing
out to OutStream, so in any callback from CBs created during the write, we'll still see the
old flush count. Only when that 2nd CB is written, we will get a callback and find the new
flush count.


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > orc/src/java/org/apache/orc/impl/WriterImpl.java, line 595
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312364#file1312364line595>
> >
> >     I think it will be easier if we maintain a Map<String, LengthsBuilder>
that maps stream name and proto builder of lengths. The callback interface can be reportCompressionBuffer(String
name, int cbSize). When this callback is called, length information can be appended to the
lengths builder for the corresponding stream. When flushing the stripe, using the same stream
name we can check if the stream is suppressed or not. With that we can also avoid the index
adjustment. 
> >     
> >     I am just seeing if there are ways to simplify the state maintenance here. Thoughts?

the map by enum might make sense... do you think this is acceptable in terms of extra overhead?


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > orc/src/protobuf/orc_proto.proto, line 85
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312365#file1312365line85>
> >
> >     If this is a required field, it will break backward compatibility. Isn't it?

it's repeated, not required :)


> On April 6, 2016, 11:27 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java, line 280
> > <https://reviews.apache.org/r/45238/diff/1/?file=1312378#file1312378line280>
> >
> >     I don't see any changes that bypasses this when lengths are available. Is that
going into separate jira?

see estimateRgEndOffset calls. It calls the fn right now to log the message, but uses lengths
if available. I can remove it when the patch is done... or only do it if info is enabled


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45238/#review126974
-----------------------------------------------------------


On March 23, 2016, 7:08 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45238/
> -----------------------------------------------------------
> 
> (Updated March 23, 2016, 7:08 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java c14df20 
>   llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapOptionsProcessor.java c292b37

>   llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/OrcEncodedDataReader.java
eb251a8 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/metadata/OrcStripeMetadata.java
82187bd 
>   orc/src/java/org/apache/orc/OrcConf.java 6fcbb72 
>   orc/src/java/org/apache/orc/OrcFile.java 3945a5d 
>   orc/src/java/org/apache/orc/TypeDescription.java bd900ac 
>   orc/src/java/org/apache/orc/impl/BitFieldWriter.java aa5f886 
>   orc/src/java/org/apache/orc/impl/IntegerWriter.java 419054f 
>   orc/src/java/org/apache/orc/impl/OutStream.java 81662cc 
>   orc/src/java/org/apache/orc/impl/RunLengthByteWriter.java 09108b2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriter.java 3e5f2e2 
>   orc/src/java/org/apache/orc/impl/RunLengthIntegerWriterV2.java fab2801 
>   orc/src/java/org/apache/orc/impl/SerializationUtils.java c1162e4 
>   orc/src/java/org/apache/orc/impl/WriterImpl.java 6497ecf 
>   orc/src/protobuf/orc_proto.proto f4935b4 
>   orc/src/test/org/apache/orc/impl/TestBitFieldReader.java e4c6f6b 
>   orc/src/test/org/apache/orc/impl/TestBitPack.java f2d3d64 
>   orc/src/test/org/apache/orc/impl/TestInStream.java 9e65345 
>   orc/src/test/org/apache/orc/impl/TestIntegerCompressionReader.java 399f35e 
>   orc/src/test/org/apache/orc/impl/TestOutStream.java e9614d5 
>   orc/src/test/org/apache/orc/impl/TestRunLengthByteReader.java a14bef1 
>   orc/src/test/org/apache/orc/impl/TestRunLengthIntegerReader.java 28239ba 
>   ql/src/java/org/apache/hadoop/hive/llap/DebugUtils.java ea626d7 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/tez/TezJobMonitor.java 67f9da8 
>   ql/src/java/org/apache/hadoop/hive/ql/hooks/PostExecOrcFileDump.java d5d1370 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/FileDump.java 9c2f88f 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/JsonFileDump.java 00de545 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderUtils.java 8a73948 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReader.java 96af96a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 29b51ec

>   ql/src/test/queries/clientpositive/orc_lengths.q PRE-CREATION 
>   ql/src/test/results/clientpositive/orc_lengths.q.out PRE-CREATION 
>   storage-api/src/java/org/apache/hadoop/hive/common/io/encoded/EncodedColumnBatch.java
ddba889 
> 
> Diff: https://reviews.apache.org/r/45238/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message