orc-dev mailing list archives

From "Owen O'Malley" <omal...@apache.org>
Subject Proposed metadata for ORC files
Date Thu, 24 Sep 2015 20:40:48 GMT
While thinking about resource management for vectorized ORC readers, I
found that one of the difficult points is figuring out how big the vectors
for the nested types need to be. I'd like to propose that we add a
statistic for each column that records the maximum number of instances
that column needs in any vector row group of 1024 rows.
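The following is a minimal sketch of how a writer might compute the proposed statistic for a list column. It assumes the writer knows each row's list length, that the 1024-row groups are aligned to the first row passed in (for example, the first row of a stripe), and that a null list contributes zero children. The class and method names are hypothetical and are not part of the ORC API.

```java
import java.util.Arrays;

// Hypothetical sketch: computes the proposed per-column statistic for a
// list column, i.e. the largest number of child values produced by any
// group of ROWS_PER_GROUP consecutive rows. Names are illustrative only.
public class MaxChildrenPerGroup {
    // The proposal uses vector row groups of 1024 rows.
    static final int ROWS_PER_GROUP = 1024;

    /**
     * @param listLengths number of child elements in each row's list, in
     *                    row order (assumption: a null list counts as 0)
     * @return maximum total child count over all 1024-row groups
     */
    static long maxChildrenPerGroup(int[] listLengths) {
        long max = 0;
        for (int start = 0; start < listLengths.length; start += ROWS_PER_GROUP) {
            // The last group may be shorter than 1024 rows.
            int end = Math.min(start + ROWS_PER_GROUP, listLengths.length);
            long groupTotal = 0;
            for (int row = start; row < end; row++) {
                groupTotal += listLengths[row];
            }
            max = Math.max(max, groupTotal);
        }
        return max;
    }

    public static void main(String[] args) {
        // 2500 rows: the first 1024 rows hold 3 elements each, the rest hold 1.
        int[] lengths = new int[2500];
        Arrays.fill(lengths, 0, 1024, 3);
        Arrays.fill(lengths, 1024, 2500, 1);
        System.out.println(maxChildrenPerGroup(lengths)); // prints 3072
    }
}
```

For nested types such as a list of lists, the same computation would presumably be applied at each level, counting the inner column's instances per 1024 top-level rows.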

Having that number would let you size the vector row batch for the complex
types as you start each stripe, and would also let you predict how much
memory the reader will need.
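As a rough illustration of the reader side, the sketch below sizes a batch for a list-of-bigint column once at stripe start and estimates its array memory. It assumes the statistic is available per stripe and assumes a vector layout of per-row `long` offsets and lengths plus one flat `long[]` of child values. That layout is similar in spirit to Hive's vectorized list columns, but the classes here are hypothetical and the estimate ignores object headers and null flags.

```java
// Hypothetical sketch: sizing a list<bigint> batch at the start of a stripe
// from the proposed statistic. Assumed layout: per-row long offsets and
// lengths arrays plus one flat long[] holding all child values.
public class StripeBatchSizer {
    static final int BATCH_ROWS = 1024;

    static final class ListOfLongBatch {
        final long[] offsets = new long[BATCH_ROWS];
        final long[] lengths = new long[BATCH_ROWS];
        final long[] childValues;

        ListOfLongBatch(int maxChildrenPerGroup) {
            // Allocated once per stripe, so it never has to grow mid-stripe.
            this.childValues = new long[maxChildrenPerGroup];
        }
    }

    /** Rough array payload in bytes; ignores object headers and null flags. */
    static long estimateBytes(long maxChildrenPerGroup) {
        long perRow = 2L * BATCH_ROWS * Long.BYTES;          // offsets + lengths
        long children = maxChildrenPerGroup * Long.BYTES;    // child values
        return perRow + children;
    }

    public static void main(String[] args) {
        int stat = 3072; // e.g. the value read from the stripe's statistics
        ListOfLongBatch batch = new ListOfLongBatch(stat);
        System.out.println(batch.childValues.length); // prints 3072
        System.out.println(estimateBytes(stat));      // prints 40960
    }
}
```

Because the child vector is sized from an upper bound, the reader avoids reallocating during the stripe, at the cost of possibly over-allocating for groups with fewer children.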

