orc-dev mailing list archives

From Elliot West <tea...@gmail.com>
Subject Search argument scope
Date Wed, 15 Jul 2015 10:43:41 GMT
Hello, I have a question regarding the design of search arguments.

As I understand it, search arguments are used in conjunction with ORC file
indexes to identify stripes that need not be read. I presume that in practice
the search argument is derived from some higher-level filter (e.g. a
condition in a Hive statement) that is also applied by the processing
framework (typically Hive) once records are read.
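
To make that two-stage arrangement concrete, here is a minimal Python sketch. It is not ORC's actual API: the `Stripe` class, `stripe_may_match`, and `read_rows` are hypothetical names invented for illustration, and the search argument is assumed to be a simple inclusive range on a single integer column. The reader uses only min/max statistics to skip whole stripes, so non-matching rows inside a surviving stripe still reach the downstream filter.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe holding one integer column.

    Assumes a non-empty stripe; min/max play the role of index statistics.
    """
    rows: List[int]

    @property
    def min_value(self) -> int:
        return min(self.rows)

    @property
    def max_value(self) -> int:
        return max(self.rows)


def stripe_may_match(stripe: Stripe, lower: int, upper: int) -> bool:
    """Index check: could any row in the stripe lie within [lower, upper]?"""
    return stripe.max_value >= lower and stripe.min_value <= upper


def read_rows(stripes: List[Stripe], lower: int, upper: int) -> Iterator[int]:
    """Yield every row of each stripe that passes the index check.

    No per-row filtering happens here; that is left to the caller.
    """
    for stripe in stripes:
        if stripe_may_match(stripe, lower, upper):
            yield from stripe.rows


stripes = [Stripe([1, 2, 3]), Stripe([4, 9, 15]), Stripe([20, 25])]

# Stage 1: the reader skips stripes whose statistics rule out a match.
from_reader = list(read_rows(stripes, 5, 10))

# Stage 2: the processing framework re-applies the same condition per row.
after_filter = [r for r in from_reader if 5 <= r <= 10]
```

In this sketch the first and third stripes are skipped, but the middle stripe is returned in full, so the framework still has to discard the rows outside the range.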

Is there any reason why search arguments could/should not also be used to
filter out non-matching records in the OrcRecordReader in addition to
filtering out stripes? This would remove irrelevant records earlier in the
data processing pipeline, and possibly remove the need for the downstream

Thanks - Elliot.
