Thanks, Xiening!

A follow-up question:

Suppose I have an ORC file with many columns.

In the first pass, I read the first column from start to end to find the subset of rows that I need to extract.

Now, in the second pass over the remaining columns, is there an efficient way to extract only the row positions that I identified in the first pass?

What I am doing now is reading the remaining columns batch by batch and keeping only the rows identified in the first pass, but I am not sure whether this is an efficient approach.
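
To make the question concrete, my current second pass looks roughly like the sketch below. It is a simplified illustration in plain Python, not real ORC reader code: the batches are simulated as in-memory lists, and `select_rows` is a made-up helper name, not part of any ORC API. It assumes the row positions from the first pass are sorted in ascending order.

```python
from typing import Iterable, Iterator, Sequence, Tuple


def select_rows(
    batches: Iterable[Sequence[int]],
    wanted: Sequence[int],
) -> Iterator[Tuple[int, int]]:
    """Yield (row_position, value) for each wanted row position.

    `batches` stands in for the column batches a reader would return
    (hypothetical stand-in, not an ORC type). `wanted` must be sorted
    ascending and holds the global row positions from the first pass.
    """
    base = 0  # global row position of the first row in the current batch
    w = 0     # index of the next wanted position still to be emitted
    for batch in batches:
        end = base + len(batch)
        # Emit every wanted row that falls inside this batch.
        while w < len(wanted) and wanted[w] < end:
            yield wanted[w], batch[wanted[w] - base]
            w += 1
        base = end
        if w == len(wanted):
            break  # nothing left to extract, stop reading further batches


# Example: three simulated batches of one column, rows 1, 4 and 6 wanted.
column_batches = [[10, 11, 12], [13, 14, 15], [16, 17]]
selected = list(select_rows(column_batches, [1, 4, 6]))
```

Every batch is still decoded in full here, even when only a few of its rows are wanted, which is the part I suspect is wasteful.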

Many thanks!!

Best,

Zhiyuan