Let me add some context that may help explain my question.
Suppose I have an ORC file with many columns, e.g. 5000+. The first column of each row stores information I can use to decide whether I need to extract that row.
In the first pass, I read the first column from start to end to find the subset of rows I need to extract, and I allocate the right amount of memory to store those rows, including all of the remaining columns.
Now, in the second pass over the remaining 5000+ columns, is there any ORC C++ API I can use to extract only the row positions identified by the first pass?
What I am doing now is extracting the remaining columns batch by batch.
Within each batch, all columns are populated into vectors of the correct subtype, e.g. double, and I pre-compute a set of read/skip steps over the rows of each batch so that I can extract only the row positions identified by the first pass. I am not sure this is efficient, given that there may already be an ORC C++ API built to handle situations like this.
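To make the read/skip idea concrete, here is a minimal sketch of how such steps could be pre-computed. It does not call ORC at all: `Step` and `planSteps` are hypothetical names used only for illustration, and it assumes the first pass produced a sorted list of unique, 0-based row numbers and that every batch except possibly the last holds exactly `batchSize` rows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One step inside a batch: skip `skip` rows, then copy `read` consecutive rows.
// (Hypothetical helper type, not part of the ORC API.)
struct Step {
    uint64_t skip;
    uint64_t read;
};

// Builds, for every batch, the skip/read steps needed to copy only the rows
// listed in `wanted`. Assumes `wanted` is sorted, unique, and 0-based.
// Batches that contain no wanted rows get an empty step list.
std::vector<std::vector<Step>> planSteps(const std::vector<uint64_t>& wanted,
                                         uint64_t totalRows,
                                         uint64_t batchSize) {
    assert(batchSize > 0);
    const uint64_t numBatches = (totalRows + batchSize - 1) / batchSize;
    std::vector<std::vector<Step>> plan(numBatches);

    uint64_t currentBatch = 0;
    uint64_t cursor = 0;  // next not-yet-consumed offset inside currentBatch

    for (std::size_t i = 0; i < wanted.size(); ++i) {
        const uint64_t row = wanted[i];
        assert(row < totalRows);
        assert(i == 0 || wanted[i - 1] < row);  // sorted and unique

        const uint64_t batch = row / batchSize;
        const uint64_t offset = row % batchSize;
        if (batch != currentBatch) {
            currentBatch = batch;
            cursor = 0;  // offsets restart at the beginning of each batch
        }

        std::vector<Step>& steps = plan[batch];
        if (!steps.empty() && offset == cursor) {
            steps.back().read += 1;  // row is adjacent: extend the previous run
        } else {
            steps.push_back({offset - cursor, 1});  // skip the gap, start a run
        }
        cursor = offset + 1;
    }
    return plan;
}
```

With 30 rows, a batch size of 10, and wanted rows 0, 1, 5, 10, 11, 12, 27, this gives two steps for the first batch (skip 0 / read 2, then skip 3 / read 1), one step for the second (skip 0 / read 3), and one for the third (skip 7 / read 1). A batch with an empty step list still has to be read and decoded in this scheme, which is the part I suspect is wasteful.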
Many many thanks!