Unfortunately, we don't have an API that returns a single row of data. You have to extract each column's value from the batches.

With seekToRow(uint64_t rowNumber), you jump to the row specified by rowNumber and then call rowReader->next(*batch) to read a batch that starts at that row.
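Here is a minimal sketch of that seek-then-read pattern. To keep it runnable without the ORC library, it is written as a template and exercised with two hypothetical stand-ins, FakeRowReader and FakeBatch, which only mimic the seekToRow() and next() shape of orc::RowReader and orc::ColumnVectorBatch. The helper name readFromRow is also just an illustration, not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::ColumnVectorBatch (one integer column).
struct FakeBatch {
  explicit FakeBatch(uint64_t cap) : capacity(cap), numElements(0), data(cap) {}
  uint64_t capacity;          // maximum rows per batch
  uint64_t numElements;       // rows filled by the last next() call
  std::vector<int64_t> data;  // column values
};

// Hypothetical stand-in for orc::RowReader over an in-memory column.
class FakeRowReader {
 public:
  explicit FakeRowReader(std::vector<int64_t> rows)
      : rows_(std::move(rows)), pos_(0) {}

  // Position the reader so the next read starts at rowNumber.
  void seekToRow(uint64_t rowNumber) {
    pos_ = std::min<uint64_t>(rowNumber, rows_.size());
  }

  // Fill the batch from the current position; false when no rows remain.
  bool next(FakeBatch& batch) {
    assert(batch.data.size() >= batch.capacity);
    uint64_t n = std::min<uint64_t>(batch.capacity, rows_.size() - pos_);
    for (uint64_t i = 0; i < n; ++i) {
      batch.data[i] = rows_[pos_ + i];
    }
    batch.numElements = n;
    pos_ += n;
    return n > 0;
  }

 private:
  std::vector<int64_t> rows_;
  uint64_t pos_;
};

// Seek to rowNumber and read one batch. On success, index 0 of the batch
// holds the requested row and the following rows come after it.
template <typename RowReaderT, typename BatchT>
bool readFromRow(RowReaderT& rowReader, BatchT& batch, uint64_t rowNumber) {
  rowReader.seekToRow(rowNumber);
  return rowReader.next(batch);
}
```

With the real library you would pass *rowReader and *batch instead of the stand-ins. For a struct-typed file the batch is an orc::StructVectorBatch, and each column is reached through its fields[i] entry, cast to the matching subtype such as orc::LongVectorBatch or orc::DoubleVectorBatch.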

You can actually create two rowReaders. The first rowReader includes only the first column, selected via RowReaderOptions, and you use it to identify the rows you want. Then you create a second rowReader that includes only the columns you want, again via RowReaderOptions. Does that make sense?
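One way to connect the two passes, sketched under my own assumptions: collect the matching row numbers from the first rowReader in increasing order, then collapse them into contiguous runs so the second rowReader needs one seekToRow() per run rather than one per row. The RowRun struct and the groupIntoRuns helper are hypothetical names for illustration; the ORC library does not provide them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// A contiguous range of selected rows: [start, start + count).
struct RowRun {
  uint64_t start;
  uint64_t count;
};

// Collapse strictly increasing row numbers (the output of the first pass)
// into contiguous runs. Each run costs one seek in the second pass.
std::vector<RowRun> groupIntoRuns(const std::vector<uint64_t>& sortedRows) {
  std::vector<RowRun> runs;
  for (uint64_t row : sortedRows) {
    if (!runs.empty()) {
      // Precondition: input is sorted and free of duplicates.
      assert(row >= runs.back().start + runs.back().count);
    }
    if (!runs.empty() && runs.back().start + runs.back().count == row) {
      ++runs.back().count;       // row directly extends the current run
    } else {
      runs.push_back({row, 1});  // gap found: start a new run
    }
  }
  return runs;
}
```

In the second pass you would call seekToRow(run.start) for each run and keep calling next() until run.count rows have been consumed, copying only those rows out of each batch. Column selection for both readers goes through orc::RowReaderOptions::include() before calling createRowReader() on the orc::Reader.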

Let me know if you have any questions.

Gang

On Fri, Jan 25, 2019 at 7:48 PM Zhiyuan Dong <zhiyuan.dong@gmail.com> wrote:
In the RowReader class there is a function seekToRow(uint64_t rowNumber). I am wondering whether there is a code example showing how to use this function to read the columns in a row.

Many thanks

Best,

Zhiyuan

On Fri, Jan 25, 2019 at 8:10 PM Zhiyuan Dong <zhiyuan.dong@gmail.com> wrote:
Let me add some context that may help explain my question a little better.

Suppose I have an ORC file with many columns, e.g. 5000+ columns. The first column of each row stores information I can use to decide whether I need to extract that row.

In the first pass, I read the first column from start to end to find the subset of rows that I need to extract, and I allocate the right amount of memory to store the identified rows, including all the remaining columns.

Now, in the second pass over the remaining 5000+ columns, is there any ORC C++ API that I can use to extract only the row positions identified by the first pass?

What I am doing now is extracting the rest of the columns batch by batch. Within each batch, every column is populated into a vector of its correct subtype, e.g. double, and I precompute a sequence of read/skip steps over the rows of each batch so that I extract only the row positions identified by the first pass. I am not sure this is efficient, given that there may already be an ORC C++ API built to handle situations like this.

Many many thanks!

Best,

Zhiyuan




On Fri, Jan 25, 2019 at 7:35 PM Zhiyuan Dong <zhiyuan.dong@gmail.com> wrote:
Thanks Xiening!!

A follow-up question:

Suppose I have an ORC file with many columns.

In the first pass, I read the first column from start to end to find the subset of rows that I need to extract.

Now, in the second pass over the rest of the columns, is there an efficient way to extract only the row positions that I identified in the first pass?

What I am doing now is reading the rest of the columns batch by batch and extracting only the rows identified by the first pass, but I am not sure this is efficient.

Many thanks!!

Best,

Zhiyuan


--
Zhiyuan Dong, Ph.D.

