orc-user mailing list archives

From Zhiyuan Dong <zhiyuan.d...@gmail.com>
Subject Re: access entire column in ORC files
Date Sat, 26 Jan 2019 02:10:51 GMT
Let me add some context that may help explain my question a little better.

Suppose I have an ORC file with many columns, e.g. 5000+ columns. The
first column of each row stores some information I can use to decide
whether I need to extract that row.

In the first pass, I read the first column from start to end to find the
subset of rows that I need to extract, and allocate the right amount of
memory to store the identified rows, including all the remaining columns.
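
To make the first pass concrete, here is a rough sketch in plain C++ with no
ORC calls. It assumes the first column arrives batch by batch as int64 values
and that the remaining columns are doubles. The names collectSelected,
allocateOutput and the keep predicate are made up for illustration and are
not part of the ORC API.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Append the global positions of the rows in one batch of the key column
// that satisfy `keep`. `batchStart` is the global row number of keys[0].
// (Hypothetical helper; the int64 key type is an assumption.)
void collectSelected(const std::vector<int64_t>& keys, uint64_t batchStart,
                     const std::function<bool(int64_t)>& keep,
                     std::vector<uint64_t>& selected) {
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (keep(keys[i])) {
            selected.push_back(batchStart + i);
        }
    }
}

// Pre-allocate column-major storage for the second pass: one vector per
// remaining column, each with one slot per selected row.
// (Hypothetical helper; assumes every remaining column is a double.)
std::vector<std::vector<double>> allocateOutput(std::size_t numColumns,
                                                std::size_t numSelected) {
    return std::vector<std::vector<double>>(
        numColumns, std::vector<double>(numSelected));
}
```

Calling collectSelected once per batch in file order keeps the selected
positions sorted, which the second pass relies on.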

Now, when I do a second pass over the remaining 5000+ columns, is there any
ORC C++ API that I can use to extract only the row positions identified by
the first pass?

What I am doing now is extracting the remaining columns batch by batch.

Within each batch, all columns are populated into vectors of their correct
subtype, e.g. double, and I pre-compute a set of read/skip steps within the
rows of each batch so that I can extract the row positions identified by
the first pass. I am not sure this is efficient, given that there may
already be an ORC C++ API built to handle situations like this.
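
The read/skip steps I mean can be sketched roughly as below. This is plain
C++ with no ORC calls. Step and planBatch are names made up for
illustration, and the sketch assumes the selected positions from the first
pass are sorted in ascending order with no duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One step of the per-batch plan: skip or copy `count` consecutive rows.
struct Step {
    bool read;       // true = copy these rows, false = skip them
    uint64_t count;  // number of consecutive rows covered by this step
};

// Build the read/skip plan for the batch that covers global rows
// [batchStart, batchStart + batchSize). `selected` holds the global row
// positions chosen in the first pass (sorted ascending, unique).
std::vector<Step> planBatch(const std::vector<uint64_t>& selected,
                            uint64_t batchStart, uint64_t batchSize) {
    assert(batchSize > 0);
    std::vector<Step> plan;
    const uint64_t batchEnd = batchStart + batchSize;
    uint64_t cursor = batchStart;  // next row not yet covered by the plan

    // Jump to the first selected row that falls inside this batch.
    auto it = std::lower_bound(selected.begin(), selected.end(), batchStart);
    for (; it != selected.end() && *it < batchEnd; ++it) {
        if (*it > cursor) {
            // Gap of unwanted rows before the next selected row.
            plan.push_back({false, *it - cursor});
        }
        if (!plan.empty() && plan.back().read) {
            // Adjacent to the previous selected row: extend that run.
            ++plan.back().count;
        } else {
            plan.push_back({true, 1});
        }
        cursor = *it + 1;
    }
    if (cursor < batchEnd) {
        // Unwanted rows at the tail of the batch.
        plan.push_back({false, batchEnd - cursor});
    }
    return plan;
}
```

For example, with selected rows {2, 3, 7, 12} and a batch covering rows 0 to
9, the plan is: skip 2, read 2, skip 3, read 1, skip 2. The step counts in
each plan always add up to the batch size.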

Many many thanks!

Best,

Zhiyuan




On Fri, Jan 25, 2019 at 7:35 PM Zhiyuan Dong <zhiyuan.dong@gmail.com> wrote:

> Thanks Xiening!!
>
> A follow-up question:
>
> Suppose I have an ORC file with many columns.
>
> In the first pass, I read the first column from start to end to find the
> subset of rows that I need to extract.
>
> Now, when I do a second pass over the rest of the columns, is there any
> efficient way to extract only the row positions that I identified in the
> first pass?
>
> What I am doing now is extracting the rest of the columns batch by batch,
> and keeping only the rows identified by the first pass, but I am not sure
> this is an efficient way.
>
> Many thanks!!
>
> Best,
>
> Zhiyuan
>


-- 
Zhiyuan Dong, Ph.D.
