orc-user mailing list archives

From Zhiyuan Dong <zhiyuan.d...@gmail.com>
Subject Re: access entire column in ORC files
Date Sat, 26 Jan 2019 03:47:39 GMT
In the RowReader class there is a function, seekToRow(uint64_t rowNumber).
Is there a code example showing how to use this function to read the
columns of a particular row?
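
A minimal sketch, assuming the row numbers selected in the first pass are 0-based, sorted and duplicate-free. Coalescing them into contiguous runs means each run needs only one seek. Under that assumption, each run could then be read by calling RowReader::seekToRow(start) and then next() on a batch until `length` rows have been consumed. Note that next() may return fewer rows than the batch capacity (for example at the end of the file), so numElements should be checked. The code below only builds the plan and makes no ORC calls; RowRun and coalesceRuns are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A contiguous range of selected rows: [start, start + length).
struct RowRun {
    std::uint64_t start;
    std::uint64_t length;
};

// Coalesce sorted, duplicate-free, 0-based row numbers into contiguous runs.
// Only the plan is built here; each run is meant to be served by one
// seekToRow(start) followed by reads totalling `length` rows.
std::vector<RowRun> coalesceRuns(const std::vector<std::uint64_t>& rows) {
    std::vector<RowRun> runs;
    for (std::size_t i = 0; i < rows.size(); ++i) {
        // Precondition: strictly increasing row numbers.
        assert(i == 0 || rows[i] > rows[i - 1]);
        if (!runs.empty() && runs.back().start + runs.back().length == rows[i]) {
            ++runs.back().length;          // row extends the current run
        } else {
            runs.push_back({rows[i], 1});  // gap: start a new run
        }
    }
    return runs;
}
```

For rows {3, 4, 5, 10, 11, 20} this yields three runs, (3, 3), (10, 2) and (20, 1), i.e. three seeks instead of six. Whether seeking beats a plain sequential scan likely depends on how sparse the selection is, since a seek may still have to decode forward from the start of the enclosing row group.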

Many thanks

Best,

Zhiyuan

On Fri, Jan 25, 2019 at 8:10 PM Zhiyuan Dong <zhiyuan.dong@gmail.com> wrote:

> Let me add some context that may help explain my question a little
> better.
>
> Suppose I have an ORC file with many columns, e.g. 5000+ columns, where
> the first column of each row stores information I can use to decide
> whether I need to extract that row.
>
> In the first pass, I read the first column from start to end to find
> the subset of rows that I need to extract, and allocate the right amount
> of memory to store those rows, with all of the remaining columns.
>
> Now, when I do a second pass over the remaining 5000+ columns, is there
> any ORC C++ API that I can use to extract only the row positions
> identified in the first pass?
>
> What I am doing now is to extract the remaining columns batch by batch.
>
> Within each batch, all columns are populated into vectors of their
> correct subtype, e.g. double, and I pre-compute a sequence of read/skip
> steps over the rows of each batch so that I extract only the row
> positions identified in the first pass. I am not sure whether this is
> efficient, given that the ORC C++ API may already have something built
> to handle situations like this.
>
> Many many thanks!
>
> Best,
>
> Zhiyuan
>
>
>
>
> On Fri, Jan 25, 2019 at 7:35 PM Zhiyuan Dong <zhiyuan.dong@gmail.com>
> wrote:
>
>> Thanks Xiening!!
>>
>> A follow-up question:
>>
>> Suppose I have an ORC file with many columns.
>>
>> In the first pass, I read the first column from start to end to find
>> the subset of rows that I need to extract.
>>
>> Now, when I do a second pass over the rest of the columns, is there an
>> efficient way to extract only the row positions that I identified in
>> the first pass?
>>
>> What I am doing now is to extract the rest of the columns batch by
>> batch, keeping only the rows identified by the first pass, but I am not
>> sure whether this is efficient.
>>
>> Many thanks!!
>>
>> Best,
>>
>> Zhiyuan
>>
>
>
> --
> Zhiyuan Dong, Ph.D.
>
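
The two-pass approach described in the quoted messages (scan the first column, then pull only the flagged rows out of each batch of the remaining columns) can be sketched in plain C++. This is not ORC library code: std::vector values stand in for column batches, and selectRows, gatherSelected and the keep predicate are hypothetical names. In the ORC C++ API the first pass can likely be restricted to one column with RowReaderOptions::include, and the values would come from the typed batches reachable through StructVectorBatch::fields (e.g. DoubleVectorBatch::data); null handling via notNull is ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Pass 1 (sketch): scan the first column batch by batch and record the
// 0-based global row number of every row to keep. `batches` stands in for
// the values of that column as delivered by successive reads; `keep` is a
// hypothetical caller-supplied predicate.
std::vector<std::uint64_t> selectRows(
        const std::vector<std::vector<std::int64_t>>& batches,
        const std::function<bool(std::int64_t)>& keep) {
    assert(keep);  // a predicate must be supplied
    std::vector<std::uint64_t> selected;
    std::uint64_t batchStart = 0;  // global row number of the batch's first row
    for (const auto& batch : batches) {
        for (std::uint64_t i = 0; i < batch.size(); ++i) {
            if (keep(batch[i])) selected.push_back(batchStart + i);
        }
        batchStart += batch.size();
    }
    return selected;  // sorted by construction; size() sizes the output buffers
}

// Pass 2 (sketch): copy the selected rows of one column out of one batch.
// `batchValues` stands in for the batch's column data and `batchStart` is
// the global row number of its first row. `cursor` indexes the next entry of
// `selected` not yet consumed and carries over between batches, so each
// batch costs O(rows copied) instead of a search over all selections.
void gatherSelected(const std::vector<double>& batchValues,
                    std::uint64_t batchStart,
                    const std::vector<std::uint64_t>& selected,
                    std::size_t& cursor,
                    std::vector<double>& out) {
    const std::uint64_t batchEnd = batchStart + batchValues.size();
    // Batches must arrive in row order, with no skipped selections behind us.
    assert(cursor == selected.size() || selected[cursor] >= batchStart);
    while (cursor < selected.size() && selected[cursor] < batchEnd) {
        out.push_back(batchValues[selected[cursor] - batchStart]);
        ++cursor;
    }
}
```

Because every batch of the remaining columns is still decoded in full, this avoids per-row bookkeeping but not I/O or decoding; if the selected rows are sparse and clustered, seeking to each contiguous run with RowReader::seekToRow may skip much of that work, depending on how the rows are distributed.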


-- 
Zhiyuan Dong, Ph.D.
