drill-dev mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Order of records read in a parquet file
Date Fri, 06 Nov 2015 23:45:54 GMT
Is this a large or private parquet file? Can you share it to allow me to
debug the read path for it?

On Fri, Nov 6, 2015 at 3:37 PM, Jason Altekruse <altekrusejason@gmail.com>
wrote:

> The changes to parquet were not supposed to be functional at all. We had
> been maintaining our fork of parquet-mr to have a ByteBuffer based read and
> write path to reduce heap memory usage. The work done was just getting
> these changes merged back into parquet-mr and making corresponding changes
> in Drill to accommodate any interface modifications introduced since we
> last rebased (there were mostly just package renames). There were a lot of
> comments on the PR, and a decent amount of refactoring that was done to
> consolidate and otherwise clean up the code, but there shouldn't have been
> any changes to the behavior of the reader or writer.
>
> Are you getting all of the same data out if you read the whole file, just
> in a different order?
>
> On Fri, Nov 6, 2015 at 3:31 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
>> parquet-meta command suggests that there is only one row group
>>
>> On Fri, Nov 6, 2015 at 3:23 PM, Jacques Nadeau <jacques@dremio.com>
>> wrote:
>>
>> > How many row groups?
>> >
>> > --
>> > Jacques Nadeau
>> > CTO and Co-Founder, Dremio
>> >
>> > On Fri, Nov 6, 2015 at 3:14 PM, rahul challapalli <
>> > challapallirahul@gmail.com> wrote:
>> >
>> > > Drillers,
>> > >
>> > > With the new parquet library update, can someone throw some light on the
>> > > order in which the records are read from a single parquet file?
>> > >
>> > > With the older library, when I run the below query on a single parquet
>> > > file, I used to get a set of records. Now after the parquet library update,
>> > > I am seeing a different set of records. Just wanted to understand what
>> > > specifically has changed.
>> > >
>> > > select * from `file.parquet` limit 5;
>> > >
>> > > - Rahul
>> > >
>> >
>>
>
>
