drill-dev mailing list archives

From Hanifi Gunes <hgu...@maprtech.com>
Subject Re: [jira] [Resolved] (DRILL-3551) CTAS from complex Json source with schema change is not written (and hence not read back ) correctly
Date Wed, 29 Jul 2015 15:38:21 GMT
Just an FYI: I dropped a comment under the issue.

-H+

On Wed, Jul 29, 2015 at 5:40 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:

> Could you attach a sample input file that reproduces the problem? My
> impression from the outset was that a field selection bug we recently
> fixed might have caused this.
>
>
> Thanks.
> -Hanifi
>
> On Wed, Jul 29, 2015 at 5:07 PM, Stefán Baxter <stefan@activitystream.com>
> wrote:
>
>> Hi,
>>
>> I think this problem only showed itself for large datasets, where
>> assumptions were being made after the first 1k records.
>>
>> Were you able to reproduce this with a smaller set?
>>
>> Regards,
>>  -Stefan
>>
>>
>> On Wed, Jul 29, 2015 at 2:01 PM, Hanifi Gunes (JIRA) <jira@apache.org>
>> wrote:
>>
>> >
>> >     [ https://issues.apache.org/jira/browse/DRILL-3551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>> >
>> > Hanifi Gunes resolved DRILL-3551.
>> > ---------------------------------
>> >     Resolution: Fixed
>> >
>> > Tested on a small input file of 20 mixed records, with and without the
>> > additional field. It looks like the good old field projection problem
>> > surfaces here, so this is quite likely fixed by DRILL-3476. Please re-open
>> > and attach an input file if it is not fixed.
>> >
>> > > CTAS from complex Json source with schema change is not written (and
>> > > hence not read back) correctly
>> > >
>> > > -----------------------------------------------------------------------------------------------------
>> > >
>> > >                 Key: DRILL-3551
>> > >                 URL: https://issues.apache.org/jira/browse/DRILL-3551
>> > >             Project: Apache Drill
>> > >          Issue Type: Bug
>> > >          Components: Execution - Data Types
>> > >    Affects Versions: 1.1.0
>> > >            Reporter: Parth Chandra
>> > >            Assignee: Hanifi Gunes
>> > >            Priority: Critical
>> > >             Fix For: 1.2.0
>> > >
>> > >
>> > > The source data contains:
>> > > 20K rows with the following:
>> > > {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
>> > > 200 rows with the following:
>> > > {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
>> > > Creating a table and reading it back returns incorrect data:
>> > > CREATE TABLE testparquet as select * from `test.json`;
>> > > SELECT * from testparquet;
>> > > This yields:
>> > > | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
>> > > | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
>> > > | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
>> > > | yes  | {"other":"true","all":"false","sometimes":"yes"}  |
>> > > The "additional" field is missing in all records.
>> > > The Parquet metadata for the created file does not have the 'additional'
>> > > field.
>> >
>> >
>> >
>> > --
>> > This message was sent by Atlassian JIRA
>> > (v6.3.4#6332)
>> >
>>
>
>
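
Since the thread asks for a sample input file, the input described in the issue can be regenerated with a short script. This is a minimal Python sketch, not part of the original thread. It assumes "20K" means 20,000 rows, that the 200 rows carrying the `additional` field come last (as the value "last entries only" suggests), and that one JSON object per line is an acceptable layout for Drill's JSON reader. The file name `test.json` matches the CTAS statement in the issue; `build_records` and `write_ndjson` are hypothetical helper names.

```python
import json

# Record shapes copied from the DRILL-3551 description.
BASE = {"some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"}}
EXTENDED = {"some": "yes",
            "others": {"other": "true", "all": "false", "sometimes": "yes",
                       "additional": "last entries only"}}


def build_records(base_count=20_000, extended_count=200):
    """Return the base records followed by the extended ones.

    Assumption: "20K" means 20,000, and the records carrying "additional"
    come last in the file.
    """
    return [BASE] * base_count + [EXTENDED] * extended_count


def write_ndjson(path, records):
    """Write one compact JSON object per line (assumed readable by Drill)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # Compact separators reproduce the exact text shown in the issue.
            f.write(json.dumps(record, separators=(",", ":")) + "\n")


if __name__ == "__main__":
    write_ndjson("test.json", build_records())
```

Running it writes `test.json` to the current directory, after which the CTAS and SELECT statements from the issue can be run against it. Lowering `base_count` well below 1k would be one way to probe Stefán's suggestion that the problem only appears on larger inputs.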
