drill-dev mailing list archives

From Stefán Baxter <ste...@activitystream.com>
Subject Fwd: Avro deserialization bug - 1.3-SNAPSHOT
Date Thu, 12 Nov 2015 06:35:47 GMT
Hi,

Decided to send this to the dev list as well.

Can someone please help me with this problem? Drill distorts string values
that are read from Avro files.

Regards,
 -Stefan

---------- Forwarded message ----------
From: Stefán Baxter <stefan@activitystream.com>
Date: Wed, Nov 11, 2015 at 10:14 PM
Subject: Re: Avro deserialization bug - 1.3-SNAPSHOT
To: user <user@drill.apache.org>


Hi,

Can someone please verify that this is in fact a bug so I can rule out our
own mistakes?

We recently moved all our logging to Avro to get around schema differences
in JSON that were causing various problems, and our latest release is now
blocked by this.
Alternatively, could someone point me in the right direction if I were to
try to fix this myself?

Regards,
  -Stefán

On Tue, Nov 10, 2015 at 2:41 PM, Stefán Baxter <stefan@activitystream.com>
wrote:

> Thank you Kamesh.
>
> I have created https://issues.apache.org/jira/browse/DRILL-4056 with the
> description.
> I will send you a confidential test file to your private email.
>
> Regards,
>  -Stefan
>
> On Tue, Nov 10, 2015 at 2:30 PM, Kamesh <kamesh.hadoop@gmail.com> wrote:
>
>> Hi Stefán,
>>  Could you please raise a Jira with a sample schema and sample input to
>> reproduce it? I will look into this.
>>
>> On Tue, Nov 10, 2015 at 7:55 PM, Stefán Baxter <stefan@activitystream.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I have an Avro file that supports the following data/schema:
>> >
>> > {"field":"some", "classification":{"variant":"Gæst"}}
>> >
>> > When I select 10 rows from this file, I get:
>> >
>> > +---------------------+
>> > |       EXPR$0        |
>> > +---------------------+
>> > | Gæst                |
>> > | Voksen              |
>> > | Voksen              |
>> > | Invitation KIF KBH  |
>> > | Invitation KIF KBH  |
>> > | Ordinarie pris KBH  |
>> > | Ordinarie pris KBH  |
>> > | Biljetter 200 krBH  |
>> > | Biljetter 200 krBH  |
>> > | Biljetter 200 krBH  |
>> > +---------------------+
>> >
>> > The bug is that the field values are incorrectly deserialized: when a
>> > row's value is shorter than the previous row's, the tail of the previous
>> > value is retained.
>> >
>> > The sql query:
>> >
>> > "select s.classification.variant variant from dfs.<some> as s limit 10;"
>> >
>> >
>> > That is how "Ordinarie pris" becomes "Ordinarie pris KBH": the previous
>> > row had the value "Invitation KIF KBH".
>> >
>> > Regards,
>> >   -Stefán
>> >
>>
>>
>>
>> --
>> Kamesh.
>>
>
>
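The symptom described in the quoted report (a shorter value picking up the tail of a longer value from the previous row) is consistent with a reader that reuses one string buffer across rows but decodes the whole backing buffer instead of only the bytes that are valid for the current row. The sketch below is a hypothetical Python model of that failure mode, not Drill's or Avro's actual code; `ReusedUtf8`, `read_buggy`, `read_fixed` and `read_column` are illustrative names. In the Java Avro library the analogous reusable type is `org.apache.avro.util.Utf8`, whose `getBytes()` array can be longer than `getByteLength()`, but whether that is the root cause of DRILL-4056 is an assumption here.

```python
class ReusedUtf8:
    """Hypothetical reusable UTF-8 holder: the backing buffer only grows,
    and `length` records how many bytes belong to the current value."""

    def __init__(self) -> None:
        self.buf = bytearray()
        self.length = 0

    def set(self, text: str) -> None:
        data = text.encode("utf-8")
        if len(data) > len(self.buf):
            # Grow the buffer; it is never shrunk for shorter values.
            self.buf.extend(b"\x00" * (len(data) - len(self.buf)))
        # Overwrite only the prefix; stale bytes past len(data) remain.
        self.buf[: len(data)] = data
        self.length = len(data)


def read_buggy(holder: ReusedUtf8) -> str:
    # Bug: decodes the entire backing buffer, including stale trailing bytes.
    # errors="replace" avoids a crash if a stale tail splits a multi-byte char.
    return holder.buf.decode("utf-8", errors="replace")


def read_fixed(holder: ReusedUtf8) -> str:
    # Fix: decode only the bytes that are valid for the current value.
    return holder.buf[: holder.length].decode("utf-8")


def read_column(values, reader):
    holder = ReusedUtf8()  # a single holder reused for every row
    out = []
    for value in values:
        holder.set(value)
        out.append(reader(holder))
    return out


rows = ["Gæst", "Voksen", "Invitation KIF KBH", "Ordinarie pris"]
print(read_column(rows, read_buggy))
print(read_column(rows, read_fixed))
```

With the four values taken from the report, the buggy reader returns "Ordinarie pris KBH" for the last row, because the 14-byte value only overwrites the first 14 of the 18 bytes left behind by "Invitation KIF KBH". The fixed reader returns "Ordinarie pris".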
