drill-dev mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Continued Avro Frustration
Date Fri, 01 Apr 2016 13:51:49 GMT
Stefán -

I don't think you are being unreasonable.  I think the topics you've
brought up are valid, and have seriously tempered my willingness to use
avro more in my organization.  Since we haven't committed to it yet, I
think that makes it less of a priority, thus I am less vocal, but I would
imagine the Drill community should take note of this. A "supported" (
https://drill.apache.org/docs/querying-a-file-system-introduction/) file
format certainly should be given more care than this.  It sets a bad
precedent and may scare off users.

While for you, the option of removing Avro from supported file formats
would be a kick in the pants, I think the Drill project should consider the
ramifications of stating that something is supported while offering very
poor support.  This is a huge issue for a project: it doesn't elicit trust,
it frustrates users like yourself, and for users who are exploring the
project it may turn them away.  I think a rational discussion on this
topic, with an outcome being decided upon (not left open) is very important
to the Drill project as a whole, and I applaud your tenacity in bringing up
these issues.  Is it possible for you to join the weekly hangouts?  It
would be good to talk things out there.


John


On Fri, Apr 1, 2016 at 7:43 AM, Stefán Baxter <stefan@activitystream.com>
wrote:

> Hi,
>
> Is it at all possible that we are the only company trying to use Avro with
> Drill to some serious extent?
>
> We continue to come across all sorts of embarrassing shortcomings like the
> one we are dealing with now where a schema change exception is thrown even
> when working with a single Avro file (that has the same schema).
>
> Can a non project member call for a discussion on this topic and the level
> of support that is offered for Avro in Drill?
>
> My discussion topics would be:
>
>    - Strange schema validation that ... :
>    ... currently fails on single file
>    ... prevents dirX variables from working
>    ... would require Drill to scan all Avro files to establish schema (even
>    when pruning would be used)
>    ... would ALWAYS fail for old queries if an old Avro file, containing
>    the original fields, was removed and could not be scanned
>    ... does not rhyme with the "eliminate ETL" and "Evolving Schema" goals
>    of Drill
>
>    - Simple union types do not work to declare nullable fields
>
>    - Drill cannot read Parquet that is created by parquet-mr-avro
>
>    - What is the intention for Avro in Drill
>    - Should we choose some other format to buffer/batch data before
>    creating a Parquet file for it?
>
>    - The culture here regarding talking about boring/hard topics like this
>    - Where serious complaints/issues are met with silence
>    - I know full well that my frustration shines through here and that it
>    is not helping, but this Drill+Avro mess is really getting to be too
>    much for us to handle
>
> Look forward to discussing this here or during the next hangout.
>
> Regards,
>  -Stefán (or ... mr. old & frustrated)
>
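
For reference, the "simple union types" item in the list above refers to the Avro convention of declaring a nullable field as a union of "null" and another type. Below is a minimal sketch of such a schema, inspected with Python's standard library only. The record name `Event` and the field names are hypothetical and do not come from the thread, and the sketch does not reproduce the Drill failure itself.

```python
import json

# Hypothetical record schema; "Event", "id" and "session_id" are
# illustrative names, not taken from the thread.
SCHEMA_JSON = """
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "session_id", "type": ["null", "string"], "default": null}
  ]
}
"""


def nullable_fields(schema):
    """Return the names of fields whose type is a union that includes "null"."""
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


schema = json.loads(SCHEMA_JSON)
print(nullable_fields(schema))
```

Here `session_id` is the nullable field: its type is the union `["null", "string"]`, with `null` listed first so that the `null` default is valid.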
