spark-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-24924) Add mapping for built-in Avro data source
Date Fri, 03 Aug 2018 15:56:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-24924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16568393#comment-16568393 ]

Thomas Graves commented on SPARK-24924:
---------------------------------------

| It wouldn't be very different for 2.4.0. It could be different but I guess it should be
incremental improvement without behaviour changes.

I don't buy this argument. The code has been restructured a lot, and you could have introduced
bugs, behavior changes, etc. If a user has been using the Databricks spark-avro version
on other releases and it was working fine, and now we magically map it to a different version
and it breaks, they are going to complain and say, "I didn't change anything, why did this
break?"

Users could also have made their own modified version of the Databricks spark-avro package
(which we actually have, in order to support primitive types), so the implementation is not the
same and yet you are assuming it is. Just to note, the fact that we use a different version isn't
my issue; I'm happy to make that work. I'm worried about other users who didn't happen to
see this JIRA. I also realize these are 3rd party packages, but I think we are making
an assumption here based on this being a Databricks package, which in my opinion we shouldn't.
What if this were a companyX package that we didn't know about? What would/should the expected
behavior be?

How many users complained about the csv thing? Could we just improve the error message to
state more simply, "Multiple sources found, perhaps you are including an external package
that also supports avro. Spark started supporting it internally as of release X.Y, please remove
the external package or rewrite to use a different function"?
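
To make the error message idea above concrete, here is a rough sketch in Python. It is only
an illustration and is not Spark's actual lookup code: `resolve_source`, `MultipleSourcesError`,
`BUILTIN_SINCE` and the `builtin.avro` provider name are all hypothetical, and the release
number is simply this issue's "Fix For" value standing in for "X.Y".

```python
# Hypothetical sketch: these names are illustrative and are not Spark APIs.

# Assumption: release in which each format gained built-in support.
# "2.4.0" is taken from this issue's "Fix For" field.
BUILTIN_SINCE = {"avro": "2.4.0"}


class MultipleSourcesError(Exception):
    """Raised when more than one provider claims the same format name."""


def resolve_source(fmt, providers):
    """Return the single provider registered for fmt, or fail with a clear message.

    providers is a list of provider names (plain strings in this sketch).
    """
    if not providers:
        raise LookupError(f"No data source found for '{fmt}'.")
    if len(providers) == 1:
        return providers[0]

    # Ambiguous: do not pick one silently; tell the user how to resolve it.
    message = (
        f"Multiple sources found for '{fmt}' ({', '.join(sorted(providers))}). "
        f"Perhaps you are including an external package that also supports {fmt}."
    )
    since = BUILTIN_SINCE.get(fmt)
    if since is not None:
        message += (
            f" Spark has supported {fmt} internally as of release {since}; "
            "please remove the external package or use a different function."
        )
    raise MultipleSourcesError(message)
```

With this approach, a job that has both the external package and a built-in source available
fails with an explanation instead of being silently remapped, which is the behavior argued
for above.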

> Add mapping for built-in Avro data source
> -----------------------------------------
>
>                 Key: SPARK-24924
>                 URL: https://issues.apache.org/jira/browse/SPARK-24924
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Minor
>             Fix For: 2.4.0
>
>
> This issue aims to do the following:
>  # Like the `com.databricks.spark.csv` mapping, map `com.databricks.spark.avro` to the built-in Avro data source.
>  # Remove the incorrect error message, `Please find an Avro package at ...`.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

