spark-reviews mailing list archives

From staslos <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-5657][Examples][PySpark] Add PySpark Av...
Date Tue, 19 May 2015 19:32:28 GMT
Github user staslos commented on the pull request:

    https://github.com/apache/spark/pull/4434#issuecomment-103645571
  
    We've been using both Spark Core and Spark SQL for over 6 months now. We're certainly
not experts here, but we found that Spark Core better suits our data pipeline (as a Pig
replacement), while Spark SQL is more of an analytical tool. When it comes to moving and
transforming data, we prefer Spark Core to Spark SQL's 'magic' because Spark Core is more
stable and gives us more control over the process and more confidence in it.
    
    Also, last time I checked Spark SQL, I couldn't achieve proper Avro schema evolution,
which is absolutely critical for our data pipeline, since it deals with different versions of
the same data. The ability to provide both a reader and a writer schema is priceless, and I
couldn't find a way to do this in Spark SQL. Our data scientists have to use projection in
Spark SQL to be able to read across different versions of the data. Lucky for them, they don't
need to use all the fields and pass them down the pipeline.
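
For readers unfamiliar with the reader/writer schema idea mentioned above, here is a minimal pure-Python sketch of Avro-style record resolution. It is only an illustration under simplifying assumptions: it does not use the Avro library, it handles flat records only (no aliases, unions, or type promotion), and the helper `resolve_record` and the `Event` schemas are hypothetical names made up for this example.

```python
# Simplified illustration of Avro-style record resolution (NOT the Avro library).
# Schemas are plain dicts shaped like Avro record schemas; all names are hypothetical.


def resolve_record(record, writer_schema, reader_schema):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Field exists in both schemas: carry the written value over.
            resolved[name] = record[name]
        elif "default" in field:
            # Field is new in the reader schema: fall back to its default.
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for reader field {name!r}")
    # Writer fields that the reader schema does not declare are dropped.
    return resolved


# Old version of the data: has a field that was later removed.
writer_v1 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "legacy_flag", "type": "boolean"},
]}

# New version of the data: adds a field with a default value.
reader_v2 = {"type": "record", "name": "Event", "fields": [
    {"name": "id", "type": "long"},
    {"name": "source", "type": "string", "default": "unknown"},
]}

old_record = {"id": 42, "legacy_flag": True}
new_view = resolve_record(old_record, writer_v1, reader_v2)
```

In this sketch an old record read through the newer schema keeps `id`, gains `source` from the reader default, and loses `legacy_flag`, which is roughly the behaviour that lets one job read several versions of the same data while still passing full records down the pipeline.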
    
    Also, correct me if I'm wrong, but Spark SQL is not production-ready yet. Our latest upgrade
from Spark 1.2.0 to 1.3.0 proved we were right to stick with Spark Core, at least for now,
while our data scientists were going mad because their Spark SQL scripts stopped working with
S3.
    
    Anyway, thank you, guys, for doing a great job. Feel free to toss this pull request;
back in February I just thought it could be useful for other people facing the same problem.

