spark-reviews mailing list archives

From scottcarey <...@git.apache.org>
Subject [GitHub] spark issue #21070: [SPARK-23972][BUILD][SQL] Update Parquet to 1.10.0.
Date Wed, 18 Apr 2018 21:43:38 GMT
Github user scottcarey commented on the issue:

    https://github.com/apache/spark/pull/21070
  
    @rdblue 
    The problem with zstd is that its codec only ships in Hadoop 3.0, and dropping _that_ jar
into a 2.x deployment breaks things, since it is a major release.  Extracting only the ZStandardCodec
and recompiling it against a 2.x release does not work either, because it depends on Hadoop's
low-level native-library management to load the native zstd library (it does not appear to use https://github.com/luben/zstd-jni).
    
    The alternative is to write a custom ZStandardCodec implementation that uses luben:zstd-jni
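    A minimal sketch of such a codec (class name and all details hypothetical, assuming the Hadoop 2.x `CompressionCodec` API and the `com.github.luben:zstd-jni` artifact) could wrap zstd-jni's pure-JNI streams, sidestepping Hadoop's native-library loader entirely:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import com.github.luben.zstd.ZstdInputStream;
import com.github.luben.zstd.ZstdOutputStream;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.Decompressor;

// Hypothetical codec backed by zstd-jni; NOT the Hadoop 3.x ZStandardCodec.
public class ZstdJniCodec implements Configurable, CompressionCodec {
  private Configuration conf;

  @Override public void setConf(Configuration conf) { this.conf = conf; }
  @Override public Configuration getConf() { return conf; }

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out) throws IOException {
    // zstd-jni loads its own bundled native library -- no Hadoop natives needed.
    final ZstdOutputStream zOut = new ZstdOutputStream(out);
    return new CompressionOutputStream(zOut) {
      @Override public void write(int b) throws IOException { zOut.write(b); }
      @Override public void write(byte[] b, int off, int len) throws IOException {
        zOut.write(b, off, len);
      }
      @Override public void finish() throws IOException { zOut.flush(); }
      @Override public void resetState() throws IOException { /* streaming zstd: nothing to reset */ }
    };
  }

  @Override
  public CompressionInputStream createInputStream(InputStream in) throws IOException {
    final ZstdInputStream zIn = new ZstdInputStream(in);
    return new CompressionInputStream(zIn) {
      @Override public int read() throws IOException { return zIn.read(); }
      @Override public int read(byte[] b, int off, int len) throws IOException {
        return zIn.read(b, off, len);
      }
      @Override public void resetState() throws IOException { /* nothing to reset */ }
    };
  }

  // A complete codec would also supply Compressor/Decompressor adapters;
  // the stream-only form above is the part that matters for this sketch.
  @Override public CompressionOutputStream createOutputStream(OutputStream out, Compressor c)
      throws IOException { return createOutputStream(out); }
  @Override public CompressionInputStream createInputStream(InputStream in, Decompressor d)
      throws IOException { return createInputStream(in); }
  @Override public Class<? extends Compressor> getCompressorType() { return null; }
  @Override public Compressor createCompressor() { return null; }
  @Override public Class<? extends Decompressor> getDecompressorType() { return null; }
  @Override public Decompressor createDecompressor() { return null; }
  @Override public String getDefaultExtension() { return ".zst"; }
}
```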
    
    Furthermore, if you add a `o.a.h.io.codecs.ZStandardCodec` class to a jar on the client
side, it is still not found -- my guess is that there is some classloader isolation between
client code and Spark itself, and it is Spark itself that needs to find the class.  So the
codec has to be installed inside the Spark distribution.
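    Concretely (jar name and paths are hypothetical), that means placing the codec jar in the distribution itself rather than shipping it with the application:

```shell
# Copy the codec jar into the Spark distribution so Spark's own
# classloader -- not the application classloader -- can see it.
cp zstd-jni-codec.jar "$SPARK_HOME/jars/"

# By contrast, adding it only to the application classpath is not enough:
#   spark-submit --jars zstd-jni-codec.jar ...   # codec class still not found
```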
    
    I may take you up on fixing the compression codec dependency mess in a couple of months.
The hardest part will be lining up the configuration options with what users already expect
-- the raw codecs themselves aren't that hard to do.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

