orc-user mailing list archives

From David Christle <dchris...@linkedin.com>
Subject Status of ORC-363 -- Add ZStandard to ORC Java writer/reader
Date Sat, 09 Feb 2019 00:06:53 GMT

I am interested in the status of pull request ORC-363 (https://github.com/apache/orc/pull/306),
which adds the ZStandard compression codec to the Java reader/writer. I am very keen on experimenting
with this codec for large-scale data processing and encouraging my colleagues to adopt it,
but it appears to have been stalled since the beginning of November, waiting for review.
As you know, ZStandard is a newer compression algorithm that generally offers better compression
than zlib at substantially faster speeds. It was recently enabled in the C++ writer/reader
in ORC-395 (https://github.com/apache/orc/pull/301), but I don’t think this will work for
using ZStandard within ORC in Apache Spark (my primary data processing framework).

I do think this addition to ORC is a good one to shepherd through the review process, as I
think it will be useful for anyone doing the kind of large-scale data processing that ORC
is designed to enable – Facebook has already implemented ZStandard in ORC, and recently
reported double-digit improvements in both compression and speed (https://code.fb.com/core-data/zstandard/)
in their data warehousing applications.

Kind regards,
David Christle
