crunch-user mailing list archives

From: Josh Wills
Subject: Re: Use of PCollection#materialize on Spark Pipeline
Date: Fri, 06 Nov 2015 23:13:11 GMT

I think there was a bug w/the caching that Micah noticed:

Maybe related?

On Fri, Nov 6, 2015 at 12:14 PM, Jeff Quinn wrote:

> Hello,
> Are there any known issues with using PCollection#materialize with
> SparkPipeline? I am trying to use it in my pipeline and I am seeing
> interesting errors occur sometimes when the materialization is attempted,
> such as:
>
> java.lang.IllegalArgumentException: Unknown codec:
> ^@^@^@^C^@^@^@^E^C^H?^A??^AP^@^@^A?^@
> SeqFileReaderFactory: Could not read seqfile at path:
> hdfs://ip-10-0-17-226.ec2.internal:8020/tmp/crunch-300241792/p5/part-r-00001
> Invalid size: -2062707543 for file metadata object
>
> This is with Crunch 0.13.0 / Spark 1.5.0. Anyone have any ideas?
> Thanks!
> Jeff