crunch-dev mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Custom ReadableSource implementations that reuse a PType?
Date Thu, 14 Jan 2016 23:29:34 GMT
Hey Evan,

So I must be missing some context here-- it looks to me like
MRPipeline.getMaterializeSourceTarget first checks to see if you passed in
an input collection type and then looks at its Source to see if it
implements ReadableSource and uses that if it's available before defaulting
to the ptype's default file source later on in the file via
createIntermediateOutput. Is the PCollection you're trying to materialize
not an input collection? e.g., is it an input collection that has had
parallelDo called on it one or more times?
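
To make the lookup order described above concrete, here is a rough standalone sketch in plain Java. It is a simplified model, not Crunch's actual code: every type and method name in it (Source, ReadableSource, PColl, chooseReader, derived) is a hypothetical stand-in, and it assumes that a collection produced by parallelDo no longer carries the original Source.

```java
public class MaterializeSketch {
    /** Hypothetical stand-in for a source; NOT the Crunch interface. */
    interface Source {
        String describe();
    }

    /** Hypothetical marker for sources that can be read back directly. */
    interface ReadableSource extends Source {
    }

    /** Hypothetical custom source, such as one extending a file source. */
    static final class CustomSource implements ReadableSource {
        @Override
        public String describe() {
            return "custom source";
        }
    }

    /** Hypothetical collection; source is null when the collection is derived. */
    static final class PColl {
        final Source source;
        final String ptypeDefault;

        PColl(Source source, String ptypeDefault) {
            this.source = source;
            this.ptypeDefault = ptypeDefault;
        }

        /** Models a parallelDo (assumption): the result is no longer an input collection. */
        PColl derived() {
            return new PColl(null, ptypeDefault);
        }
    }

    /** Prefer the collection's own readable source, else fall back to the ptype default. */
    static String chooseReader(PColl c) {
        // instanceof is false for null, so derived collections fall through.
        if (c.source instanceof ReadableSource) {
            return c.source.describe();
        }
        return c.ptypeDefault;
    }

    public static void main(String[] args) {
        PColl input = new PColl(new CustomSource(), "ptype default file source");
        System.out.println(chooseReader(input));           // input collection
        System.out.println(chooseReader(input.derived())); // after a parallelDo
    }
}
```

If the model holds, materializing the input collection directly would go through the custom source, while materializing anything derived from it would fall back to the ptype's default file source.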

J

On Thu, Jan 14, 2016 at 9:08 AM, Evan McClain <aeroevan@gmail.com> wrote:

> Hi list,
>
> I am trying to implement a custom ReadableSource by extending
> FileSourceImpl, but it doesn't look like my source is actually being
> used since I am reusing AvroTypes (basically just trying to pass
> through some avro getMeta() values).
>
> It looks like cache() -> materialize() -> ptype.getDefaultFileSource()
> is being used to perform the actual read (even though pipeline.read()
> is being passed my custom Source).
>
> Is there any way to do this without also implementing a custom PType?
>
> Thanks
> --
> Evan McClain
> https://keybase.io/aeroevan
>
