crunch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Evan McClain <aeroe...@gmail.com>
Subject Re: Custom ReadableSource implementations that reuse a PType?
Date Fri, 15 Jan 2016 22:36:17 GMT
The PCollection that is being cached is the result of a parallelDo (so
it is a DoCollection). The parent InputCollection has the source I
defined. Since the first read is triggered by that .cache() call on the
DoCollection, the file source given in the InputCollection is never
actually used.

Stepping through a debugger shows that even after using
pipeline.read(...).cache(), my ReadableSource.read() doesn't get called
even though I see my source in outputTargetsToMaterialize.value.source.
I only see AvroFileSourceTarget performing the reads so I'm probably
missing something (I'm new to crunch).

Any pointers would be appreciated.

Evan

On Thu, 2016-01-14 at 15:29 -0800, Josh Wills wrote:
> Hey Evan,
> 
> So I must be missing some context here-- it looks to me like
> MRPipeline.getMaterializeSourceTarget first checks to see if you
> passed in
> an input collection type and then looks at its Source to see if it
> implements ReadableSource and uses that if it's available before
> defaulting
> to the ptype's default file source later on in the file via
> createIntermediateOutput. Is the PCollection you're trying to
> materialize
> not an input collection? e.g., is it an input collection that has had
> parallelDo called it on one or more times?
> 
> J
> 
> On Thu, Jan 14, 2016 at 9:08 AM, Evan McClain <aeroevan@gmail.com>
> wrote:
> 
> > Hi list,
> > 
> > I am trying to implement a custom ReadableSource by extending
> > FileSourceImpl, but it doesn't look like my source is actually
> > being
> > used since I am reusing AvroTypes (basically just trying to pass
> > through some avro getMeta() values).
> > 
> > It looks like cache() -> materialize() ->
> > ptype.getDefaultFileSource()
> > is being used to perform the actual read (even though
> > pipeline.read()
> > is being passed my custom Source).
> > 
> > Is there any way to do this without also implementing a custom
> > PType?
> > 
> > Thanks
> > --
> > Evan McClain
> > https://keybase.io/aeroevan
> > 
-- 
Evan McClain
https://keybase.io/aeroevan

Mime
View raw message