crunch-user mailing list archives

From: Gabriel Reid
Subject: Re: Apache Crunch Passing a Hash Map to DoFn
Date: Wed, 23 Sep 2015 09:57:34 GMT

Hi Tahir,

If I understand correctly, then you're trying to load the contents of
a PTable into memory within a DoFn.

This can be done via the PCollection.asReadable method. A couple of
examples of this can be seen in the BloomFilterJoinStrategy.join and
MapsideJoinStrategy.joinInternal methods. The general idea is that you
pass a ReadableData instance into the constructor of your DoFn, and
then you can access the contents of the underlying PCollection by
iterating over the ReadableData within the initialize method of your
DoFn.
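
A minimal sketch of that pattern, loosely following what
MapsideJoinStrategy does. The class name LookupDoFn, the helper method
apply, and the use of String values (instead of your MyClass) are
assumptions made for illustration only; swap in your own types and
PTypes as needed.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.crunch.CrunchRuntimeException;
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.ParallelDoOptions;
import org.apache.crunch.ReadableData;
import org.apache.crunch.types.avro.Avros;
import org.apache.hadoop.conf.Configuration;

// Hypothetical DoFn: emits (key, value) for every input key found in a
// lookup table that is loaded into a HashMap when the task starts.
class LookupDoFn extends DoFn<String, Pair<String, String>> {

  private final ReadableData<Pair<String, String>> readable;
  private transient Map<String, String> lookup;

  LookupDoFn(ReadableData<Pair<String, String>> readable) {
    this.readable = readable;
  }

  @Override
  public void configure(Configuration conf) {
    // Lets the ReadableData register whatever it needs in the job configuration.
    readable.configure(conf);
  }

  @Override
  public void initialize() {
    super.initialize();
    lookup = new HashMap<String, String>();
    try {
      // Iterate over the underlying PTable contents once per task.
      for (Pair<String, String> entry : readable.read(getContext())) {
        lookup.put(entry.first(), entry.second());
      }
    } catch (IOException e) {
      throw new CrunchRuntimeException("Could not load lookup table", e);
    }
  }

  @Override
  public void process(String input, Emitter<Pair<String, String>> emitter) {
    String value = lookup.get(input);
    if (value != null) {
      emitter.emit(Pair.of(input, value));
    }
  }

  // Hypothetical helper showing how the DoFn could be wired into a pipeline.
  static PTable<String, String> apply(PCollection<String> keys,
                                      PTable<String, String> lookupTable) {
    // true = materialize the table before it is read by the DoFn.
    ReadableData<Pair<String, String>> readable = lookupTable.asReadable(true);
    // Declare the dependency so the planner produces the table first.
    ParallelDoOptions options = ParallelDoOptions.builder()
        .sourceTargets(readable.getSourceTargets())
        .build();
    return keys.parallelDo("lookup", new LookupDoFn(readable),
        Avros.tableOf(Avros.strings(), Avros.strings()), options);
  }
}
```

In your case the table passed in would be myClassData (or the same
data read back from the Avro file), with the value type and PType
changed to MyClass. Keep in mind that the whole table has to fit in
the memory of each task.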

- Gabriel

On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed wrote:
> Hi,
> I've a PTable which I store as an Avro file. The PTable file is later to be
> used in another DoFn after it is converted into a HashMap.
> PTable<String, MyClass> myClassData = table.parallelDo(new
> MyClassDoFN(),Avros.tableOf(Avros.strings(),Avros.reflects(MyClass.class)));
> Target target=To.avroFile("/user/xyz/output/");
> myClassData.write(target,Target.WriteMode.OVERWRITE);
> Can you please tell me how this file may be read in another DoFn?
> Best,
> Tahir
