crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Apache Crunch Passing a Hash Map to DoFn
Date Wed, 23 Sep 2015 09:57:34 GMT
Hi Tahir,

If I understand correctly, then you're trying to load the contents of
a PTable into memory within a DoFn.

This can be done via the PCollection.asReadable method. A couple of
examples of this can be seen in the BloomFilterJoinStrategy.join and
MapsideJoinStrategy.joinInternal methods. The general idea is that you
pass a ReadableData instance into the constructor of your DoFn, and
then you can access the contents of the underlying PCollection by
iterating over the ReadableData within the initialize method of your
DoFn.
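Below is a minimal sketch of that lifecycle, written in plain Java so that it runs without Crunch on the classpath. SideData and LookupFn are hypothetical stand-ins for ReadableData and a DoFn subclass, not Crunch classes. In real Crunch code the handle would come from something like myClassData.asReadable(true), and the loop in initialize() would iterate over readable.read(getContext()). What it shows is that only the small readable handle is serialized with the function. The HashMap itself is transient and gets rebuilt on each task.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SideDataSketch {

    // Hypothetical stand-in for Crunch's ReadableData<T>: a small, serializable
    // handle that produces the data only when asked, on the worker side.
    public interface SideData<T> extends Serializable {
        Iterable<T> read();
    }

    // Hypothetical stand-in for a DoFn<String, String> that looks each input
    // key up in an in-memory table built from the side data.
    public static class LookupFn implements Serializable {
        private final SideData<Map.Entry<String, String>> sideData;
        // transient: the map is never serialized with the function;
        // it is rebuilt from sideData in initialize()
        private transient Map<String, String> lookup;

        public LookupFn(SideData<Map.Entry<String, String>> sideData) {
            this.sideData = sideData;
        }

        // Plays the role of DoFn.initialize(): runs once per task, before any process() call.
        public void initialize() {
            lookup = new HashMap<>();
            for (Map.Entry<String, String> e : sideData.read()) {
                lookup.put(e.getKey(), e.getValue());
            }
        }

        // Plays the role of DoFn.process(): only reads the already-loaded map.
        public String process(String key) {
            return key + "=" + lookup.getOrDefault(key, "<missing>");
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = List.of(
                Map.entry("a", "alpha"),
                Map.entry("b", "beta"));
        LookupFn fn = new LookupFn(() -> rows);
        fn.initialize();
        System.out.println(fn.process("a")); // prints "a=alpha"
        System.out.println(fn.process("z")); // prints "z=<missing>"
    }
}
```

In the actual DoFn it is probably also worth overriding configure(Configuration) to call readable.configure(conf). You would also pass readable.getSourceTargets() to the parallelDo call through ParallelDoOptions, so that the planner runs the job producing the side data first. As far as I recall, MapsideJoinStrategy.joinInternal does both.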

- Gabriel


On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed <tahirh@gmail.com> wrote:
> Hi,
>
> I have a PTable that I store as an Avro file. The file is later to be
> used in another DoFn after it is converted into a HashMap.
>
> PTable<String, MyClass> myClassData = table.parallelDo(new MyClassDoFN(),
>     Avros.tableOf(Avros.strings(), Avros.reflects(MyClass.class)));
> Target target = To.avroFile("/user/xyz/output/");
> myClassData.write(target, Target.WriteMode.OVERWRITE);
>
> Can you please tell me how this file can be read in another DoFn?
>
> Best,
>
> Tahir
>
>
