crunch-user mailing list archives

From Tahir Hameed <tah...@gmail.com>
Subject Re: Apache Crunch Passing a Hash Map to DoFn
Date Wed, 23 Sep 2015 11:58:10 GMT
I solved the problem by passing false for the materialize flag when getting
the readable: myClassData.asReadable(false). I am still not sure why this
makes a difference, though.
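
For anyone reading this thread later, the pattern Gabriel describes below
(hand a readable handle to the function's constructor, then build the HashMap
in initialize) can be sketched in plain Java. This is only a stand-in and not
Crunch code: ReadableSketch, LookupFn and LookupSketch are hypothetical names
that mimic the shape of ReadableData and a DoFn, and the in-memory list
stands in for the PTable contents.

```java
import java.io.Serializable;
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Entry point: wires a small in-memory "table" into the function and runs it.
class LookupSketch {
    public static void main(String[] args) {
        List<Map.Entry<String, String>> rows = Arrays.asList(
                new SimpleEntry<String, String>("p1", "book"),
                new SimpleEntry<String, String>("p2", "lamp"));

        // The handle is passed in at construction time, before any data is read.
        LookupFn fn = new LookupFn(() -> rows);

        // In a real job the framework would call initialize() once per task.
        fn.initialize();

        System.out.println(fn.process("p1")); // prints "book"
        System.out.println(fn.process("p9")); // prints "unknown"
    }
}

// Hypothetical stand-in for a readable handle: serializable, read lazily.
interface ReadableSketch<T> extends Serializable {
    Iterable<T> read();
}

// Hypothetical stand-in for a DoFn that needs a lookup table in memory.
class LookupFn implements Serializable {
    private final ReadableSketch<Map.Entry<String, String>> readable;

    // Not serialized with the function; rebuilt in initialize() on each task.
    private transient Map<String, String> lookup;

    LookupFn(ReadableSketch<Map.Entry<String, String>> readable) {
        this.readable = readable;
    }

    // Iterate over the readable once and copy its pairs into a HashMap.
    void initialize() {
        lookup = new HashMap<>();
        for (Map.Entry<String, String> entry : readable.read()) {
            lookup.put(entry.getKey(), entry.getValue());
        }
    }

    // Per-record work: resolve a key against the in-memory map.
    String process(String key) {
        return lookup.getOrDefault(key, "unknown");
    }
}
```

The map is marked transient so that only the handle travels with the
serialized function, and each task fills its own copy in initialize.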


Tahir

On Wed, Sep 23, 2015 at 1:36 PM, Tahir Hameed <tahirh@gmail.com> wrote:

> Hi Gabriel,
>
> Thanks for the answer. After implementing what you suggested, I am getting
> the following error:
>
> 2015-09-23 13:23:10,859 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Can't find local cache file for '/tmp/crunch-253557813/p1'
> 	at org.apache.crunch.io.impl.ReadableDataImpl.getCacheFilePath(ReadableDataImpl.java:81)
> 	at org.apache.crunch.io.impl.ReadableDataImpl.access$000(ReadableDataImpl.java:42)
> 	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:93)
> 	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:90)
> 	at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:451)
> 	at java.util.AbstractList$Itr.next(AbstractList.java:358)
> 	at com.google.common.collect.Iterables$3.next(Iterables.java:508)
> 	at com.google.common.collect.Iterables$3.next(Iterables.java:501)
> 	at com.google.common.collect.Iterators$5.hasNext(Iterators.java:544)
> 	at com.bol.step.enrichmentdashboard.ProductsDoFN.initialize(ProductsDoFN.java:35)
> 	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
> 	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
> 	at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:48)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
>
>
> Can you suggest where I might be going wrong?
>
>
> Tahir
>
>
> On Wed, Sep 23, 2015 at 11:57 AM, Gabriel Reid <gabriel.reid@gmail.com>
> wrote:
>
>> Hi Tahir,
>>
>> If I understand correctly, then you're trying to load the contents of
>> a PTable into memory within a DoFn.
>>
>> This can be done via the PCollection.asReadable method. A couple of
>> examples of this can be seen in the BloomFilterJoinStrategy.join and
>> MapsideJoinStrategy.joinInternal methods. The general idea is that you
>> pass a ReadableData instance into the constructor of your DoFn, and
>> then you can access the contents of the underlying PCollection by
>> iterating over the ReadableData within the initialize method of your
>> DoFn.
>>
>> - Gabriel
>>
>>
>> On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed <tahirh@gmail.com> wrote:
>> > Hi,
>> >
>> > I've a PTable which I store as an Avro file. The PTable file is later to be
>> > used in another DoFn after it is converted into a HashMap.
>> >
>> > PTable<String, MyClass> myClassData = table.parallelDo(new MyClassDoFN(),
>> >     Avros.tableOf(Avros.strings(), Avros.reflects(MyClass.class)));
>> > Target target = To.avroFile("/user/xyz/output/");
>> > myClassData.write(target, Target.WriteMode.OVERWRITE);
>> >
>> > Can you please tell me how this file may be read in another DoFn?
>> >
>> > Best,
>> >
>> > Tahir
>> >
>> >
>>
>
>
