crunch-user mailing list archives

From Gabriel Reid <gabriel.r...@gmail.com>
Subject Re: Apache Crunch Passing a Hash Map to DoFn
Date Wed, 23 Sep 2015 12:33:00 GMT
Hi Tahir,

Good to hear you got it going.

Without seeing the code, it's difficult to say what the underlying issue
was in your original version (with materialize set to true). My guess is
that there is a problem with reading a materialized collection that is
taken directly from a Source, without any DoFns between the original
input and the point where it's converted to a ReadableData.
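
For reference, here is a minimal sketch of the asReadable pattern discussed
in this thread, using materialize set to false as in your working version.
It is an illustration under assumptions rather than your actual code: the
class name LookupDoFn, the helper method enrich, and the String key type
are hypothetical, and it assumes crunch-core and Hadoop are on the
classpath. Depending on the PType, values read this way may need a
defensive copy (for example via PType.getDetachedValue); the sketch leaves
that out.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.crunch.CrunchRuntimeException;
import org.apache.crunch.DoFn;
import org.apache.crunch.Emitter;
import org.apache.crunch.PCollection;
import org.apache.crunch.PTable;
import org.apache.crunch.Pair;
import org.apache.crunch.ParallelDoOptions;
import org.apache.crunch.ReadableData;
import org.apache.crunch.types.PTableType;
import org.apache.hadoop.conf.Configuration;

/** Hypothetical DoFn that looks up each input key in a side table held in memory. */
public class LookupDoFn<V> extends DoFn<String, Pair<String, V>> {

  // Serialized with the DoFn and shipped to each task.
  private final ReadableData<Pair<String, V>> readable;

  // Rebuilt in every task by initialize(), so it is not serialized.
  private transient Map<String, V> lookup;

  public LookupDoFn(ReadableData<Pair<String, V>> readable) {
    this.readable = readable;
  }

  @Override
  public void configure(Configuration conf) {
    // Lets the ReadableData register what it needs in the job configuration.
    readable.configure(conf);
  }

  @Override
  public void initialize() {
    super.initialize();
    lookup = new HashMap<String, V>();
    try {
      // Load the whole side table into memory once per task.
      for (Pair<String, V> entry : readable.read(getContext())) {
        lookup.put(entry.first(), entry.second());
      }
    } catch (IOException e) {
      throw new CrunchRuntimeException(e);
    }
  }

  @Override
  public void process(String key, Emitter<Pair<String, V>> emitter) {
    // Emit only the keys that are present in the side table.
    V value = lookup.get(key);
    if (value != null) {
      emitter.emit(Pair.of(key, value));
    }
  }

  /** Hypothetical helper that wires the side table into the DoFn. */
  public static <V> PTable<String, V> enrich(
      PCollection<String> keys, PTable<String, V> side, PTableType<String, V> ptype) {
    // false: read the side table directly rather than materializing it first.
    ReadableData<Pair<String, V>> readable = side.asReadable(false);
    // Declare the side input so the planner knows this DoFn depends on it.
    ParallelDoOptions options = ParallelDoOptions.builder()
        .sourceTargets(readable.getSourceTargets())
        .build();
    return keys.parallelDo("lookup", new LookupDoFn<V>(readable), ptype, options);
  }
}
```

Passing the source targets through ParallelDoOptions mirrors what
MapsideJoinStrategy does, so that the side input is available before the
DoFn that reads it runs.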

- Gabriel


On Wed, Sep 23, 2015 at 1:58 PM, Tahir Hameed <tahirh@gmail.com> wrote:
> I solved the problem by setting materialize to false when getting the
> readable: myClassData.asReadable(false). I am still not sure why this
> happens, though.
>
>
> Tahir
>
> On Wed, Sep 23, 2015 at 1:36 PM, Tahir Hameed <tahirh@gmail.com> wrote:
>>
>> Hi Gabriel,
>>
>> Thanks for the answer. After implementing what you suggested, I am getting
>> the following error:
>>
>> 2015-09-23 13:23:10,859 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.crunch.CrunchRuntimeException: Can't find local cache file for '/tmp/crunch-253557813/p1'
>> 	at org.apache.crunch.io.impl.ReadableDataImpl.getCacheFilePath(ReadableDataImpl.java:81)
>> 	at org.apache.crunch.io.impl.ReadableDataImpl.access$000(ReadableDataImpl.java:42)
>> 	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:93)
>> 	at org.apache.crunch.io.impl.ReadableDataImpl$1.apply(ReadableDataImpl.java:90)
>> 	at com.google.common.collect.Lists$TransformingRandomAccessList.get(Lists.java:451)
>> 	at java.util.AbstractList$Itr.next(AbstractList.java:358)
>> 	at com.google.common.collect.Iterables$3.next(Iterables.java:508)
>> 	at com.google.common.collect.Iterables$3.next(Iterables.java:501)
>> 	at com.google.common.collect.Iterators$5.hasNext(Iterators.java:544)
>> 	at com.bol.step.enrichmentdashboard.ProductsDoFN.initialize(ProductsDoFN.java:35)
>> 	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:71)
>> 	at org.apache.crunch.impl.mr.run.RTNode.initialize(RTNode.java:73)
>> 	at org.apache.crunch.impl.mr.run.CrunchMapper.setup(CrunchMapper.java:48)
>> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:415)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>>
>>
>>
>> Can you suggest where I might be going wrong?
>>
>>
>> Tahir
>>
>>
>> On Wed, Sep 23, 2015 at 11:57 AM, Gabriel Reid <gabriel.reid@gmail.com>
>> wrote:
>>>
>>> Hi Tahir,
>>>
>>> If I understand correctly, then you're trying to load the contents of
>>> a PTable into memory within a DoFn.
>>>
>>> This can be done via the PCollection.asReadable method. A couple of
>>> examples of this can be seen in the BloomFilterJoinStrategy.join and
>>> MapsideJoinStrategy.joinInternal methods. The general idea is that you
>>> pass a ReadableData instance into the constructor of your DoFn, and
>>> then you can access the contents of the underlying PCollection by
>>> iterating over the ReadableData within the initialize method of your
>>> DoFn.
>>>
>>> - Gabriel
>>>
>>>
>>> On Wed, Sep 23, 2015 at 9:56 AM, Tahir Hameed <tahirh@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I have a PTable which I store as an Avro file. The PTable file is
>>> > later to be used in another DoFn after it is converted into a HashMap.
>>> >
>>> > PTable<String, MyClass> myClassData = table.parallelDo(
>>> >     new MyClassDoFN(),
>>> >     Avros.tableOf(Avros.strings(), Avros.reflects(MyClass.class)));
>>> > Target target = To.avroFile("/user/xyz/output/");
>>> > myClassData.write(target, Target.WriteMode.OVERWRITE);
>>> >
>>> > Can you please tell me how this file can be read in another DoFn?
>>> >
>>> > Best,
>>> >
>>> > Tahir
>>> >
>>> >
>>
>>
>
