crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-346) Don't deep-copy immutable Writable PTypes
Date Sun, 16 Feb 2014 16:51:19 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902752#comment-13902752 ]

Chao Shi commented on CRUNCH-346:
---------------------------------

Hi [~gabriel.reid], I haven't started yet. Please feel free to attach your patch.

bq.  but apparently that's only the case with Avro.
I don't quite understand. Could you explain in more detail why Writable PTypes cannot?
As far as I understand, we deep-copy because:
1) we don't want a DoFn that modifies its input to affect others (e.g. the deep-copy in IntermediateEmitter)
2) the contents of the Writable passed to a mapper or reducer change on every call, so a
reference to it becomes invalid outside the scope of the mapper/reducer function

In the Crunch world, we use PTypes built from Writables. If the PType is immutable (e.g. a Java
String), it does not reference the Writable instance that MR passes to us. Please correct me if
I'm wrong.
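
To make the two reasons above concrete, here is a minimal sketch in plain Java. It does not use the Hadoop or Crunch APIs: MutableHolder is a hypothetical stand-in for a Writable instance that the framework reuses between records, and all method names are made up for illustration. It only shows why holding on to a reused mutable object goes wrong, and why a value that has already been converted to an immutable String needs no further copy.

```java
import java.util.ArrayList;
import java.util.List;

public class DeepCopySketch {
    // Hypothetical stand-in for a mutable Writable that the framework reuses.
    static final class MutableHolder {
        private String value;
        void set(String v) { value = v; }
        String get() { return value; }
        MutableHolder copy() {
            MutableHolder c = new MutableHolder();
            c.set(value);
            return c;
        }
    }

    private static List<String> values(List<MutableHolder> holders) {
        List<String> out = new ArrayList<>();
        for (MutableHolder h : holders) {
            out.add(h.get());
        }
        return out;
    }

    // Keeps references to the single reused holder, so every kept element
    // ends up showing the last record.
    static List<String> retainReferences(String[] records) {
        MutableHolder reused = new MutableHolder();
        List<MutableHolder> kept = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            kept.add(reused);
        }
        return values(kept);
    }

    // Deep-copies the holder before keeping it, so each element is independent.
    static List<String> retainCopies(String[] records) {
        MutableHolder reused = new MutableHolder();
        List<MutableHolder> kept = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            kept.add(reused.copy());
        }
        return values(kept);
    }

    // Converts to an immutable String first; the String does not reference
    // the reused holder, so no extra copy is needed.
    static List<String> retainConverted(String[] records) {
        MutableHolder reused = new MutableHolder();
        List<String> kept = new ArrayList<>();
        for (String r : records) {
            reused.set(r);
            kept.add(reused.get());
        }
        return kept;
    }

    public static void main(String[] args) {
        String[] records = {"a", "b", "c"};
        System.out.println(retainReferences(records)); // [c, c, c]
        System.out.println(retainCopies(records));     // [a, b, c]
        System.out.println(retainConverted(records));  // [a, b, c]
    }
}
```

If this picture is right, the third case is the one this issue is about: once the value is an immutable Java object, copying it again only adds overhead.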

> Don't deep-copy immutable Writable PTypes
> -----------------------------------------
>
>                 Key: CRUNCH-346
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-346
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Chao Shi
>
> I found that getDetachedValue() appears quite often when running jstack on one of my pipelines.
> A piece of the stack trace is shown below. In the pipeline, most of the types we use are
> immutable (e.g. Java primitives, strings, protobuf). I think we can avoid the deep-copy
> overhead here.
> "main" prio=10 tid=0x00007f0de801d800 nid=0x7ef runnable [0x00007f0dee66c000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.hadoop.io.BytesWritable.<init>(BytesWritable.java:52)
> 	at org.apache.hadoop.io.BytesWritable.<init>(BytesWritable.java:46)
> 	at sun.reflect.GeneratedConstructorAccessor9.newInstance(Unknown Source)
> 	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> 	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> 	at java.lang.Class.newInstance0(Class.java:355)
> 	at java.lang.Class.newInstance(Class.java:308)
> 	at org.apache.crunch.types.writable.WritableDeepCopier.deepCopy(WritableDeepCopier.java:63)
> 	at org.apache.crunch.types.writable.WritableDeepCopier.deepCopy(WritableDeepCopier.java:36)
> 	at org.apache.crunch.types.writable.WritableType.getDetachedValue(WritableType.java:125)
> 	at org.apache.crunch.impl.mr.emit.IntermediateEmitter.emit(IntermediateEmitter.java:54)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
