crunch-user mailing list archives

From Josh Wills
Subject Re: emitting the same object with different internals
Date Mon, 10 Jun 2013 03:14:57 GMT
A single input record will flow through all of the DoFns that are contained
within the map/reduce stage of the computation before another record is
processed, so mutating an object and then passing it along is usually a
safe operation in Crunch. It will be serialized to the output from that
stage before the next output is processed.
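
To make that concrete, here is a rough plain-Java sketch of the two cases.
It does not use Crunch at all: MutableCounter stands in for the
SomeMutableObject in the question, and EmitDemo and its methods are
hypothetical names that only mimic "write the value out at emit time"
versus "hold on to the emitted reference".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SomeMutableObject from the question.
class MutableCounter {
  int value;

  void mutate(int input) {
    value = input;
  }
}

class EmitDemo {
  // Mimics an emitter that writes each value out as soon as it is emitted,
  // so later mutations cannot affect what was already written.
  static List<String> emitSerialized(int[] inputs) {
    MutableCounter smo = new MutableCounter();
    List<String> written = new ArrayList<>();
    for (int input : inputs) {
      smo.mutate(input);
      written.add(Integer.toString(smo.value)); // "serialize" right away
    }
    return written;
  }

  // Mimics code that keeps the emitted reference around instead of
  // writing it out; every entry ends up pointing at the same object.
  static List<Integer> emitByReference(int[] inputs) {
    MutableCounter smo = new MutableCounter();
    List<MutableCounter> held = new ArrayList<>();
    for (int input : inputs) {
      smo.mutate(input);
      held.add(smo);
    }
    List<Integer> seen = new ArrayList<>();
    for (MutableCounter c : held) {
      seen.add(c.value);
    }
    return seen;
  }

  public static void main(String[] args) {
    int[] inputs = {1, 2, 3};
    System.out.println(emitSerialized(inputs));  // prints [1, 2, 3]
    System.out.println(emitByReference(inputs)); // prints [3, 3, 3]
  }
}
```

The first method is the situation described above; the second is what
happens once something in the pipeline retains references instead.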

That said, I generally prefer immutable objects, or some sort of builder
pattern that allows you to easily convert an immutable object into a
mutable form and then create another immutable object after you make
changes to the mutable builder. I always find myself doing something like
caching a collection of objects at some point in my pipeline, and when I
do, the use of mutable objects ends up biting me. The PType class has a
method, getDetachedValue, which returns a copy of an object that is safe to
hold on to, and we make liberal use of it in the internal libraries when we
need to do some caching and can't be sure whether or not the input object
is immutable.
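
As a minimal sketch of the builder approach, assuming a made-up value
type: Point, Point.Builder, and BuilderDemo.cacheShifted are hypothetical
names for illustration, not part of Crunch. The build() call here plays
roughly the role that a detached copy would play before caching.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable value type with a mutable builder.
final class Point {
  private final int x;
  private final int y;

  Point(int x, int y) {
    this.x = x;
    this.y = y;
  }

  int getX() { return x; }
  int getY() { return y; }

  // Convert the immutable object into its mutable form.
  Builder toBuilder() {
    return new Builder(x, y);
  }

  static final class Builder {
    private int x;
    private int y;

    Builder(int x, int y) {
      this.x = x;
      this.y = y;
    }

    Builder setX(int x) {
      this.x = x;
      return this;
    }

    Builder setY(int y) {
      this.y = y;
      return this;
    }

    // Each call creates a fresh immutable Point from the current state.
    Point build() {
      return new Point(x, y);
    }
  }
}

class BuilderDemo {
  // Reuses one mutable builder, but caches only immutable snapshots,
  // so later changes to the builder cannot alter cached entries.
  static List<Point> cacheShifted(Point start, int[] xs) {
    List<Point> cache = new ArrayList<>();
    Point.Builder builder = start.toBuilder();
    for (int x : xs) {
      cache.add(builder.setX(x).build());
    }
    return cache;
  }
}
```

Calling BuilderDemo.cacheShifted(new Point(0, 5), new int[] {1, 2, 3})
yields three distinct points with x values 1, 2, and 3, each with y = 5.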


On Sun, Jun 9, 2013 at 5:06 PM, Sandy Ryza wrote:

> Will the following code work in Crunch?
> ---
> private SomeMutableObject smo;
> public void process(Integer input, Emitter<SomeMutableObject> emitter) {
>   smo.mutate(input);
>   emitter.emit(smo);
> }
> ---
> i.e. will the object be written/copied when emit is called, so that
> changes to it in a later call of the process function won't change
> what was emitted in an earlier one?
> thanks for any help!
> Sandy

Director of Data Science
Cloudera
Twitter: @josh_wills
