accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-2915) Avoid copying all Mutations when using a TabletServerBatchWriter
Date Mon, 16 Jun 2014 22:44:02 GMT


Josh Elser commented on ACCUMULO-2915:

This is big (in a good way), but it also goes against the current implementation. I don't
think I've ever come across a use of a Mutation that wasn't "create", "add updates", "submit",
and then "throw away", even though reuse after submission is the default case that the BatchWriter
expects. Awkward as that seems to me, we're going to have to keep that default behavior to ensure
we don't break anyone. They'll have to opt in to a {{zero-copy}} variant.
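
To make the trade-off concrete, here is a minimal sketch of a copying default next to an opt-in no-copy path. Every name in it ({{SketchMutation}}, {{SketchBatchWriter}}, {{addMutationNoCopy}}) is a hypothetical stand-in for illustration, not the Accumulo API and not a proposed signature.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mutation: a row plus a growable list of updates.
final class SketchMutation {
    private final String row;
    private final List<String> updates = new ArrayList<>();

    SketchMutation(String row) { this.row = row; }

    // Copy constructor, playing the role of "new Mutation(m)" in the quoted snippet.
    SketchMutation(SketchMutation other) {
        this.row = other.row;
        this.updates.addAll(other.updates);
    }

    void put(String update) { updates.add(update); }
    void clear() { updates.clear(); }
    int size() { return updates.size(); }
    String row() { return row; }
}

// Hypothetical writer: copying stays the default, sharing is opt-in.
final class SketchBatchWriter {
    private final List<SketchMutation> queue = new ArrayList<>();

    // Default path: defensive copy, so the caller may reuse or reset m afterwards.
    void addMutation(SketchMutation m) {
        queue.add(new SketchMutation(m));
    }

    // Opt-in path: keeps the caller's reference; the caller must not touch m again.
    void addMutationNoCopy(SketchMutation m) {
        queue.add(m);
    }

    SketchMutation get(int i) { return queue.get(i); }
}

final class ZeroCopySketch {
    public static void main(String[] args) {
        SketchBatchWriter writer = new SketchBatchWriter();
        SketchMutation m = new SketchMutation("row1");
        m.put("cf:cq=v1");

        writer.addMutation(m);        // queued as an independent copy
        writer.addMutationNoCopy(m);  // queued as the same object

        m.clear();                    // simulates MapReduce-style object reuse

        System.out.println("copied path kept " + writer.get(0).size() + " update(s)");
        System.out.println("no-copy path kept " + writer.get(1).size() + " update(s)");
    }
}
```

In this sketch the copied entry survives the caller's {{clear()}}, while the shared entry is emptied along with it, which is exactly the hazard the existing default defends against and the reason the no-copy path has to be something users ask for explicitly.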

I like the {{ImmutableMutation}} idea, and as long as it serializes in the same way that {{Mutation}}
currently does, it shouldn't result in any changes server-side. I was talking with [~bills]
about this: making {{ImmutableMutation}} {{Writable}} does introduce interface confusion, because
it shouldn't be used the way a normal {{Writable}} object is.
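
One way to picture that concern: Hadoop's {{Writable}} pairs {{write(DataOutput)}} with {{readFields(DataInput)}}, and the second method repopulates the object in place, which an immutable type cannot honor. The sketch below assumes a hypothetical write-only interface and stand-in classes ({{WriteOnly}}, {{MutableSketch}}, {{ImmutableSketch}}, {{WireFormat}}); none of these exist in Accumulo, and the wire format is invented for the example. It only shows that a mutable and an immutable form can share one serialized layout.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical write-only contract: serialization with no readFields-style refill.
interface WriteOnly {
    void write(DataOutput out) throws IOException;
}

// Invented wire format, shared so both classes emit identical bytes.
final class WireFormat {
    private WireFormat() {}

    static void write(DataOutput out, String row, List<String> updates) throws IOException {
        out.writeUTF(row);
        out.writeInt(updates.size());
        for (String u : updates) {
            out.writeUTF(u);
        }
    }

    static byte[] toBytes(WriteOnly w) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}

// Mutable form: can keep changing after it has been serialized.
final class MutableSketch implements WriteOnly {
    private final String row;
    private final List<String> updates = new ArrayList<>();

    MutableSketch(String row) { this.row = row; }

    MutableSketch put(String update) { updates.add(update); return this; }

    // Snapshot of the current state; copies once here, for simplicity of the sketch.
    ImmutableSketch freeze() { return new ImmutableSketch(row, updates); }

    @Override
    public void write(DataOutput out) throws IOException {
        WireFormat.write(out, row, updates);
    }
}

// Immutable form: same bytes on the wire, no way to refill it after construction.
final class ImmutableSketch implements WriteOnly {
    private final String row;
    private final List<String> updates;

    ImmutableSketch(String row, List<String> updates) {
        this.row = row;
        this.updates = Collections.unmodifiableList(new ArrayList<>(updates));
    }

    @Override
    public void write(DataOutput out) throws IOException {
        WireFormat.write(out, row, updates);
    }
}
```

Because both classes delegate to the same {{WireFormat.write}}, a receiver cannot tell which one produced the bytes, which is the property that would keep the server side unchanged.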

This makes me wonder if we should consider using Kryo for serialization of "mutations" instead
of the Writable interface. Last time I did some benchmarks, I think Kryo was faster than Writable
too, which would be another gain. I don't want to dirty up this ticket with scope creep,
but I wanted to mention it publicly before I forgot again.

> Avoid copying all Mutations when using a TabletServerBatchWriter
> ----------------------------------------------------------------
>                 Key: ACCUMULO-2915
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0
>            Reporter: William Slacum
>             Fix For: 1.5.2, 1.6.1, 1.7.0
> Currently in the TabletServerBatchWriter, the following behavior is exhibited:
> {code}
>     // create a copy of mutation so that after this method returns the user
>     // is free to reuse the mutation object, like calling readFields... this
>     // is important for the case where a mutation is passed from map to reduce
>     // to batch writer... the map reduce code will keep passing the same mutation
>     // object into the reduce method
>     m = new Mutation(m);
>     totalMemUsed += m.estimatedMemoryUsed();
>     mutations.addMutation(table, m);
>     totalAdded++;
> {code}
> This means all data is copied twice when writing. The logic for doing this is a bit dubious,
> since not all clients are going to be subject to MapReduce's use of references.
> It'd be good if we provided users with a way of signaling that there's no need to copy
> the mutation payload. [~elserj] suggested creating something akin to an {{ImmutableMutation}},
> which would help avoid some of the fears the BatchWriter attempts to defend against.
