accumulo-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "William Slacum (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-2915) Avoid copying all Mutations when using a TabletServerBatchWriter
Date Tue, 17 Jun 2014 03:41:02 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033400#comment-14033400
] 

William Slacum commented on ACCUMULO-2915:
------------------------------------------

Yeah, I was aware of that. I'm not trying to share state. The current Mutation implementation
does a {{Mutation.ByteBuffer#toArray}} before creating the new Mutation which holds on to
the old references and then sets the Buffer to null. Looking at the code, this actually might
be problematic because:

{code}
  public Mutation(Mutation m) {
    m.serialize();
    this.row = m.row;
    this.data = m.data;
    this.entries = m.entries;
    this.values = m.values;
  }
{code}

ends up calling:

{code}
  private void serialize() {
   if (buffer != null) {
     data = buffer.toArray();
     buffer = null;
   }
 }
{code}

So now I'm my user, who should have a {{Mutation}} object I can still use. Let's try to call
{{Mutation#put}} on that bad boy...

{code}
  private void put(byte[] cf, int cfLength, byte[] cq, int cqLength, byte[] cv, boolean hasts,
long ts, boolean deleted, byte[] val, int valLength) {
    if (buffer == null) {
      throw new IllegalStateException("Can not add to mutation after serializing it");
    }
  ....
}
{code}

So I think Mutation may have an oddball workflow. A quick test confirms this:

{code}
    public static void main(String[] args) {
        Mutation m = new Mutation("row");
        m.put("cf", "cq1", "foo");
        // let's simulate what a batch scanner does
        new Mutation(m);
        // now let's try to reuse the original mutation
        try {
            m.put("cf", "cq2", "bar");
        } catch (IllegalStateException e) {
            System.err.println("VICTORY");
        }
    }
{code}

Notice how glorious the victory was, especially if that was run in Eclipse and the default
stderr output is red.

> Avoid copying all Mutations when using a TabletServerBatchWriter
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-2915
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2915
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0
>            Reporter: William Slacum
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> Currently in the TabletServerBatchWriter, the following behavior is exhibited:
> {code}
>     // create a copy of mutation so that after this method returns the user
>     // is free to reuse the mutation object, like calling readFields... this
>     // is important for the case where a mutation is passed from map to reduce
>     // to batch writer... the map reduce code will keep passing the same mutation
>     // object into the reduce method
>     m = new Mutation(m);
>     
>     totalMemUsed += m.estimatedMemoryUsed();
>     mutations.addMutation(table, m);
>     totalAdded++;
> {code}
> This means all data is copied twice when writing. The logic for doing this is a bit dubious,
since not all clients are going to be subject to MapReduce's use of references. 
> It'd be good if we provided users with a way of signaling that there's no need to copy
the mutation payload. [~elserj] suggested creating something akin to an {{ImmutableMutation}},
which help avoid some of the fears the batchwriter attempts to defend against.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Mime
View raw message