Date: Mon, 16 Jun 2014 23:11:02 +0000 (UTC)
From: "Sean Busbey (JIRA)"
To: notifications@accumulo.apache.org
Reply-To: jira@apache.org
Subject: [jira] [Commented] (ACCUMULO-2915) Avoid copying all Mutations when using a TabletServerBatchWriter

    [ https://issues.apache.org/jira/browse/ACCUMULO-2915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033151#comment-14033151 ]

Sean Busbey commented on ACCUMULO-2915:
---------------------------------------

While I also don't want to clutter this ticket, I would like to point out that we currently have *lots* of serialization libraries in use, almost all of which could be replaced with Avro. Barring a major performance differentiator (on the order of 2x+), such a change would greatly simplify our long term maintenance.
So if we do make any additional changes in serialization points, please make them pluggable so that it's easier to do comparisons and consolidation.

> Avoid copying all Mutations when using a TabletServerBatchWriter
> ----------------------------------------------------------------
>
>                 Key: ACCUMULO-2915
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2915
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0, 1.6.1, 1.7.0
>            Reporter: William Slacum
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>
> Currently in the TabletServerBatchWriter, the following behavior is exhibited:
> {code}
>     // create a copy of mutation so that after this method returns the user
>     // is free to reuse the mutation object, like calling readFields... this
>     // is important for the case where a mutation is passed from map to reduce
>     // to batch writer... the map reduce code will keep passing the same mutation
>     // object into the reduce method
>     m = new Mutation(m);
>
>     totalMemUsed += m.estimatedMemoryUsed();
>     mutations.addMutation(table, m);
>     totalAdded++;
> {code}
> This means all data is copied twice when writing. The logic for doing this is a bit dubious, since not all clients are going to be subject to MapReduce's use of references.
> It'd be good if we provided users with a way of signaling that there's no need to copy the mutation payload. [~elserj] suggested creating something akin to an {{ImmutableMutation}}, which would help avoid some of the fears the batchwriter attempts to defend against.
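
To make the trade-off in the quoted description concrete, here is a minimal, self-contained Java sketch of the defensive copy and of how an immutable variant could let a writer skip it. Everything in it is an assumption for illustration: `SimpleMutation`, `ImmutableSimpleMutation`, `BufferingWriter`, and `isImmutable()` are hypothetical names, not Accumulo classes or methods, and the real `TabletServerBatchWriter` is considerably more involved.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CopyOnAddSketch {

    /** Hypothetical stand-in for a mutable mutation; NOT Accumulo's Mutation. */
    static class SimpleMutation {
        private byte[] payload;

        SimpleMutation(byte[] payload) {
            this.payload = payload.clone();
        }

        /** Copy constructor, mirroring the "m = new Mutation(m)" step quoted above. */
        SimpleMutation(SimpleMutation other) {
            this.payload = other.payload.clone();
        }

        /** Overwrites the contents in place, the way a reused object would be refilled. */
        void reset(byte[] newPayload) {
            this.payload = newPayload.clone();
        }

        byte[] payload() {
            return payload.clone();
        }

        /** Mutable instances have to be copied before they are buffered. */
        boolean isImmutable() {
            return false;
        }
    }

    /** Hypothetical immutable variant: it cannot be refilled, so sharing the reference is safe. */
    static final class ImmutableSimpleMutation extends SimpleMutation {
        ImmutableSimpleMutation(byte[] payload) {
            super(payload);
        }

        @Override
        void reset(byte[] newPayload) {
            throw new UnsupportedOperationException("immutable");
        }

        @Override
        boolean isImmutable() {
            return true;
        }
    }

    /** Hypothetical buffering writer that copies only when the caller could still mutate. */
    static class BufferingWriter {
        private final List<SimpleMutation> buffer = new ArrayList<>();
        private int copies = 0;

        void add(SimpleMutation m) {
            if (!m.isImmutable()) {
                m = new SimpleMutation(m); // defensive copy, as in the quoted snippet
                copies++;
            }
            buffer.add(m);
        }

        SimpleMutation get(int index) {
            return buffer.get(index);
        }

        int copies() {
            return copies;
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BufferingWriter writer = new BufferingWriter();

        // The caller reuses one object, as the MapReduce scenario in the comment describes.
        SimpleMutation reused = new SimpleMutation(bytes("row1"));
        writer.add(reused);
        reused.reset(bytes("row2"));
        writer.add(reused);

        // An immutable mutation is buffered by reference, with no copy.
        writer.add(new ImmutableSimpleMutation(bytes("row3")));

        for (int i = 0; i < 3; i++) {
            System.out.println(new String(writer.get(i).payload(), StandardCharsets.UTF_8));
        }
        System.out.println("copies made: " + writer.copies());
    }
}
```

In this sketch the two adds of the reused object each pay for a copy, which is what keeps the first buffered entry at `row1` after the caller refills the object, while the immutable entry is stored without one. Whether an opt-out should take the form of a subtype, a flag on the writer, or something else is a design question the ticket leaves open.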