accumulo-notifications mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ACCUMULO-1950) Reduce the number of calls to hsync
Date Sun, 14 Sep 2014 22:20:34 GMT

    [ https://issues.apache.org/jira/browse/ACCUMULO-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14133421#comment-14133421
] 

Josh Elser commented on ACCUMULO-1950:
--------------------------------------

bq. And tserver.mutation.queue.max was deprecated for tserver.total.mutation.queue.max, which
is a new property

Yeah, this is what I meant. It wasn't entirely obvious to me how the new property differed
from the old (w/o reading code), most notably the interactions of the old property with the
new. Given your comment about not blowing out the JVM, is this not as "critical" for users
to set correctly for performance reasons as the old property was?

> Reduce the number of calls to hsync 
> ------------------------------------
>
>                 Key: ACCUMULO-1950
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1950
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Keith Turner
>            Assignee: Eric Newton
>             Fix For: 1.7.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> As mutations are written to a tablet server, they are buffered, and once this buffer exceeds a certain
size the data is dumped to the walog and then inserted into an in-memory sorted map.   These
walog buffers are per client and the max size is determined by tserver.mutation.queue.max.
 
> Accumulo 1.5 and 1.6 call hsync() in hadoop 2 which ensures data is flushed to disk.
  This introduces a fixed delay when flushing walog buffers.  The smaller tserver.mutation.queue.max
is, the more frequently the walog buffers are flushed.   With many clients writing to a tserver,
this is not much of a concern because all of their walog buffers are flushed using group commit.
 This results in high throughput because large batches of data are written before hsync
is called.  However, if only a few clients are writing to a tserver, there will be a lot more calls to
hsync.  It would be nice if the number of calls to hsync was a function of the amount of data written
regardless of the number of concurrent clients.  Currently as the number of concurrent clients
goes down, the number of calls to hsync goes up.
> In 1.6 and 1.5 this can be mitigated by increasing tserver.mutation.queue.max; however,
this limit is multiplied by the number of concurrent writers.  So increasing it can improve performance
of a single writer but increases the chances of many concurrent writers exhausting memory.
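
To make the trade-off in the description concrete, here is a rough back-of-the-envelope
sketch. It is not Accumulo code: the function names are hypothetical, the sizes are
illustrative rather than actual defaults, and it assumes an idealized group commit in which
every concurrent writer fills its buffer and all of those buffers share a single hsync.

```python
MIB = 1024 * 1024


def estimated_hsync_calls(total_bytes, queue_max_bytes, concurrent_writers):
    """Estimate hsync calls under an idealized group-commit model.

    Assumption (not from Accumulo itself): each group commit flushes one
    full buffer per writer, so one hsync covers
    queue_max_bytes * concurrent_writers bytes.
    """
    bytes_per_hsync = queue_max_bytes * concurrent_writers
    # Integer ceiling division: a partially filled last batch still needs a sync.
    return -(-total_bytes // bytes_per_hsync)


def worst_case_buffer_memory(queue_max_bytes, concurrent_writers):
    """Upper bound on walog buffer memory when the limit is per client."""
    return queue_max_bytes * concurrent_writers


if __name__ == "__main__":
    total = 1024 * MIB  # illustrative: 1 GiB of mutations written in total
    for writers in (1, 4, 16):
        calls = estimated_hsync_calls(total, 1 * MIB, writers)
        memory = worst_case_buffer_memory(1 * MIB, writers) // MIB
        print(f"writers={writers:2d} hsync_calls={calls:4d} buffer_memory={memory} MiB")
```

Under these assumptions the same amount of data costs far more hsync calls with one writer
than with sixteen, while raising the per-client limit to help the single writer raises the
worst-case buffer memory for the sixteen-writer case by the same factor.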



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
