Mailing-List: contact notifications-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: jira@apache.org
Date: Tue, 10 Jun 2014 14:58:02 +0000 (UTC)
From: "Keith Turner (JIRA)" <jira@apache.org>
To: notifications@accumulo.apache.org
Message-ID: <JIRA.12711601.1398895865270.101808.1402412282889@arcas>
In-Reply-To: <JIRA.12711601.1398895865270@arcas>
References: <JIRA.12711601.1398895865270@arcas>
Subject: [jira] [Commented] (ACCUMULO-2766) Single walog operation may wait
 for multiple hsync calls
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/ACCUMULO-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026532#comment-14026532 ] 

Keith Turner commented on ACCUMULO-2766:
----------------------------------------

bq. CI + Agitation is a pretty high bar for minimally ensuring that something works as intended

I cannot really think of another test that would give me confidence that this code works.  I will ensure the test is run before 1.6.1 or 1.5.2 is released.

bq. What assumptions did the extra locking provide us with?

I am not sure why the locking around sync was added.  There was a race condition in close() that I fixed in the 2nd patch.  Maybe this race was observed in CI testing and the locking around sync was a workaround for it.  I am only guessing though; I did not track all of the changes from ACCUMULO-119 to now (I tried and gave up: svn, renames, merges, etc.).  The purpose of {{closeLock}} is to ensure nothing is added to the queue after the walog is closed.
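
To illustrate the role described for {{closeLock}}, here is a minimal sketch, not the actual {{DfsLogger}} code. The class name {{GuardedWorkQueue}}, the {{closed}} flag, and the methods {{submit}}, {{close}} and {{pending}} are hypothetical names chosen for the example. The only point is that the closed check and the enqueue happen under the same lock, so nothing can be queued once close() has returned.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: all names here are illustrative, not taken from Accumulo.
class GuardedWorkQueue {
    private final Object closeLock = new Object();
    private final LinkedBlockingQueue<String> workQueue = new LinkedBlockingQueue<>();
    private boolean closed = false; // guarded by closeLock

    /** Queues the work unless the log is closed; returns whether it was queued. */
    boolean submit(String work) {
        synchronized (closeLock) {
            if (closed) {
                return false; // closed: refuse the work instead of queueing it
            }
            workQueue.add(work);
            return true;
        }
    }

    /** Marks the log closed; later submit() calls return false. */
    void close() {
        synchronized (closeLock) {
            closed = true;
        }
    }

    /** Number of work items currently queued. */
    int pending() {
        return workQueue.size();
    }
}
```

Note that the lock is held only for the check and the enqueue, never while a sync is in progress.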

bq. Initially it looks like we locked to offer work to the syncQueue, but did not lock to poll? And now we do not lock for either?

The queue is a [LinkedBlockingQueue|http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/LinkedBlockingQueue.html], which is thread-safe, so we do not need to synchronize its use.
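
As a small sketch of that point (again not the {{DfsLogger}} code; {{UnsynchronizedQueueSketch}} and {{produceAndDrain}} are hypothetical names), several producer threads can add to a LinkedBlockingQueue with no external lock, and a consumer can take one item and then drain whatever else is pending into a single batch. Treating such a batch as work that one sync call could cover is an assumption of the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: names are illustrative, not taken from Accumulo.
class UnsynchronizedQueueSketch {

    /**
     * Starts the given number of producer threads, each adding itemsPerProducer
     * items without any external locking, then drains the queue in one batch.
     * Assumes at least one item is produced, otherwise take() would block.
     */
    static int produceAndDrain(int producers, int itemsPerProducer) {
        LinkedBlockingQueue<Integer> syncQueue = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    syncQueue.add(i); // no synchronized block needed
                }
            });
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
            // Consumer side: block for the first item, then grab the rest.
            List<Integer> batch = new ArrayList<>();
            batch.add(syncQueue.take());
            syncQueue.drainTo(batch);
            return batch.size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

If the queue were not thread-safe, concurrent adds like these could lose items; here the drained batch contains every item that was added.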

> Single walog operation may wait for multiple hsync calls
> --------------------------------------------------------
>
>                 Key: ACCUMULO-2766
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2766
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>            Priority: Critical
>              Labels: performance
>             Fix For: 1.5.2, 1.6.1, 1.7.0
>
>         Attachments: ACCUMULO-2677-1.patch, ACCUMULO-2766-2.patch
>
>
> While looking into slow {{hsync}} calls, I noticed an oddity in the way Accumulo processes syncs.  Specifically, given the way {{closeLock}} is used in {{DfsLogger}}, it seems like the following situation could occur.
>  
>  # thread B starts executing DfsLogger.LogSyncingTask.run()
>  # thread 1 enters DfsLogger.logFileData()
>  # thread 1 writes to walog
>  # thread 1 locks _closeLock_ 
>  # thread 1 adds sync work to workQueue
>  # thread 1 unlocks _closeLock_
>  # thread B takes sync work off of workQueue
>  # thread B locks _closeLock_
>  # thread B calls sync
>  # thread 3 enters DfsLogger.logFileData()
>  # thread 3 writes to walog
>  # thread 3 blocks locking _closeLock_
>  # thread 4 enters DfsLogger.logFileData()
>  # thread 4 writes to walog
>  # thread 4 blocks locking _closeLock_
>  # thread B unlocks _closeLock_
>  # thread 4 locks _closeLock_ 
>  # thread 4 adds sync work to workQueue
>  # thread B takes sync work off of workQueue
>  # thread B blocks locking _closeLock_
>  # thread 4 unlocks _closeLock_
>  # thread B locks _closeLock_
>  # thread B calls sync
>  # thread B unlocks _closeLock_
>  # thread 3 locks _closeLock_
>  # thread 3 adds sync work to workQueue
>  # thread 3 unlocks _closeLock_
> In this situation thread 3 unnecessarily has to wait for an extra {{hsync}} call.  I am not sure whether this situation actually occurs, or how frequently.  Looking at the code, it seems like it would be nice if sync work could be queued without synchronizing with in-progress sync calls.


--
This message was sent by Atlassian JIRA
(v6.2#6252)