nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-2395) PersistentProvenanceRepository Deadlocks caused by a blocked journal merge
Date Tue, 27 Sep 2016 19:55:21 GMT

    [ https://issues.apache.org/jira/browse/NIFI-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15527234#comment-15527234 ]

ASF GitHub Bot commented on NIFI-2395:
--------------------------------------

GitHub user mosermw opened a pull request:

    https://github.com/apache/nifi/pull/1072

    Nifi 2429 PersistentProvenanceRepository bug fixes

    In this PR I cherry-picked these commits from master into 0.x
    
    cfc8a9613cb071247ef22f8fe4a3abb4e6b83151 NIFI-2395 PersistentProvenanceRepository deadlock on journal merge and index exception
    e9b87dd73436b1659b1fddcc400e7248bc00f1ee NIFI-2452 PersistentProvenanceRepository index readers can be prematurely closed


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mosermw/nifi NIFI-2429

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/1072.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1072
    
----
commit 29460a724f583119eada146661ab654fc961c185
Author: Mark Payne <markap14@hotmail.com>
Date:   2016-07-28T14:19:45Z

    NIFI-2395 This closes #734. Ensure that if we fail to index provenance events we do not prevent the repo from continuing to merge journals

commit bf8d66566c8eee911aea48b0b97942500851cf2c
Author: Mike Moser <mosermw@apache.org>
Date:   2016-09-26T20:22:50Z

    NIFI-2429 changes needed after cherry-picking NIFI-2395 from master

commit fba761508d0b1fd39e8d27e3e80a6d6e8e22c0cc
Author: Mark Payne <markap14@hotmail.com>
Date:   2016-08-01T18:51:02Z

    NIFI-2452: Ensure that we do not close Index Readers that are still in use

----


> PersistentProvenanceRepository Deadlocks caused by a blocked journal merge
> --------------------------------------------------------------------------
>
>                 Key: NIFI-2395
>                 URL: https://issues.apache.org/jira/browse/NIFI-2395
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 0.6.0, 0.7.0
>            Reporter: Brian Davis
>            Assignee: Joseph Witt
>            Priority: Blocker
>             Fix For: 1.0.0, 1.0.0-Beta
>
>
> I have a NiFi instance that has been running for about a week and has deadlocked at least 3 times in that period. By deadlock I mean that the whole NiFi instance stops making any progress on flowfiles. I looked at the stack trace and a lot of threads are stuck doing tasks in the PersistentProvenanceRepository. Looking at the code, I think this is what is happening:
> There is a ReadWriteLock on which all the readers are waiting for a writer. The writer is in this loop:
> {code}
>                 while (journalFileCount > journalCountThreshold || repoSize > sizeThreshold) {
>                     // if a shutdown happens while we are in this loop, kill the rollover thread and break
>                     if (this.closed.get()) {
>                         if (future != null) {
>                             future.cancel(true);
>                         }
>                         break;
>                     }
>                     if (repoSize > sizeThreshold) {
>                         logger.debug("Provenance Repository has exceeded its size threshold; will trigger purging of oldest events");
>                         purgeOldEvents();
>                         journalFileCount = getJournalCount();
>                         repoSize = getSize(getLogFiles(), 0L);
>                         continue;
>                     } else {
>                         // if we are constrained by the number of journal files rather than the size of the repo,
>                         // then we will just sleep a bit because another thread is already actively merging the journals,
>                         // due to the runnable that we scheduled above
>                         try {
>                             Thread.sleep(100L);
>                         } catch (final InterruptedException ie) {
>                         }
>                     }
>                     logger.debug("Provenance Repository is still behind. Keeping flow slowed down "
>                             + "to accommodate. Currently, there are {} journal files ({} bytes) and "
>                             + "threshold for blocking is {} ({} bytes)", journalFileCount, repoSize, journalCountThreshold, sizeThreshold);
>                     journalFileCount = getJournalCount();
>                     repoSize = getSize(getLogFiles(), 0L);
>                 }
>                 logger.info("Provenance Repository has now caught up with rolling over journal files. Current number of "
>                         + "journal files to be rolled over is {}", journalFileCount);
>             }
> {code}
> My NiFi stays at the sleep indefinitely. It cannot move forward because the thread doing the merge is stalled. The merge thread is at:
> {code}
> accepted = eventQueue.offer(new Tuple<>(record, blockIndex), 10, TimeUnit.MILLISECONDS);
> {code}
> so the queue is full.  
> What I believe happened is that the callables created here:
> {code}
>                             final Callable<Object> callable = new Callable<Object>() {
>                                 @Override
>                                 public Object call() throws IOException {
>                                     while (!eventQueue.isEmpty() || !finishedAdding.get()) {
>                                         final Tuple<StandardProvenanceEventRecord, Integer> tuple;
>                                         try {
>                                             tuple = eventQueue.poll(10, TimeUnit.MILLISECONDS);
>                                         } catch (final InterruptedException ie) {
>                                             continue;
>                                         }
>                                         if (tuple == null) {
>                                             continue;
>                                         }
>                                         indexingAction.index(tuple.getKey(), indexWriter, tuple.getValue());
>                                     }
>                                     return null;
>                                 }
> {code}
> finish before the offer adds its first event, because I do not see any Index Provenance Events threads. My guess is that the while loop condition is wrong and should use && instead of ||.
> I raised the thread count for index creation from 1 to 3 to see if that helps. I can report back on that later this week.
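
The stall described in the report, and the direction named in the NIFI-2395 commit message ("if we fail to index provenance events we do not prevent the repo from continuing to merge journals"), can be illustrated with a small standalone sketch. This is not the NiFi patch. The class IndexDrainSketch, the index() stand-in, the queue capacity of 2, the 2-second deadline, and the retry loop around offer() are all assumptions made for illustration; only the consumer loop shape and the timed offer()/poll() calls come from the code quoted above.

```java
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class IndexDrainSketch {

    // Hypothetical stand-in for indexingAction.index(...): fails on the first event.
    static void index(final int event) throws IOException {
        if (event == 0) {
            throw new IOException("simulated indexing failure");
        }
    }

    // Returns true if the queue accepted all 10 events before the deadline.
    static boolean mergeCompletes(final boolean keepDrainingOnFailure) {
        final BlockingQueue<Integer> eventQueue = new ArrayBlockingQueue<>(2);
        final AtomicBoolean finishedAdding = new AtomicBoolean(false);
        final ExecutorService exec = Executors.newSingleThreadExecutor();

        // Consumer: same loop shape as the callable quoted in the report.
        exec.submit(() -> {
            while (!eventQueue.isEmpty() || !finishedAdding.get()) {
                final Integer event = eventQueue.poll(10, TimeUnit.MILLISECONDS);
                if (event == null) {
                    continue;
                }
                try {
                    index(event);
                } catch (final IOException e) {
                    if (!keepDrainingOnFailure) {
                        throw e; // the callable exits; nothing drains the queue any more
                    }
                    // otherwise skip the failed event and keep draining
                }
            }
            return null;
        });

        // Producer: retries the timed offer() (assumed shape of the merge thread),
        // but gives up at a deadline so that this sketch terminates.
        final long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
        boolean allAccepted = true;
        try {
            for (int event = 0; event < 10 && allAccepted; event++) {
                boolean accepted = false;
                while (!accepted && deadline - System.nanoTime() > 0) {
                    accepted = eventQueue.offer(event, 10, TimeUnit.MILLISECONDS);
                }
                allAccepted = accepted;
            }
        } catch (final InterruptedException ie) {
            Thread.currentThread().interrupt();
            allAccepted = false;
        } finally {
            finishedAdding.set(true);
            exec.shutdownNow(); // stop the worker thread so the JVM can exit
        }
        return allAccepted;
    }

    public static void main(final String[] args) {
        System.out.println("consumer exits on failure -> merge completes: " + mergeCompletes(false));
        System.out.println("consumer keeps draining   -> merge completes: " + mergeCompletes(true));
    }
}
```

Under these assumptions the first call is expected to report false, because the only consumer exits on the simulated exception and the bounded queue then stays full, and the second to report true. Without the deadline, the producer in the first case would retry offer() forever, which matches the merge thread being parked at eventQueue.offer(...) in the report.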



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
