nifi-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Tamas Palfy (Jira)" <j...@apache.org>
Subject [jira] [Updated] (NIFI-7527) AbstractKuduProcessor deadlocks after TGT refresh
Date Fri, 12 Jun 2020 15:35:00 GMT

     [ https://issues.apache.org/jira/browse/NIFI-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tamas Palfy updated NIFI-7527:
------------------------------
    Description: 
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos issue after
TGT expires) introduced a new bug: after TGT refresh the processor ends up in a deadlock.

The reason is that the onTrigger initiates a read lock:
{code:java}
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws
ProcessException {
        kuduClientReadLock.lock();
        try {
            onTrigger(context, session, kuduClientR);
        } finally {
            kuduClientReadLock.unlock();
        }
    }
{code}
and while the read lock is in effect, later (in the same stack) - if TGT refresh occurs -
a write lock is attempted:
{code:java}
...
            public synchronized boolean checkTGTAndRelogin() throws LoginException {
                boolean didRelogin = super.checkTGTAndRelogin();

                if (didRelogin) {
                    createKuduClient(context);
                }

                return didRelogin;
            }
...

    protected void createKuduClient(ProcessContext context) {
        kuduClientWriteLock.lock();
        try {
            if (this.kuduClientR.get() != null) {
                try {
                    this.kuduClientR.get().close();
                } catch (KuduException e) {
                    getLogger().error("Couldn't close Kudu client.");
                }
            }

            if (kerberosUser != null) {
                final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser,
() -> buildClient(context), getLogger());
                this.kuduClientR.set(kerberosAction.execute());
            } else {
                this.kuduClientR.set(buildClient(context));
            }
        } finally {
            kuduClientWriteLock.unlock();
        }
    }
{code}
This attempt at the write lock will get stuck, waiting for the previous read lock to get released.
(Other threads may have acquired the same read lock but they can release it eventually - unless
they too try to acquire the write lock themselves.)

For the fix it seemed to be best to re-evalute the locking logic.
Previously basically the whole onTrigger logic was encapsulated in a read lock, including
the checking - and recreating as needed -  the Kudu client (as explained before).
It's best to just keep the actual privileged action in the read lock so the the refreshing
of the TGT and re-creation of the Kudu client can safely be done in a write lock before that.

  was:
The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos issue after
TGT expires) introduced a new bug: after TGT refresh the processor ends up in a deadlock.

The reason is that the onTrigger initiates a read lock:
{code:java}
    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws
ProcessException {
        kuduClientReadLock.lock();
        try {
            onTrigger(context, session, kuduClientR);
        } finally {
            kuduClientReadLock.unlock();
        }
    }
{code}
and while the read lock is in effect, later (in the same stack) - if TGT refresh occurs -
a write lock is attempted:
{code:java}
...
            public synchronized boolean checkTGTAndRelogin() throws LoginException {
                boolean didRelogin = super.checkTGTAndRelogin();

                if (didRelogin) {
                    createKuduClient(context);
                }

                return didRelogin;
            }
...

    protected void createKuduClient(ProcessContext context) {
        kuduClientWriteLock.lock();
        try {
            if (this.kuduClientR.get() != null) {
                try {
                    this.kuduClientR.get().close();
                } catch (KuduException e) {
                    getLogger().error("Couldn't close Kudu client.");
                }
            }

            if (kerberosUser != null) {
                final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser,
() -> buildClient(context), getLogger());
                this.kuduClientR.set(kerberosAction.execute());
            } else {
                this.kuduClientR.set(buildClient(context));
            }
        } finally {
            kuduClientWriteLock.unlock();
        }
    }
{code}
This attempt at the write lock will get stuck, waiting for the previous read lock to get released.
(Other threads may have acquired the same read lock but they can release it eventually - unless
they too try to acquire the write lock themselves.)

The fix should be fairly simple: need to release the read lock before trying to acquire the
write lock.
Need to take care of the following though:
* Need to reacquire the read lock after the new Kudu client has been created because the current
logic calls an unlock on the ReadLock object later in a finally block. As this is a reentrant
lock, that means decreasing the counter for the number of locks. We need to make sure the
that number of locks and unlocks match.
* Need to make sure whenever we reference the Kudu client, it's actually the latest one. volatile
no longer suffice, we need to wrap it in an AtomicReference.


> AbstractKuduProcessor deadlocks after TGT refresh 
> --------------------------------------------------
>
>                 Key: NIFI-7527
>                 URL: https://issues.apache.org/jira/browse/NIFI-7527
>             Project: Apache NiFi
>          Issue Type: Bug
>            Reporter: Tamas Palfy
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The fix for https://issues.apache.org/jira/browse/NIFI-7453 (PutKudu kerberos issue after
TGT expires) introduced a new bug: after TGT refresh the processor ends up in a deadlock.
> The reason is that the onTrigger initiates a read lock:
> {code:java}
>     @Override
>     public void onTrigger(final ProcessContext context, final ProcessSession session)
throws ProcessException {
>         kuduClientReadLock.lock();
>         try {
>             onTrigger(context, session, kuduClientR);
>         } finally {
>             kuduClientReadLock.unlock();
>         }
>     }
> {code}
> and while the read lock is in effect, later (in the same stack) - if TGT refresh occurs
- a write lock is attempted:
> {code:java}
> ...
>             public synchronized boolean checkTGTAndRelogin() throws LoginException {
>                 boolean didRelogin = super.checkTGTAndRelogin();
>                 if (didRelogin) {
>                     createKuduClient(context);
>                 }
>                 return didRelogin;
>             }
> ...
>     protected void createKuduClient(ProcessContext context) {
>         kuduClientWriteLock.lock();
>         try {
>             if (this.kuduClientR.get() != null) {
>                 try {
>                     this.kuduClientR.get().close();
>                 } catch (KuduException e) {
>                     getLogger().error("Couldn't close Kudu client.");
>                 }
>             }
>             if (kerberosUser != null) {
>                 final KerberosAction<KuduClient> kerberosAction = new KerberosAction<>(kerberosUser,
() -> buildClient(context), getLogger());
>                 this.kuduClientR.set(kerberosAction.execute());
>             } else {
>                 this.kuduClientR.set(buildClient(context));
>             }
>         } finally {
>             kuduClientWriteLock.unlock();
>         }
>     }
> {code}
> This attempt at the write lock will get stuck, waiting for the previous read lock to
get released.
> (Other threads may have acquired the same read lock but they can release it eventually
- unless they too try to acquire the write lock themselves.)
> For the fix it seemed to be best to re-evalute the locking logic.
> Previously basically the whole onTrigger logic was encapsulated in a read lock, including
the checking - and recreating as needed -  the Kudu client (as explained before).
> It's best to just keep the actual privileged action in the read lock so the the refreshing
of the TGT and re-creation of the Kudu client can safely be done in a write lock before that.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message