hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-13811) RBF: Race condition between router admin quota update and periodic quota update service
Date Fri, 23 Nov 2018 03:39:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-13811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16696359#comment-16696359 ]

Yiqun Lin commented on HDFS-13811:
----------------------------------

Thanks for the explanation, [~dibyendu_hadoop]. I think I understand your idea now. I am still
reviewing, but here are some initial comments:

*RouterQuotaManager.java*
I'd like to keep the original logic in {{getQuotaUsage}} and just clean it up. It is acceptable
for the usage not to be found for a short time; we can wait for the periodic quota update. Also,
the changed logic is incorrect: when the usage is not found, the lookup falls back to the parent's
usage and continues upward until it finds one.
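
The parent fallback mentioned above can be sketched roughly as follows. This is only an illustration of the lookup pattern, not the actual {{RouterQuotaManager}} code: the class name, the {{Long}} usage value and the {{parentOf}} helper are all hypothetical stand-ins.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-path quota usage cache (not the real RouterQuotaManager).
class ParentFallbackLookupSketch {
  // Assumption: usage is modelled as a plain Long instead of a quota usage object.
  private final Map<String, Long> cache = new ConcurrentHashMap<>();

  void put(String path, long usage) {
    cache.put(path, usage);
  }

  /** Returns the usage cached for the path or its nearest cached ancestor, else null. */
  Long getQuotaUsage(String path) {
    String current = path;
    while (current != null) {
      Long usage = cache.get(current);
      if (usage != null) {
        return usage;
      }
      // Not cached at this level: retry with the parent path.
      current = parentOf(current);
    }
    return null;
  }

  /** Hypothetical helper: parent of "/a/b" is "/a", parent of "/a" is "/", "/" has none. */
  static String parentOf(String path) {
    if (path.equals("/")) {
      return null;
    }
    int idx = path.lastIndexOf('/');
    if (idx < 0) {
      return null;
    }
    return idx == 0 ? "/" : path.substring(0, idx);
  }
}
```

With this shape, a path whose usage has not been cached yet simply resolves to an ancestor's usage (or to nothing) until the next periodic update fills it in.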

*RouterQuotaUpdateService.periodicInvoke(MountTable entry)*
Line84: I'd like to add a try-catch around the {{periodicInvoke}} call, so that an error while
updating one mount table entry won't cause the loop to exit.
Line92: Rename {{periodicInvoke}} to {{updateQuotaUsage}}.
Line124: {{currentQuotaUsage}} is an aggregated quota. The quota here ({{currentQuotaUsage.getQuota}})
is only the last subcluster's quota value, not that of all subclusters. If the quota of one
subcluster's filesystem was changed, the following logic still cannot detect it. I'd prefer to file
another JIRA to improve this and keep the original logic for now.
{code}
    // If there is a mismatch between the quota values in router cache
    // and sub-cluster file-system, sync the quota.
    if (currentQuotaUsage.getQuota() != nsQuota
        || currentQuotaUsage.getSpaceQuota() != ssQuota) {
      try {
        this.rpcServer.setQuota(src, nsQuota, ssQuota, null);
      } catch (IOException ioe) {
        LOG.error("Unable to set quota at remote location for " + src, ioe);
      }
    }
{code}
Line137: This line isn't needed.
Line176: In the quota update service, we don't really need the {{updateQuotaCache}} parameter.
Why not just set it to {{false}}? Then there is no need to pass the {{updateQuotaCache}} parameter.
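
For the Line84 comment, a minimal sketch of the per-entry try-catch is below. It assumes mount entries can be modelled as path strings and uses a hypothetical {{updateQuotaUsage}} that fails for one path; none of this is the actual {{RouterQuotaUpdateService}} code.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: mount entries are modelled as plain path strings.
class QuotaUpdateLoopSketch {
  /** Hypothetical per-entry update; fails for "/broken" to mimic a sub-cluster error. */
  static void updateQuotaUsage(String mountPath) throws IOException {
    if (mountPath.equals("/broken")) {
      throw new IOException("simulated failure for " + mountPath);
    }
  }

  /** Updates every entry and returns the paths that were updated successfully. */
  static List<String> updateAll(List<String> mountPaths) {
    List<String> updated = new ArrayList<>();
    for (String path : mountPaths) {
      try {
        updateQuotaUsage(path);
        updated.add(path);
      } catch (IOException e) {
        // Log and continue, so one failing entry does not end the loop.
        System.err.println("Quota update failed for " + path + ": " + e.getMessage());
      }
    }
    return updated;
  }

  public static void main(String[] args) {
    System.out.println(updateAll(Arrays.asList("/a", "/broken", "/b")));
  }
}
```

The point is only that the catch sits inside the loop body, so entries after the failing one are still processed.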

I haven't fully reviewed the UT, but I think we need to add a new test case for the quota cache
update behaviour, since we introduce the {{updateQuotaCache}} flag for getting mount table entries.

> RBF: Race condition between router admin quota update and periodic quota update service
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-13811
>                 URL: https://issues.apache.org/jira/browse/HDFS-13811
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Dibyendu Karmakar
>            Assignee: Dibyendu Karmakar
>            Priority: Major
>         Attachments: HDFS-13811-000.patch, HDFS-13811-HDFS-13891-000.patch
>
>
> If we try to update the quota of an existing mount entry while the periodic quota update
> service is running on the same mount entry, the mount table is left in an _inconsistent state._
> Here transactions are:
> A - Quota update service is fetching mount table entries.
> B - Quota update service is updating the mount table with current usage.
> A' - User is trying to update quota using admin cmd.
> and the transaction sequence is [ A A' B ]
> so the quota update service updates the mount table with the old quota value.
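
The [ A A' B ] sequence from the description can be replayed deterministically in a few lines. This is a toy model under stated assumptions: a single immutable {{Entry}} with a quota and a usage field stands in for a mount table record, and the numbers are arbitrary.

```java
// Toy replay of the [ A A' B ] interleaving; Entry is a hypothetical stand-in
// for a mount table record, not the real MountTable class.
class LostQuotaUpdateSketch {
  static final class Entry {
    final long quota;
    final long usage;

    Entry(long quota, long usage) {
      this.quota = quota;
      this.usage = usage;
    }

    Entry withUsage(long newUsage) {
      return new Entry(quota, newUsage);
    }
  }

  /** Replays A, A', B in order and returns the quota that ends up stored. */
  static long replay() {
    Entry store = new Entry(100, 0);

    // A: the quota update service fetches the entry (quota is still 100).
    Entry snapshot = store;

    // A': the admin command changes the quota to 200.
    store = new Entry(200, store.usage);

    // B: the service writes back its stale snapshot with the current usage.
    store = snapshot.withUsage(42);

    return store.quota;
  }

  public static void main(String[] args) {
    System.out.println("stored quota = " + replay());
  }
}
```

Because step B writes back the whole snapshot taken at A, the admin's value from A' is overwritten and the stored quota is the old one.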


