hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12291) [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir
Date Mon, 04 Sep 2017 14:57:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16152686#comment-16152686 ]

Rakesh R commented on HDFS-12291:
---------------------------------

Awesome work [~surendrasingh]. I have a few comments, please take care of them. Thanks!
# Rename a few items:
{code}
{{pendingWork}} => {{pendingWorkCount}}
{{fullScanned}} => {{fullyScanned}}
{{queueCapacity}} => {{remainingCapacity}}
{code}
# Typo: please change {{re-encryption}} in the log message below; it looks copied over from the re-encryption code and should refer to satisfying the storage policy instead.
{code}
BlockStorageMovementNeeded#processFileInode()
LOG.trace("Processing {} for re-encryption", inode.getFullPathName());
{code}
# Please add {{@InterfaceAudience.Private}} to {{FSTreeTraverser.java}} (see the first sketch after this list).
# Please change the log priority to debug as this is executed frequently.
{code}
      LOG.info("StorageMovementNeeded queue remaining capacity is zero,"
          + " waiting for some free slots.");
{code}
# To be on the safer side, please keep the condition {{pendingWork <= 0}} (see the second sketch after this list):
{code}
    public synchronized boolean isDirWorkDone() {
      return (pendingWork == 0 && fullScanned);
    }
{code}
# Unused method, please remove.
{code}
    /**
     * Return pending work count for directory.
     */
    public synchronized int getPendingWork() {
      return pendingWork;
    }
{code}
# bq. I think the current rate of consumption is low; SPS will take one item at a time and wait for 3 secs. Instead, we should take more elements from the queue.
Good catch [~umamaheswararao], I agree we should add logic to increase the rate of consumption of SPS tasks. Presently, the SPS thread's waiting period between each task submission is 3 secs. For example, if the remaining capacity is 1000, SPS will presently take 3 * 1000 secs to schedule the 1000 movement tasks. Could one improvement to {{#traverseDirInt()}} be to slice the {{remainingCapacity}} into smaller internal batches of <=50 or <=100 each and call {{#submitCurrentBatch}} per batch, repeating the smaller batch submission until {{remainingCapacity}} items have been submitted to {{storageMovementNeeded}} (see the third sketch below)?
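
For the {{@InterfaceAudience.Private}} suggestion above, a minimal sketch of how {{FSTreeTraverser}} could be annotated; the class body is elided:
{code}
import org.apache.hadoop.classification.InterfaceAudience;

/**
 * Traverses the directory tree in the namespace. Marked private so it is
 * not picked up as a public API.
 */
@InterfaceAudience.Private
public class FSTreeTraverser {
  // ...
}
{code}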
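For the {{isDirWorkDone()}} suggestion above, a minimal sketch of the safer guard, also applying the renames from the first comment:
{code}
    public synchronized boolean isDirWorkDone() {
      // Using <= guards against the pending count accidentally going negative.
      return (pendingWorkCount <= 0 && fullyScanned);
    }
{code}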
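For the batching idea above, a rough sketch of how {{#traverseDirInt()}} could submit in slices; {{collectNextFiles}}, {{ItemInfo}} and the batch size of 100 are illustrative placeholders, not names from the patch:
{code}
  protected void traverseDirInt(long startId) throws InterruptedException {
    int remainingCapacity = remainingCapacity();
    while (remainingCapacity > 0) {
      // Slice the remaining capacity into smaller internal batches.
      int batchSize = Math.min(100, remainingCapacity);
      // Hypothetical helper: continues the directory scan and returns up to
      // batchSize files that still need their storage policy satisfied.
      List<ItemInfo> currentBatch = collectNextFiles(startId, batchSize);
      if (currentBatch.isEmpty()) {
        break; // directory fully scanned
      }
      // Push the whole batch to storageMovementNeeded in one go, instead of
      // one file per 3-second wait.
      submitCurrentBatch(currentBatch);
      remainingCapacity -= currentBatch.size();
    }
  }
{code}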

> [SPS]: Provide a mechanism to recursively iterate and satisfy storage policy of all the files under the given dir
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12291
>                 URL: https://issues.apache.org/jira/browse/HDFS-12291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, namenode
>            Reporter: Rakesh R
>            Assignee: Surendra Singh Lilhore
>         Attachments: HDFS-12291-HDFS-10285-01.patch, HDFS-12291-HDFS-10285-02.patch
>
>
> For the given source path directory, presently SPS considers only the files immediately under the directory (only one level of scanning) for satisfying the policy. It WON’T do recursive directory scanning and then schedule SPS tasks to satisfy the storage policy of all the files down to the leaf nodes.
> The idea of this jira is to discuss & implement an efficient recursive directory iteration mechanism that satisfies the storage policy of all the files under the given directory.


