hadoop-mapreduce-issues mailing list archives

From "Scott Chen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2167) Faster directory traversal for raid node
Date Mon, 08 Nov 2010 20:42:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12929720#action_12929720 ]

Scott Chen commented on MAPREDUCE-2167:
---------------------------------------

+1 Looks good to me.
Just one more thing: can you add some comments explaining the motivation for using the semaphore?
It is confusing to see both the thread pool and the semaphore used together.

> Faster directory traversal for raid node
> ----------------------------------------
>
>                 Key: MAPREDUCE-2167
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>         Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.patch
>
>
> The RaidNode currently iterates over the directory structure to figure out which files
> to RAID. With millions of files, this can take a long time, especially if some files are
> already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine
> whether the file needs to be RAIDed.
> The directory traversal is encapsulated inside the class DirectoryTraversal, which examines
> one file at a time, using the caller's thread.
> My proposal is to make this multi-threaded as follows:
>  * Use a pool of threads inside DirectoryTraversal.
>  * The caller's thread is used to retrieve directories, and each new directory is assigned
>    to a thread in the pool. The worker thread examines all the files in the directory.
>  * If there are sub-directories, those are added back as work items to the pool.
> Comments?
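
The proposal above can be illustrated with a minimal sketch. This is not the code from the attached patches: the class name ParallelTraversalSketch and the method listFiles are hypothetical, the local filesystem (java.nio.file) stands in for HDFS, and for brevity the worker threads resubmit sub-directories themselves instead of handing them back to the caller's thread. It also leaves out the semaphore discussed in the comment, since its role in the patch is not described here.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one pool task per directory, sub-directories become new tasks.
class ParallelTraversalSketch {

    // Returns every non-directory entry found under root, in no particular order.
    static List<Path> listFiles(Path root, int numThreads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        Queue<Path> files = new ConcurrentLinkedQueue<>();
        AtomicLong pending = new AtomicLong();        // directories submitted but not finished
        CountDownLatch done = new CountDownLatch(1);  // released when pending drops to zero
        try {
            submit(root, pool, files, pending, done);
            done.await();
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(files);
    }

    private static void submit(Path dir, ExecutorService pool, Queue<Path> files,
                               AtomicLong pending, CountDownLatch done) {
        // Count the directory before it is queued, so pending cannot reach zero
        // while a parent is still handing out its children.
        pending.incrementAndGet();
        pool.execute(() -> {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    // Do not follow symbolic links, to avoid cycles.
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        submit(entry, pool, files, pending, done);
                    } else {
                        files.add(entry);
                    }
                }
            } catch (IOException | DirectoryIteratorException e) {
                System.err.println("Skipping " + dir + ": " + e);
            } finally {
                if (pending.decrementAndGet() == 0) {
                    done.countDown();
                }
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(listFiles(root, 4).size() + " files under " + root);
    }
}
```

The pool's work queue is unbounded in this sketch, so a very wide tree could queue many directories at once; a real implementation would likely want some way to bound the outstanding work.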

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

