hadoop-mapreduce-issues mailing list archives

From "Ramkumar Vadali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-2167) Faster directory traversal for raid node
Date Tue, 09 Nov 2010 22:38:26 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ramkumar Vadali updated MAPREDUCE-2167:
---------------------------------------

    Attachment: MAPREDUCE-2167.4.patch

Fixed a broken test.

TEST RESULTS:

ant test-patch has the same number of failures as a clean checkout:

{code}
     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 4 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     -1 findbugs.  The patch appears to introduce 13 new Findbugs warnings.
     [exec]
     [exec]     -1 release audit.  The applied patch generated 2 release audit warnings (more than the trunk's current 1 warnings).
     [exec]
     [exec]     +1 system test framework.  The patch passed system test framework compile.
     [exec]
     [exec]
     [exec]
     [exec]
     [exec] ======================================================================
     [exec] ======================================================================
     [exec]     Finished build.
     [exec] ======================================================================
     [exec] ======================================================================
     [exec]
     [exec]
{code}

ant test succeeds:

{code}


test-junit:
    [junit] WARNING: multiple versions of ant detected in path for junit
    [junit]          jar:file:/home/rvadali/local/external/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
    [junit]      and jar:file:/home/rvadali/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
    [junit] Running org.apache.hadoop.hdfs.TestRaidDfs
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 47.071 sec
    [junit] Running org.apache.hadoop.raid.TestBlockFixer
    [junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 124.583 sec
    [junit] Running org.apache.hadoop.raid.TestDirectoryTraversal
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 9.337 sec
    [junit] Running org.apache.hadoop.raid.TestErasureCodes
    [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.481 sec
    [junit] Running org.apache.hadoop.raid.TestGaloisField
    [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.392 sec
    [junit] Running org.apache.hadoop.raid.TestHarIndexParser
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.052 sec
    [junit] Running org.apache.hadoop.raid.TestRaidFilter
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.485 sec
    [junit] Running org.apache.hadoop.raid.TestRaidHar
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 71.136 sec
    [junit] Running org.apache.hadoop.raid.TestRaidNode
    [junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 471.072 sec
    [junit] Running org.apache.hadoop.raid.TestRaidPurge
    [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 107.828 sec
    [junit] Running org.apache.hadoop.raid.TestRaidShell
    [junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 25.714 sec

test:

BUILD SUCCESSFUL
Total time: 15 minutes 6 seconds
{code}


> Faster directory traversal for raid node
> ----------------------------------------
>
>                 Key: MAPREDUCE-2167
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2167
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
>         Attachments: MAPREDUCE-2167.2.patch, MAPREDUCE-2167.3.patch, MAPREDUCE-2167.4.patch, MAPREDUCE-2167.patch
>
>
> The RaidNode currently iterates over the directory structure to figure out which files to RAID. With millions of files, this can take a long time, especially if some files are already RAIDed and the RaidNode needs to look at parity files / parity file HARs to determine whether a file needs to be RAIDed.
> The directory traversal is encapsulated inside the class DirectoryTraversal, which examines one file at a time, using the caller's thread.
> My proposal is to make this multi-threaded as follows:
>  * Use a pool of threads inside DirectoryTraversal.
>  * The caller's thread is used to retrieve directories, and each new directory is assigned to a thread in the pool. The worker thread examines all the files in the directory.
>  * If there are sub-directories, those are added back as work items to the pool.
> Comments?
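A minimal sketch of the proposed scheme, for discussion only and not the code in any of the attached patches: the caller's thread pulls directories off a queue and hands each one to a fixed-size pool; a worker lists that directory, tests each file, and pushes any sub-directories back onto the queue. Assumptions: it uses java.nio.file on the local filesystem instead of Hadoop's FileSystem API so that it runs standalone, and the class name ParallelDirectoryTraversal and the needsRaid predicate (a stand-in for the parity file / parity file HAR check) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Hypothetical sketch of the proposed multi-threaded traversal (not the patch code).
// Uses java.nio.file on the local filesystem as a stand-in for Hadoop's FileSystem API.
final class ParallelDirectoryTraversal implements AutoCloseable {
  private final ExecutorService pool;
  // Hypothetical stand-in for "does this file still need to be RAIDed?"
  // (the real check would consult parity files / parity file HARs).
  private final Predicate<Path> needsRaid;
  // Directories waiting to be assigned to a worker; workers add sub-directories here.
  private final BlockingQueue<Path> dirs = new LinkedBlockingQueue<>();
  // Files selected for RAID; written concurrently by the workers.
  private final Queue<Path> selected = new ConcurrentLinkedQueue<>();
  // Number of directories that are queued or currently being examined.
  private final AtomicInteger pending = new AtomicInteger();

  ParallelDirectoryTraversal(int threads, Predicate<Path> needsRaid) {
    this.pool = Executors.newFixedThreadPool(threads);
    this.needsRaid = needsRaid;
  }

  /** Runs on the caller's thread: retrieves directories and assigns each one to the pool. */
  List<Path> traverse(Path root) throws InterruptedException {
    selected.clear();
    pending.incrementAndGet();
    dirs.put(root);
    while (pending.get() > 0) {
      Path dir = dirs.poll(10, TimeUnit.MILLISECONDS);
      if (dir != null) {
        pool.execute(() -> examine(dir));
      }
    }
    return new ArrayList<>(selected);
  }

  /** Runs on a worker thread: examines every file in one directory. */
  private void examine(Path dir) {
    try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
      for (Path p : entries) {
        if (Files.isDirectory(p, LinkOption.NOFOLLOW_LINKS)) {
          pending.incrementAndGet(); // count the child before it becomes visible
          dirs.add(p);               // sub-directory goes back as a new work item
        } else if (needsRaid.test(p)) {
          selected.add(p);
        }
      }
    } catch (IOException | DirectoryIteratorException e) {
      System.err.println("skipping " + dir + ": " + e);
    } finally {
      pending.decrementAndGet(); // this directory is fully examined
    }
  }

  @Override
  public void close() {
    pool.shutdown();
  }

  public static void main(String[] args) throws Exception {
    Path root = Files.createTempDirectory("raid-demo");
    Files.createDirectories(root.resolve("a/b"));
    Files.createFile(root.resolve("x.dat"));
    Files.createFile(root.resolve("a/y.dat"));
    Files.createFile(root.resolve("a/b/z.dat"));
    Files.createFile(root.resolve("a/b/skip.txt"));
    try (ParallelDirectoryTraversal t =
        new ParallelDirectoryTraversal(4, p -> p.toString().endsWith(".dat"))) {
      System.out.println(t.traverse(root).size()); // prints 3
    }
  }
}
```

The pending counter is incremented for each directory before it is queued and decremented only after its worker has finished, so it reaches zero only when no directory is queued or in flight. That lets the caller's thread detect the end of the traversal without shutting down or joining the pool, so the same pool can be reused for the next traversal.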

-- 
This message is automatically generated by JIRA.

