crunch-dev mailing list archives

From "Josh Wills (JIRA)" <j...@apache.org>
Subject [jira] [Created] (CRUNCH-256) SequentialFileNamingScheme should cache the # of files in the target directory after the first read
Date Fri, 23 Aug 2013 00:11:52 GMT
Josh Wills created CRUNCH-256:
---------------------------------

             Summary: SequentialFileNamingScheme should cache the # of files in the target
directory after the first read
                 Key: CRUNCH-256
                 URL: https://issues.apache.org/jira/browse/CRUNCH-256
             Project: Crunch
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Josh Wills
             Fix For: 0.8.0


After a job finishes running, the post-job hooks rename the files from a temp output directory
to the target output directory. When there are many files, this move can take a long time.
I traced the slowdown to SequentialFileNamingScheme, which calls listStatus() on the output
directory for every file that gets moved, so moving N files costs N directory listings. If
SequentialFileNamingScheme does this check once and then increments an internal counter, we
can significantly reduce the overhead of the move.
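A minimal Java sketch of the proposed caching, assuming that nothing else adds files to the
target directory while the move is running. The class CachingSequentialNamer, its nextName()
method and the "part-%05d" name format are hypothetical, not Crunch's actual
SequentialFileNamingScheme API. The ToIntFunction stands in for a directory listing such as
Hadoop's FileSystem.listStatus(path).length.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

/**
 * Hypothetical sketch of the caching idea: list the target directory once,
 * then hand out sequential names from an in-memory counter.
 */
public class CachingSequentialNamer {

    // Stand-in for a directory listing; returns the number of existing files.
    private final ToIntFunction<String> fileCounter;

    // Next sequence number per output directory, filled in on first use.
    private final Map<String, Integer> nextIndex = new HashMap<>();

    public CachingSequentialNamer(ToIntFunction<String> fileCounter) {
        this.fileCounter = fileCounter;
    }

    /** Returns the next file name for dir; the directory is listed only on the first call. */
    public synchronized String nextName(String dir) {
        // Only the first request for a directory triggers a listing.
        int index = nextIndex.computeIfAbsent(dir, d -> fileCounter.applyAsInt(d));
        // Later requests just bump the cached counter.
        nextIndex.put(dir, index + 1);
        return String.format("part-%05d", index);
    }

    public static void main(String[] args) {
        int[] listings = {0};
        // Pretend the target directory already holds 3 files.
        CachingSequentialNamer namer = new CachingSequentialNamer(dir -> {
            listings[0]++;
            return 3;
        });
        for (int i = 0; i < 4; i++) {
            System.out.println(namer.nextName("/out"));
        }
        System.out.println("directory listings: " + listings[0]);
    }
}
```

With a directory that already holds three files, the four calls in main() yield part-00003
through part-00006 while the listing function runs only once, instead of once per file as
in the behavior described above.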

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
