hadoop-mapreduce-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-6800) FileInputFormat.singleThreadedListStatus to use listFiles(recursive)
Date Mon, 24 Oct 2016 14:40:00 GMT
Steve Loughran created MAPREDUCE-6800:

             Summary: FileInputFormat.singleThreadedListStatus to use listFiles(recursive)
                 Key: MAPREDUCE-6800
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6800
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 2.7.3
            Reporter: Steve Loughran
            Priority: Minor

{{FileInputFormat.singleThreadedListStatus}} walks the directory tree recursively to pick the
files to scan. This is very inefficient on object stores, and can be avoided wherever
{{listFiles(recursive=true)}} can be used instead.

Based on the experience of SPARK-2984, it should also be resilient to a source file going
away during the iteration, downgrading a {{FileNotFoundException}} to a "skip that nonexistent path".
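
The "skip that nonexistent path" behaviour could look roughly like the sketch below. It is not the proposed patch and does not use Hadoop classes: {{RemoteIter}} is a hypothetical stand-in for the iterator that {{listFiles(recursive=true)}} returns, and {{collectFiles}}, {{fakeListing}} and {{demo}} are illustrative names only. It also assumes that a vanished entry surfaces as a {{FileNotFoundException}} from {{next()}} and that the iterator has already moved past that entry when it throws.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class ResilientListing {

    /** Hypothetical stand-in for a remote listing iterator; both calls may throw IOException. */
    interface RemoteIter<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Drains a flat recursive listing, skipping entries that vanished mid-iteration. */
    static List<String> collectFiles(RemoteIter<String> listing) throws IOException {
        List<String> files = new ArrayList<>();
        while (listing.hasNext()) {
            try {
                files.add(listing.next());
            } catch (FileNotFoundException e) {
                // The path was deleted after the listing started: skip it.
                // Any other IOException still propagates to the caller.
            }
        }
        return files;
    }

    /** Test double (assumption): next() advances first, then fails for "vanished" paths. */
    static RemoteIter<String> fakeListing(final List<String> paths, final Set<String> vanished) {
        return new RemoteIter<String>() {
            private int pos = 0;

            public boolean hasNext() {
                return pos < paths.size();
            }

            public String next() throws IOException {
                String p = paths.get(pos++);
                if (vanished.contains(p)) {
                    throw new FileNotFoundException(p);
                }
                return p;
            }
        };
    }

    /** Runs the sketch against three made-up paths, one of which has disappeared. */
    static List<String> demo() {
        try {
            return collectFiles(fakeListing(
                    Arrays.asList("/in/a.txt", "/in/sub/b.txt", "/in/sub/c.txt"),
                    Collections.singleton("/in/sub/b.txt")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [/in/a.txt, /in/sub/c.txt]
    }
}
```

Only {{FileNotFoundException}} is swallowed here; other I/O failures are left to fail the listing, which seems closest to the intent of "downgrading" just the missing-file case.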
