Date: Sat, 2 May 2015 04:40:08 +0000 (UTC)
From: "Hadoop QA (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5907) Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14524788#comment-14524788 ]

Hadoop QA commented on MAPREDUCE-5907:
--------------------------------------

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12648040/MAPREDUCE-5907-3.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / f1a152c |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/5543/console |


This message was automatically generated.


> Improve getSplits() performance for fs implementations that can utilize performance gains from recursive listing
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5907
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5907
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: client
>    Affects Versions: 2.4.0
>            Reporter: Sumit Kumar
>            Assignee: Sumit Kumar
>         Attachments: MAPREDUCE-5907-2.patch, MAPREDUCE-5907-3.patch, MAPREDUCE-5907.patch
>
>
> FileInputFormat (both the mapreduce and mapred implementations) uses recursive listing while calculating splits. However, it does this level by level: to discover the files under /foo/bar, it first lists /foo/bar to get the immediate children, then makes the same call on each of those children to discover their immediate children, and so on. This doesn't scale well for object-store-based fs implementations such as s3 and swift, because every listStatus call ends up being a web service call to the backend. When a large number of files is considered for input, this makes the getSplits() call slow.
> This patch adds a new set of recursive list APIs that give fs implementations an opportunity to optimize. The behavior remains the same for other implementations (a default implementation is provided, so they don't have to implement anything new). For object-store-based fs implementations, however, it provides a simple change, setting the recursive flag to true (as shown in the patch), to improve listing performance.
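
To make the cost difference described in the issue concrete, here is a minimal, self-contained sketch. It does not use the Hadoop FileSystem API and is not the code from the attached patch: ToyObjectStore, listChildren, listRecursive, and ListingSketch are hypothetical names invented for illustration. The store is an in-memory sorted key set, and each list call is counted as one simulated web service request.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for an object store: a flat, sorted key space. */
final class ToyObjectStore {
    private final SortedSet<String> keys = new TreeSet<>();
    /** Number of list calls made; each one models a web service round trip. */
    int requests = 0;

    void put(String key) {
        keys.add(key);
    }

    private static String asPrefix(String dir) {
        return dir.endsWith("/") ? dir : dir + "/";
    }

    /** One request: immediate children of dir; "directories" end with '/'. */
    List<String> listChildren(String dir) {
        requests++;
        String prefix = asPrefix(dir);
        SortedSet<String> children = new TreeSet<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break; // keys are sorted, so the prefix range has ended
            }
            int slash = key.indexOf('/', prefix.length());
            children.add(slash < 0 ? key : key.substring(0, slash + 1));
        }
        return new ArrayList<>(children);
    }

    /** One request: every key under dir, at any depth. */
    List<String> listRecursive(String dir) {
        requests++;
        String prefix = asPrefix(dir);
        List<String> files = new ArrayList<>();
        for (String key : keys.tailSet(prefix)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            files.add(key);
        }
        return files;
    }
}

final class ListingSketch {
    /** Level-by-level discovery: one list request per directory visited. */
    static List<String> levelByLevel(ToyObjectStore store, String dir) {
        List<String> files = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(dir);
        while (!pending.isEmpty()) {
            for (String child : store.listChildren(pending.remove())) {
                if (child.endsWith("/")) {
                    pending.add(child); // descend later with another request
                } else {
                    files.add(child);
                }
            }
        }
        return files;
    }

    /** Sample layout under /foo/bar (paths are made up for the example). */
    static ToyObjectStore sampleStore() {
        ToyObjectStore store = new ToyObjectStore();
        store.put("/foo/bar/4.txt");
        store.put("/foo/bar/a/1.txt");
        store.put("/foo/bar/a/2.txt");
        store.put("/foo/bar/b/c/3.txt");
        return store;
    }

    public static void main(String[] args) {
        ToyObjectStore store = sampleStore();
        List<String> walked = levelByLevel(store, "/foo/bar");
        System.out.println("level by level: " + walked.size() + " files, "
                + store.requests + " requests");

        store.requests = 0;
        List<String> listed = store.listRecursive("/foo/bar");
        System.out.println("recursive: " + listed.size() + " files, "
                + store.requests + " requests");
    }
}
```

With the four sample keys, the level-by-level walk issues 4 list requests (one per directory: /foo/bar, a, b, b/c), while the single recursive listing issues 1 and returns the same files. The gap grows with the number of directories, which is the scaling problem the issue describes for stores where every listing is a remote call.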
-- This message was sent by Atlassian JIRA (v6.3.4#6332)