Message-ID: <1362634192.1227664964443.JavaMail.jira@brutus>
In-Reply-To: <525654219.1225490984230.JavaMail.jira@brutus>
Date: Tue, 25 Nov 2008 18:02:44 -0800 (PST)
From: "dhruba borthakur (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Subject: [jira] Updated: (HADOOP-4565) MultiFileInputSplit can use data locality information to create splits

     [ https://issues.apache.org/jira/browse/HADOOP-4565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4565:
-------------------------------------

    Status: Patch Available  (was: Open)

> MultiFileInputSplit can use data locality information to create splits
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-4565
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4565
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: CombineMultiFile.patch, CombineMultiFile2.patch, CombineMultiFile3.patch
>
>
> The MultiFileInputFormat takes a set of paths and creates splits based on file sizes. Each split contains a few files, and the splits are roughly equal in size. It would be more efficient if we could extend this InputFormat to create splits such that all the blocks in one split are either node-local or rack-local.
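To make the proposal concrete, here is a minimal sketch of locality-aware split creation. It is not taken from the attached patches and does not use the Hadoop API: the `Block` type, the `rack_of` mapping, `make_splits`, and the greedy two-pass strategy are all assumptions chosen for illustration. It assumes every block lists at least one host and every host maps to a rack.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical stand-in for one file block and its replica locations."""
    path: str
    offset: int
    length: int
    hosts: tuple  # names of the nodes that hold a replica


def _pack(groups, max_size, used, level, splits, flush):
    """Greedily pack each group's unused blocks into splits of about max_size."""
    for key in sorted(groups):
        current, size = [], 0
        for block in groups[key]:
            if block in used:
                continue
            current.append(block)
            size += block.length
            if size >= max_size:
                splits.append((level, key, current))
                used.update(current)
                current, size = [], 0
        # Partial groups are emitted only when flush is set; otherwise their
        # blocks stay unused and are picked up by the next pass.
        if flush and current:
            splits.append((level, key, current))
            used.update(current)


def make_splits(blocks, rack_of, max_size):
    """Return (level, location, blocks) tuples, preferring node-local splits."""
    by_node, by_rack = defaultdict(list), defaultdict(list)
    for block in blocks:
        for host in block.hosts:
            by_node[host].append(block)
        for rack in sorted({rack_of[host] for host in block.hosts}):
            by_rack[rack].append(block)

    splits, used = [], set()
    # Pass 1: only full splits whose blocks all have a replica on one node.
    _pack(by_node, max_size, used, "node-local", splits, flush=False)
    # Pass 2: remaining blocks are combined per rack, including partial splits.
    _pack(by_rack, max_size, used, "rack-local", splits, flush=True)
    return splits
```

Each block ends up in exactly one split. Blocks that cannot fill a node-local split fall through to the rack pass, so split sizes stay roughly equal while every split remains node-local or rack-local.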