From: Niels Basjes
Date: Thu, 30 Jul 2015 17:02:11 +0000
Subject: Re: Sorting the inputSplits
To: user@hadoop.apache.org

MapReduce is based on the premise that several parts of a task can be
processed independently and in parallel. If you "require" an order of
processing, then these files depend on each other. Why use MapReduce at
all? With your requirement you cannot use more than one CPU anyway.

Niels

On Thu, 30 Jul 2015 01:31 Gera Shegalov wrote:

> Can you clarify the requirement "processed first"? Maps run in parallel
> without any ordering guarantees. If you want to affect the mapping
> file->split number, you can implement your own getSplits in the custom
> input format and return the splits ordered any way you like.
>
> On Wed, Jul 22, 2015 at 12:06 PM, Nishanth S wrote:
>
>> Hey folks,
>>
>> Is there a way to sort the input splits in MapReduce? We have a case
>> where there are two files, file1 and file2, in the input directory.
>> Since we have a custom InputFormat whose isSplitable always returns
>> false, each of these files would be processed by a different mapper.
>> How could I make sure that file1 is processed before file2? (I want the
>> oldest file to be processed first.) Is this possible?
>>
>> Thanks,
>> Nishan
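
The ordering step that Gera describes (returning splits from a custom getSplits in a chosen order) might look roughly like the sketch below. It is only an illustration and uses no Hadoop classes: FileSplitInfo and SplitOrdering are hypothetical stand-ins, and the modification time is assumed to come from something like FileStatus#getModificationTime(). In a real input format the same comparator would be applied to the list that getSplits returns. As noted in the thread, this only affects how files map to split numbers; maps still run in parallel with no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the single split of a non-splittable file.
// This is NOT a Hadoop class; it only carries what the ordering needs.
final class FileSplitInfo {
    final String path;
    final long modificationTime; // epoch millis (assumed source: FileStatus#getModificationTime())

    FileSplitInfo(String path, long modificationTime) {
        this.path = path;
        this.modificationTime = modificationTime;
    }
}

// Hypothetical helper holding the ordering logic a custom getSplits could apply.
final class SplitOrdering {
    private SplitOrdering() {}

    // Returns a new list with the oldest file first.
    // Ties are broken by path so the order is repeatable; the input list is not modified.
    static List<FileSplitInfo> oldestFirst(List<FileSplitInfo> splits) {
        List<FileSplitInfo> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator
                .comparingLong((FileSplitInfo s) -> s.modificationTime)
                .thenComparing(s -> s.path));
        return sorted;
    }
}
```

If file1 is older than file2, oldestFirst places file1 at index 0 regardless of the order in which the files were listed. Enforcing that file1 is fully processed before file2 starts would still need something outside this ordering, for example a single mapper or two sequential jobs.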
