Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
References: 
 <CALnYFoKizuGAOTye8K8srb7s61zmjJxE5AVhCgo1+J4e1t4c-w@mail.gmail.com>
 <CAMm20=7wFWpbA0xJ2D5xGwxb1LnBbg9OESu0uSRqt7pZFhKDSA@mail.gmail.com>
In-Reply-To: 
 <CAMm20=7wFWpbA0xJ2D5xGwxb1LnBbg9OESu0uSRqt7pZFhKDSA@mail.gmail.com>
From: Niels Basjes <Niels@basjes.nl>
Date: Thu, 30 Jul 2015 17:02:11 +0000
Message-ID: 
 <CADoiZqo+w0SU60bGvBvV4NtzFtSr0g747_8TJ4umkia1MiKeEw@mail.gmail.com>
Subject: Re: Sorting the inputSplits
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=f46d04440408a55db2051c1aaa67

--f46d04440408a55db2051c1aaa67
Content-Type: text/plain; charset=UTF-8

MapReduce is based on the premise that several parts of a task can be
processed independently in parallel.
If you "require" an order of processing, then these files depend on
each other. Why use MapReduce at all?
With that requirement the files have to be processed one after another, so
you cannot use more than one CPU anyway.

Niels

On Thu, 30 Jul 2015 01:31 Gera Shegalov <gera@shegalov.com> wrote:

> Can you clarify the requirement "processed first"? Maps run in parallel
> without any ordering guarantees. If you want to control the mapping from
> file to split number, you can implement your own getSplits in the custom
> input format and return the splits ordered any way you like.
>
> On Wed, Jul 22, 2015 at 12:06 PM, Nishanth S <chinchu2884@gmail.com>
> wrote:
>
>> Hey folks,
>>
>> Is there a way to sort the input splits in MapReduce? We have a case
>> where there are two files, file1 and file2, in the input directory. Since
>> we have a custom InputFormat whose isSplitable always returns false, each
>> of these files would be processed by a different mapper. How could I make
>> sure that file1 is processed before file2 (I want the oldest file to be
>> processed first)? Is this possible?
>>
>> Thanks,
>> Nishan
>>
>
>
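
For reference, the ordering step behind the getSplits suggestion quoted
above might look roughly like the sketch below. It uses plain Java only:
FileSplitInfo and oldestFirst are hypothetical stand-ins, not Hadoop classes.
In a real job the same comparator would presumably be applied to the list
returned by getSplits in the custom input format, keyed on each file's
modification time. As noted in the thread, this only changes the order of the
splits; it does not guarantee that the mappers run in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitOrdering {

    /** Hypothetical stand-in for a split covering one whole, non-splittable file. */
    static final class FileSplitInfo {
        final String path;
        final long modificationTime; // milliseconds since the epoch

        FileSplitInfo(String path, long modificationTime) {
            this.path = path;
            this.modificationTime = modificationTime;
        }
    }

    /**
     * Returns a new list with the oldest file first. Ties on modification
     * time are broken by path so the result is deterministic. The input
     * list is left unchanged.
     */
    static List<FileSplitInfo> oldestFirst(List<FileSplitInfo> splits) {
        List<FileSplitInfo> sorted = new ArrayList<>(splits);
        sorted.sort(Comparator
                .comparingLong((FileSplitInfo s) -> s.modificationTime)
                .thenComparing(s -> s.path));
        return sorted;
    }

    public static void main(String[] args) {
        // Assumed example data: file2 is listed first but file1 is older.
        List<FileSplitInfo> splits = Arrays.asList(
                new FileSplitInfo("file2", 2000L),
                new FileSplitInfo("file1", 1000L));

        for (FileSplitInfo s : oldestFirst(splits)) {
            System.out.println(s.path + " " + s.modificationTime);
        }
    }
}
```

If a strict "file1 completely before file2" order is really needed, sorting
the splits is not enough on its own, which is the point made at the top of
this message.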

--f46d04440408a55db2051c1aaa67--