Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: common-user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of maha@umail.ucsb.edu designates
 128.111.151.62 as permitted sender)
From: maha <maha@umail.ucsb.edu>
Mime-Version: 1.0 (Apple Message framework v1082)
Content-Type: multipart/alternative; boundary=Apple-Mail-6-1007637539
Subject: Re: Quick Question: LineSplit or BlockSplit
Date: Mon, 7 Feb 2011 21:20:17 -0800
In-Reply-To: <AANLkTik2ZLF7RiWg5xcsjDQUp1vaeCZ-VXuWO3sxEKz8@mail.gmail.com>
To: common-user@hadoop.apache.org
References: <A9640EAE-A188-4B0B-A55E-63E4F386BCEB@umail.ucsb.edu>
 <AANLkTimrB8K3joM2gWAZQtmJ0OzLJwQ2XCDnK58kQa7p@mail.gmail.com>
 <AANLkTin7ehce2VS-b6Wy2hZpFOVBe_94R0K_THK+Lpje@mail.gmail.com>
 <AANLkTikQyyUH8jEcx_NVao4260fLVM98L6GJfAtSP85_@mail.gmail.com>
 <AANLkTik2ZLF7RiWg5xcsjDQUp1vaeCZ-VXuWO3sxEKz8@mail.gmail.com>
Message-Id: <CE1CAAF5-8FC5-4063-99E9-377F7108F4C3@umail.ucsb.edu>

--Apple-Mail-6-1007637539
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=us-ascii

Thanks Ted. Then I'll have to write my own InputFormat to read a block
of lines per mapper.

NLineInputFormat didn't work for me; any working example of it would be
appreciated.
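
For reference, here is a minimal, Hadoop-free sketch of what an NLineInputFormat-style InputFormat does when it computes splits. It walks the file line by line and closes a split every N lines, recording each split's starting byte offset and length. The class and method names (NLineSplitModel, splitByLines) are hypothetical. The sketch also leaves out how the real class cooperates with the line record reader at split boundaries. In Hadoop itself, N is set on the job: the older API reads the mapred.line.input.format.linespermap property, and the newer API provides NLineInputFormat.setNumLinesPerSplit(job, n). Setting N = 1 gives one mapper per line, which also covers Mark's case of one zip path per line.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Hadoop-free model of how an NLineInputFormat-style
 * InputFormat carves a file into splits of N lines each.
 */
public class NLineSplitModel {

    /** A (start, length) byte range, like the one a FileSplit records. */
    public record Split(long start, long length) {}

    /**
     * @param lineLengths   byte length of each line, including its newline
     * @param linesPerSplit how many lines each mapper should receive (N)
     */
    public static List<Split> splitByLines(long[] lineLengths, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<Split> splits = new ArrayList<>();
        long start = 0;   // byte offset where the current split begins
        long length = 0;  // bytes accumulated in the current split
        int count = 0;    // lines accumulated in the current split
        for (long len : lineLengths) {
            length += len;
            count++;
            if (count == linesPerSplit) {  // split is full: emit it
                splits.add(new Split(start, length));
                start += length;
                length = 0;
                count = 0;
            }
        }
        if (count > 0) {  // leftover lines form a final, shorter split
            splits.add(new Split(start, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five lines of 10, 20, 30, 40 and 50 bytes, two lines per mapper.
        long[] lines = {10, 20, 30, 40, 50};
        for (Split s : splitByLines(lines, 2)) {
            System.out.println(s);
        }
    }
}
```

With those five lines and N = 2, the sketch produces three splits, (0, 30), (30, 70) and (100, 50), so the last mapper gets the single leftover line.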

Thanks again,

Maha
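
Ted suggests making the max split size small in the quoted thread. The sketch below models that route outside Hadoop. It follows the way the newer (org.apache.hadoop.mapreduce) FileInputFormat picks a split size as max(minSize, min(maxSize, blockSize)) and then cuts a file into splits, letting the last split run up to 10% over size. It is an illustrative model under those assumptions, not Hadoop's code, and SplitSizeModel and its methods are hypothetical names. In a real job the cap would be set with FileInputFormat.setMaxInputSplitSize(job, bytes), or the mapred.max.split.size property in 0.20.

```java
/**
 * Hypothetical, Hadoop-free model of how the newer FileInputFormat sizes
 * splits and, from that, how many map tasks one file produces.
 */
public class SplitSizeModel {

    // The last split may be up to 10% larger than the target size
    // instead of leaving a tiny trailing split.
    public static final double SPLIT_SLOP = 1.1;

    /** splitSize = max(minSize, min(maxSize, blockSize)); minSize should be >= 1. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits (and hence map tasks) for one splittable file of fileSize bytes. */
    public static int countSplits(long fileSize, long blockSize, long minSize, long maxSize) {
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long remaining = fileSize;
        int splits = 0;
        // Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the final split.
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 256 * mb;
        long blockSize = 64 * mb;
        // No cap: one split per 64 MB block.
        System.out.println(countSplits(fileSize, blockSize, 1, Long.MAX_VALUE)); // 4
        // 1 MB cap: many more, smaller splits, i.e. more mappers.
        System.out.println(countSplits(fileSize, blockSize, 1, mb));             // 256
    }
}
```

Splits are measured in bytes, so a small cap gives roughly, but not exactly, a fixed number of lines per mapper. The record reader still hands each mapper whole lines. NLineInputFormat is the better fit when the line count per mapper must be exact.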


On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote:

> Thanks!
> Mark
>
> On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>
>> That is quite doable. One way to do it is to make the max split size quite
>> small, so that each split, and therefore each mapper, covers only a few
>> lines.
>>
>> On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <markkerzner@gmail.com> wrote:
>>
>>> Ted,
>>>
>>> I am also interested in the answer to this.
>>>
>>> I put the name of a zip file on a line in an input file, and I want one
>>> mapper to read this line and start working on that zip file (since it
>>> then knows the path in HDFS). Are you saying that's not doable?
>>>
>>> Thank you,
>>> Mark
>>>
>>> On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <tdunning@maprtech.com> wrote:
>>>
>>>> Option (1) isn't the way things normally work. Besides, a mapper's map()
>>>> method is called many times for each construction of a mapper.
>>>>
>>>> On Mon, Feb 7, 2011 at 3:38 PM, maha <maha@umail.ucsb.edu> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I would appreciate your thoughts on whether efficiency is affected if:
>>>>>
>>>>> 1) mappers were per line in a document,
>>>>>
>>>>> or
>>>>>
>>>>> 2) mappers were per block of lines in a document.
>>>>>
>>>>> The obvious difference I can see is that (1) has more mappers. Does
>>>>> that mean (1) will be slower because of scheduling overhead?
>>>>>
>>>>> Thank you,
>>>>> Maha
>>>>>
>>>>
>>>
>>
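
Ted's point about mapper construction can be made concrete with a small model of one map task's lifecycle, written without Hadoop. The mapper is constructed once, setup() runs once, map() runs once per record in the split, and cleanup() runs once. The shape follows the newer API's Mapper.run(), but CountingMapper and runTask are hypothetical names used only for illustration. Under option (1), one line per mapper, every task pays the construction, setup and scheduling cost for a single map() call. Under option (2), that fixed cost is spread over many records, which is why per-block mappers are usually the more efficient choice.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, Hadoop-free model of the Mapper lifecycle: one task constructs
 * one mapper, calls setup() once, map() once per record, then cleanup() once.
 */
public class MapperLifecycleModel {

    /** Stand-in for a user Mapper; it only records which callbacks it receives. */
    static class CountingMapper {
        final List<String> calls = new ArrayList<>();

        void setup()                       { calls.add("setup"); }
        void map(long offset, String line) { calls.add("map"); }
        void cleanup()                     { calls.add("cleanup"); }

        /** Same shape as Mapper.run(): setup, loop over the split's records, cleanup. */
        void run(List<String> splitLines) {
            setup();
            long offset = 0;  // byte offset of each line, assuming ASCII text
            for (String line : splitLines) {
                map(offset, line);
                offset += line.length() + 1;  // +1 for the newline
            }
            cleanup();
        }
    }

    /** Runs one "task" over a split and returns the callback sequence. */
    public static List<String> runTask(List<String> splitLines) {
        CountingMapper mapper = new CountingMapper();  // constructed once per task
        mapper.run(splitLines);
        return mapper.calls;
    }

    public static void main(String[] args) {
        System.out.println(runTask(List.of("a", "b", "c")));
        // prints [setup, map, map, map, cleanup]
    }
}
```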


--Apple-Mail-6-1007637539--