Subject: Re: how can i increase the number of mappers?
From: Wei Shung Chung
Date: Wed, 21 Mar 2012 10:12:49 -0700
To: "common-user@hadoop.apache.org"

Great info :)

Sent from my iPhone

On Mar 21, 2012, at 9:10 AM, Jane Wayne wrote:

> if anyone is facing the same problem, here's what i did. i took anil's
> advice to use NLineInputFormat (because that approach would scale out my
> mappers).
>
> however, i am using the new mapreduce package/API in hadoop v0.20.2. i
> notice that you cannot use NLineInputFormat from the old package/API
> (mapred).
>
> when i took a look at hadoop v1.0.1, there is an NLineInputFormat class
> for the new API. i simply copied and pasted this file into my project. i
> got 4 errors associated with import statements and annotations. when i
> removed the 2 import statements and the corresponding 2 annotations, the
> class compiled successfully.
> after this modification, the v1.0.1 NLineInputFormat runs fine on a
> cluster based on v0.20.2.
>
> one mini-problem solved, many more to go.
>
> thanks for the help.
>
> On Wed, Mar 21, 2012 at 3:33 AM, Jane Wayne wrote:
>
>> as i understand, that class does not exist for the new API in hadoop
>> v0.20.2 (which is what i am using). if i am mistaken, where is it?
>>
>> i am looking at hadoop v1.0.1, and there is an NLineInputFormat class. i
>> wonder if i can simply copy/paste it into my project.
>>
>> On Wed, Mar 21, 2012 at 2:37 AM, Anil Gupta wrote:
>>
>>> Have a look at the NLineInputFormat class in Hadoop. That class will
>>> solve your purpose.
>>>
>>> Best Regards,
>>> Anil
>>>
>>> On Mar 20, 2012, at 11:07 PM, Jane Wayne wrote:
>>>
>>>> i have a matrix that i am performing operations on. it is 10,000 rows
>>>> by 5,000 columns. the total size of the file is just under 30 MB. my
>>>> HDFS block size is set to 64 MB. from what i understand, the number of
>>>> mappers is roughly equal to the number of HDFS blocks used by the
>>>> input, i.e. if my input data spans 1 block, then only 1 mapper is
>>>> created; if it spans 2 blocks, then 2 mappers are created, and so on.
>>>>
>>>> so, with my single matrix file of just under 30 MB, the data won't
>>>> fill up a block, and as such, only 1 mapper will be assigned to it. is
>>>> this understanding correct?
>>>>
>>>> if so, what i want is for more than one mapper (let's say 10) to work
>>>> on the data, even though it sits in 1 block. my analysis (or
>>>> map/reduce job) is such that multiple mappers can work on different
>>>> parts of the matrix: for example, mapper 1 can work on the first 500
>>>> rows, mapper 2 on the next 500 rows, and so on. how can i set up
>>>> multiple mappers to work on a file that resides on only one block (or
>>>> a file whose size is smaller than the HDFS block size)?
>>>>
>>>> can i split the matrix into (let's say) 10 files? that would mean
>>>> 30 MB / 10 = 3 MB per file. then put each 3 MB file onto HDFS? will
>>>> this increase the chance of having multiple mappers work
>>>> simultaneously on the data/matrix? if i can increase the number of
>>>> mappers, i think (pretty sure) my implementation will improve in
>>>> speed linearly.
>>>>
>>>> any help is appreciated.
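
For reference, below is a minimal sketch of how the copied new-API
NLineInputFormat might be wired into a 0.20.2 job so that the 10,000-row
matrix from the original question is handled by roughly 10 map tasks.
MatrixRowsJob, MatrixRowMapper, the placeholder package "mypackage" for the
copied class, and the figure of 1,000 lines per split are illustrative
assumptions, not taken from the thread; the setNumLinesPerSplit call assumes
the class was copied unchanged from Hadoop 1.0.1.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import mypackage.NLineInputFormat; // placeholder: the class copied from hadoop 1.0.1

public class MatrixRowsJob {

  // placeholder mapper: each map() call receives one matrix row (one line of text)
  public static class MatrixRowMapper
      extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // parse the row and do the real matrix work here; this just echoes it
      context.write(new Text(key.toString()), value);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "matrix-rows");
    job.setJarByClass(MatrixRowsJob.class);

    // use the copied new-API NLineInputFormat instead of the default input format
    job.setInputFormatClass(NLineInputFormat.class);

    // 10,000 rows / 1,000 lines per split ~= 10 map tasks, even though the
    // whole file fits inside a single 64 MB HDFS block. (assumes the copied
    // class kept the 1.0.1 setNumLinesPerSplit helper; otherwise set the
    // lines-per-map property that the copied class reads directly on the
    // configuration.)
    NLineInputFormat.setNumLinesPerSplit(job, 1000);

    job.setMapperClass(MatrixRowMapper.class);
    job.setNumReduceTasks(0); // map-only for this sketch
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With NLineInputFormat the split boundaries follow line counts rather than
HDFS blocks, so the number of map tasks can be tuned independently of block
size; splitting the matrix into 10 physical files, as asked in the original
message, would also yield 10 mappers but is not necessary.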