From: Leo Leung <lleung@ddn.com>
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Date: Fri, 11 May 2012 12:58:00 -0700
Subject: RE: Question on MapReduce


This may be dated material.

Cloudera and HDP folks, please correct with updates :)

http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
http://www.cloudera.com/blog/2010/08/hadoophbase-capacity-planning/

http://hortonworks.com/blog/best-practices-for-selecting-apache-hadoop-hardware/
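
For the per-node slot tuning mentioned in my earlier reply quoted below (1.0.x), here is a minimal sketch of the mapred-site.xml entries for that one TaskTracker node. The two property names are the standard 1.0.x ones (both default to 2); the values 8 and 4 are purely illustrative assumptions, not a recommendation. The TaskTracker on that node needs a restart to pick up the change.

```xml
<!-- mapred-site.xml on the larger node only; values are assumed examples -->
<configuration>
  <property>
    <!-- maximum number of map tasks this TaskTracker runs concurrently -->
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
  </property>
  <property>
    <!-- maximum number of reduce tasks this TaskTracker runs concurrently -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

Raising these only advertises more slots; whether they get filled is still up to the scheduler.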

Hope this helps.


-----Original Message-----
From: Satheesh Kumar [mailto:nkseam@gmail.com]
Sent: Friday, May 11, 2012 12:48 PM
To: common-user@hadoop.apache.org
Subject: Re: Question on MapReduce

Thanks, Leo. What is the config of a typical data node in a Hadoop cluster:
cores, storage capacity, and connectivity (SATA?)? How many tasktrackers
are scheduled per core in general?

Is there a best practices guide somewhere?

Thanks,
Satheesh

On Fri, May 11, 2012 at 10:48 AM, Leo Leung <lleung@ddn.com> wrote:

> Nope, you must tune the config on that specific super node to have
> more M/R slots (this is for 1.0.x). This does not mean the JobTracker
> will be eager to stuff that super node with all the M/R jobs at hand.
>
> It still goes through the scheduler; the Capacity Scheduler is most
> likely what you have (check your config).
>
> IMO, if the data locality is not going to be there, your cluster is
> going to suffer from network I/O.
>
>
> -----Original Message-----
> From: Satheesh Kumar [mailto:nkseam@gmail.com]
> Sent: Friday, May 11, 2012 9:51 AM
> To: common-user@hadoop.apache.org
> Subject: Question on MapReduce
>
> Hi,
>
> I am a newbie on Hadoop and have a quick question on optimal compute vs.
> storage resources for MapReduce.
>
> If I have a multiprocessor node with 4 processors, will Hadoop
> schedule a higher number of Map or Reduce tasks on that system than on a
> uni-processor system? In other words, does Hadoop detect denser
> systems and schedule more tasks on multiprocessor systems?
>
> If yes, does that imply that it makes sense to attach higher-capacity
> storage to store more blocks on systems with dense compute?
>
> Any insights will be very useful.
>
> Thanks,
> Satheesh
>