hadoop-common-user mailing list archives

From souravm <SOUR...@infosys.com>
Subject RE: Any suggestion on performance improvement ?
Date Fri, 14 Nov 2008 19:12:08 GMT
Hi Alex,

I get a response time of 30-40 seconds for around 60 MB of data. The job runs with 1 Map task
and 1 Reduce task. This is because the default HDFS block size is 64 MB and Pig assigns 1 Map
task to each HDFS block, which I believe is optimal.
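
To illustrate the block arithmetic, here is a minimal Python sketch. It assumes exactly one Map task per HDFS block, which ignores input-split settings and any file combining that Pig may apply; the function name is hypothetical and is not part of Hadoop or Pig.

```python
import math

BLOCK_SIZE_MB = 64  # default HDFS block size mentioned in this thread


def estimated_map_tasks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Rough estimate, assuming one Map task per HDFS block.

    A file smaller than one block still needs one task, hence max(1, ...).
    """
    return max(1, math.ceil(file_size_mb / block_size_mb))


print(estimated_map_tasks(60))        # 1  (60 MB fits in a single 64 MB block)
print(estimated_map_tasks(5 * 1024))  # 80 (a 5 GB file spans 80 blocks)
```

Under this assumption, only files larger than one block produce more than one Map task.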

Since one block is the unit of work and a single Map task runs on just one node, I don't think
increasing the number of nodes would improve the performance.
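
This reasoning can be sketched with a simplified model in which Map tasks run in waves across all available map slots. The 25-second task time is the figure from the original question; the 2 map slots per node and the function name are assumptions for illustration only, and the model ignores job startup, scheduling, and Reduce time.

```python
import math


def estimated_map_phase_seconds(num_map_tasks, nodes, slots_per_node, seconds_per_task):
    """Simplified wave model: each wave runs one task per available map slot."""
    total_slots = nodes * slots_per_node
    waves = math.ceil(num_map_tasks / total_slots)
    return waves * seconds_per_task


# One Map task: doubling the nodes changes nothing in this model.
print(estimated_map_phase_seconds(1, nodes=4, slots_per_node=2, seconds_per_task=25))   # 25
print(estimated_map_phase_seconds(1, nodes=8, slots_per_node=2, seconds_per_task=25))   # 25

# 80 Map tasks (e.g. a 5 GB file): more nodes means fewer waves.
print(estimated_map_phase_seconds(80, nodes=4, slots_per_node=2, seconds_per_task=25))  # 250
print(estimated_map_phase_seconds(80, nodes=8, slots_per_node=2, seconds_per_task=25))  # 125
```

In this model, extra nodes only help once the job has more Map tasks than map slots.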

Regards,
Sourav
-----Original Message-----
From: Alex Loddengaard [mailto:alex@cloudera.com] 
Sent: Friday, November 14, 2008 9:44 AM
To: core-user@hadoop.apache.org
Subject: Re: Any suggestion on performance improvement ?

How big is the data that you're loading and filtering?  Your cluster is
pretty small, so if you have data on the order of tens or hundreds of
GBs, then the performance you're describing is probably to be expected.
How many map and reduce tasks are you running on each node?

Alex

On Thu, Nov 13, 2008 at 4:55 PM, souravm <SOURAVM@infosys.com> wrote:

> Hi,
>
> I'm testing with a 4-node Hadoop HDFS setup.
>
> Each node has 2 GB of memory, a dual-core CPU, and around 30-60 GB of
> disk space.
>
> I've stored files of different sizes in HDFS, ranging from 10 MB to 5 GB.
>
> I'm querying those files using Pig. What I'm seeing is that even a simple
> select query (LOAD and FILTER) takes at least 30-40 seconds. The Map
> process on one node takes at least 25 seconds.
>
> I've set the JVM max heap size to 1024m.
>
> Any suggestions on how to improve the performance with a different
> configuration at the Hadoop level (by changing HDFS and MapReduce parameters)?
>
> Regards,
> Sourav
>
