hadoop-common-user mailing list archives

From Alexander Goryunov <a.goryu...@gmail.com>
Subject Re: Distributed table processing is slower that local table processing
Date Fri, 30 Mar 2012 08:35:59 GMT
 Hi Anil,

Yes, the second table is distributed and the first is not, and I get 3x better
results for the non-distributed table.

I use distributed Hadoop mode in all cases.

Thanks.



On Fri, Mar 30, 2012 at 3:26 AM, anil gupta <anilgupt@buffalo.edu> wrote:

> Hi Alexander,
>
> Is the data properly distributed over the cluster in distributed mode? If
> it is not, you won't get good results in distributed mode.
>
> Thanks,
> Anil Gupta
>
> On Thu, Mar 29, 2012 at 8:37 AM, Alexander Goryunov <a.goryunov@gmail.com>
> wrote:
>
> > Hello,
> >
> > I'm running a cluster of 3 data nodes (8-core Xeon, 16 GB) + 1 node for
> > the jobtracker and namenode, with Hadoop and HBase, and I'm seeing strange
> > performance results.
> >
> > The same map job runs at about 300,000 records per second for the 1-node
> > table and 100,000 records per second for the table distributed across 3
> > nodes.
> >
> > Scan caching is 1000, each row is about 0.2 KB, compression is off, and
> > setCacheBlocks is false.
> >
> > 7 map tasks run in parallel on each node (281 tasks in total for the big
> > table and 16 for the small table).
> >
> > The map job reads sequential data and writes out a small part of it. No
> > reduce tasks are set for this job.
> >
> >
> > Both tables have the same data, with about 10M records (first one) and
> > 150M records (second one).
> >
> > Do you have any idea what could be the reason for such behavior?
> >
> > Thanks.
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
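
The figures quoted in the original message can be put side by side with a
little arithmetic. This is only a rough sketch: it assumes that the quoted
rates are aggregate job throughput, and that all 21 map slots (3 nodes x 7)
are available to both jobs. Neither point is stated in the thread, and the
names below are illustrative only.

```python
import math

# Figures quoted in the thread
NODES = 3
SLOTS_PER_NODE = 7
small = {"records": 10_000_000, "tasks": 16, "rate": 300_000}    # 1-node table
big = {"records": 150_000_000, "tasks": 281, "rate": 100_000}    # 3-node table


def summarize(table, total_slots):
    """Derive per-task load, scheduling waves, and scan time for one table."""
    return {
        # Average number of records each map task has to scan
        "records_per_task": table["records"] / table["tasks"],
        # Number of rounds needed if only total_slots tasks run at once
        "waves": math.ceil(table["tasks"] / total_slots),
        # Wall-clock scan time implied by the quoted rate (assumed aggregate)
        "scan_seconds": table["records"] / table["rate"],
    }


total_slots = NODES * SLOTS_PER_NODE  # assumption: every slot is usable
small_stats = summarize(small, total_slots)
big_stats = summarize(big, total_slots)

for name, stats in (("small", small_stats), ("big", big_stats)):
    print(name, stats)
```

Under these assumptions the per-task load is similar for both tables, but the
big table needs many more scheduling waves, so task startup overhead and
region placement are reasonable things to check next.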
