hbase-user mailing list archives

From tim robertson <timrobertson...@gmail.com>
Subject Re: Map Reduce performance
Date Tue, 23 Jun 2009 15:30:28 GMT
Hi Ramesh

I'm not sure it is really meaningful to try to draw conclusions about
performance when running on only one node, as you don't gain any of the
benefits of parallelisation.  You might be better off trying a small
cluster of, say, 4 nodes in Amazon EC2, then repeating the same test
with, say, 8 nodes to see whether the larger cluster yields better
performance.  That is presumably the proof you are really looking for,
i.e. that you can grow in data volume and performance by adding
hardware.
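
For a comparison like the one suggested above, a small helper can turn
measured wall-clock times into a speedup and a scaling efficiency. This
is only a sketch: the function names are made up for illustration, and
the only real numbers used are the single-node timings quoted below
(18 min 04 sec without MR; 13 min 46 sec and 15 min 3 sec with MR). No
4-node or 8-node timings exist in this thread, so those are left as
inputs for you to supply.

```python
# Hypothetical helpers for comparing job runtimes; not part of Hadoop or HBase.

def to_seconds(minutes, seconds):
    """Convert a 'minutes, seconds' wall-clock reading to seconds."""
    return minutes * 60 + seconds


def speedup(baseline_s, candidate_s):
    """How many times faster the candidate run is than the baseline."""
    return baseline_s / candidate_s


def scaling_efficiency(time_small_s, nodes_small, time_large_s, nodes_large):
    """Observed speedup divided by the ideal (linear) speedup.

    1.0 means runtime fell in proportion to the added nodes;
    values well below 1.0 mean the extra nodes are not paying off.
    """
    observed = time_small_s / time_large_s
    ideal = nodes_large / nodes_small
    return observed / ideal


# Single-node timings reported in the quoted message.
baseline = to_seconds(18, 4)                         # without MR
mr_runs = [to_seconds(13, 46), to_seconds(15, 3)]    # with MR, 2 samples
mr_mean = sum(mr_runs) / len(mr_runs)

single_node_speedup = speedup(baseline, mr_mean)
print(f"MR speedup on one node: {single_node_speedup:.2f}x")
```

Once you have timings from, say, a 4-node and an 8-node run of the same
job, calling scaling_efficiency(t4, 4, t8, 8) shows how close the larger
cluster comes to halving the runtime.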

I think the MR job will work much better with more nodes, as you will
have more clients doing inserts into HBase in parallel, so insert
throughput should increase rapidly as you scale out.

Just my 2 cents...


On Tue, Jun 23, 2009 at 3:38 PM, peterramesh<ramesh.ramasamy@gmail.com> wrote:
> Hi,
> I am playing with a sample program using Map Reduce (MR).  All I have is a
> text file (685 MB), which I am using to create an HTable.
> The testing environment is:
> 1. single node cluster
> 2. 2 MB RAM
> 3. Hadoop and Hbase version, both are 0.19.1
> Here is the program attached,
> http://www.nabble.com/file/p24166190/MRTest.java MRTest.java
> and the hadoop-site.xml
> http://www.nabble.com/file/p24166190/hadoop-site.xml hadoop-site.xml
> and fair scheduler allocation file
> http://www.nabble.com/file/p24166190/mapred_fairseheduler_allocation_file.xml
> mapred_fairseheduler_allocation_file.xml
> (I used the FairScheduler because the mapred.map.tasks setting was not
> being applied in the cluster instance when I used the default
> JobQueueTaskScheduler, which always runs 2 tasks at a time.)
> On running the above program with the given configuration, it takes
> 13 min 46 sec and 15 min 3 sec (2 samples) to create the table.
> If I do the same without MR, it takes 18 min 04 sec. So MR gives me a
> substantial gain. But I would like to know whether there is any further
> optimization to improve the performance, and also whether I am doing this
> the right way.
> TIA,
> Ramesh
> --
> View this message in context: http://www.nabble.com/Map-Reduce-performance-tp24166190p24166190.html
> Sent from the HBase User mailing list archive at Nabble.com.