hbase-user mailing list archives

From Cosmin Lehene <cleh...@adobe.com>
Subject Re: HBase not scaling well
Date Fri, 29 Oct 2010 08:05:28 GMT
Hi Hari, 

Could you run some real-time monitoring (htop, iptraf, iostat) and report the results? You
could also add timers to the map-reduce operations and measure the average time per operation
to figure out what is taking so long.
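
To make the timer suggestion concrete, here is a minimal sketch of a per-operation timer. OpTimer is a hypothetical helper, not part of HBase or Hadoop, and the Thread.sleep call in main only stands in for whatever the map task really does per record (for example, writing one row). It assumes each timer is used from a single thread.

```java
import java.util.concurrent.TimeUnit;

/** Accumulates wall-clock time for one kind of operation.
 *  Hypothetical helper for illustration; not part of HBase or Hadoop. */
final class OpTimer {
    private final String name;
    private long totalNanos;
    private long count;

    OpTimer(String name) {
        this.name = name;
    }

    /** Records one operation that began at startNanos (a System.nanoTime() value). */
    void stop(long startNanos) {
        record(System.nanoTime() - startNanos);
    }

    /** Records one operation with an already-measured duration. */
    void record(long elapsedNanos) {
        totalNanos += elapsedNanos;
        count++;
    }

    long count() {
        return count;
    }

    /** Average duration in milliseconds, or 0.0 if nothing was recorded. */
    double averageMillis() {
        return count == 0 ? 0.0 : (totalNanos / (double) count) / 1_000_000.0;
    }

    String summary() {
        return String.format("%s: n=%d avg=%.3f ms total=%d ms",
                name, count, averageMillis(), TimeUnit.NANOSECONDS.toMillis(totalNanos));
    }

    public static void main(String[] args) throws InterruptedException {
        OpTimer put = new OpTimer("put");
        for (int i = 0; i < 5; i++) {
            long start = System.nanoTime();
            Thread.sleep(2); // stand-in for the real work, e.g. writing one row
            put.stop(start);
        }
        System.out.println(put.summary());
    }
}
```

In a map task you would keep one timer per step (parsing the line, writing to the table, and so on), wrap each step the same way main does, and log summary() when the task finishes. Comparing those averages between the single-node run and the cluster run should show which step got slower.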

On Oct 29, 2010, at 9:55 AM, Hari Shankar wrote:

> Hi,
>     We are currently doing a POC for HBase in our system. We have
> written a bulk upload job to upload our data from a text file into
> HBase. We are using a 3-node cluster: one master which also works as
> a slave (running namenode, jobtracker, HMaster, datanode,
> tasktracker, HQuorumPeer and HRegionServer) and 2 slaves (running
> datanode, tasktracker, HQuorumPeer and HRegionServer). The problem is
> that we are getting lower performance from the distributed cluster
> than we were getting from a single-node pseudo-distributed setup. The
> upload takes about 30 minutes on an individual machine, whereas
> it takes 2 hrs on the cluster. We have replication set to 3, so
> all parts should ideally be available on all nodes, and we doubt that
> the problem is network latency. scp of files between nodes gives a
> speed of about 12 MB/s, which I believe should be good enough for this
> to function. Please correct me if I am wrong here. The nodes are all
> 4-core machines with 8 GB RAM. We are spawning 4 simultaneous map tasks
> on each node, and the job does not have any reduce phase. Any help is
> greatly appreciated.
> Thanks,
> Hari Shankar
