hbase-user mailing list archives

From Dhaval Shah <prince_mithi...@yahoo.co.in>
Subject Re: hbase cluster working bad
Date Tue, 22 Jul 2014 20:38:31 GMT
We just solved a very similar issue with our cluster (yesterday!). I would suggest you look
at 2 things in particular:
- Is the network on your region server saturated? That would prevent connections from being
serviced in time.
- See if the region server has any RPC handlers available when you get this error. It's possible
that all RPC handlers are busy servicing other requests (or stuck due to a combination of
load and bad configs).
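If handler starvation turns out to be the problem, one common mitigation is to raise the RPC handler count on the region servers. A minimal sketch (the value 100 is purely illustrative; tune it to your workload and heap, and restart the region servers after changing it):

```xml
<!-- hbase-site.xml on each region server (restart required).
     hbase.regionserver.handler.count controls how many RPC handler
     threads serve client requests; the default in 0.94 is low (10).
     100 here is an illustrative value, not a recommendation. -->
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>100</value>
</property>
```

Note that more handlers only help if the servers are otherwise healthy; if they are stuck on slow HDFS I/O (as the sync-latency alerts below suggest), extra handlers will just pile up too.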


 From: Павел Мезенцев <pavel@mezentsev.org>
To: user@hbase.apache.org 
Sent: Tuesday, 22 July 2014 7:46 AM
Subject: Re: hbase cluster working bad

Jobs running on this cluster print exceptions like:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to ds-hadoop-wk01p.tcsbank.ru/ failed on socket
timeout exception: java.net.SocketTimeoutException: 60000 millis timeout
while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/ remote=

    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421)
    at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
    at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708)
    at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367)
    at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306)
    at ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153)
    at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135)
    at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80)
    at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
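The 60000 ms in that trace matches the default client-side hbase.rpc.timeout. As a stopgap while the underlying load or handler problem is being fixed, the client timeout can be raised in the job's client configuration. A sketch, with an illustrative value (this masks the symptom rather than fixing slow region servers):

```xml
<!-- client-side hbase-site.xml, or set the same key on the job's
     Configuration object. 120000 ms is an illustrative value; the
     default is 60000 ms, which is what the SocketTimeoutException
     above is hitting. -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```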

Best regards,
Pavel Mezentsev

2014-07-22 14:59 GMT+04:00 Павел Мезенцев <pavel@mezentsev.org>:

> Hello all!
> We have a problem with HBase.
> Our Hadoop cluster has 4 nodes (plus 1 client node).
> CDH 4.6 + CM 4.7 are installed.
> Hadoop versions are:
>  - hadoop-hdfs : 2.0.0+1475
>  - hadoop-0.20-mapreduce : 2.0.0+1475
>  - hbase : 0.94.6+132
> Hadoop and HBase configs are attached.
> We have several tables in HBase with a total volume of 2 TB.
> We run mapReduce ETL jobs and analytics queries over them.
> There are a lot of warnings like
> - The health test result for REGION_SERVER_READ_LATENCY has become bad:
> The moving average of HDFS read latency is 162 millisecond(s) over the
> previous 5 minute(s). Critical threshold: 100.
> - The health test result for REGION_SERVER_SYNC_LATENCY has become bad:
> The moving average of HDFS sync latency is 8.2 second(s) over the previous
> 5 minute(s). Critical threshold: 5,000.
> - HBase region health: 442 unhealthy regions
> - HDFS_DATA_NODES_HEALTHY has become bad
> - HBase Region Health Canary is running slowly on the cluster
> MapReduce jobs that make random queries against HBase run very slowly
> (a job is only 20% complete after 18 hours, versus 100% after 12 hours on
> a comparable cluster).
> Please help us find the causes of these alerts and speed up the cluster.
> Could you give us some good advice on what we should do?
> Cheers,
> Mezentsev Pavel