hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: hbase cluster working bad
Date Tue, 22 Jul 2014 12:42:46 GMT
-user@hbase (bcc), +cdh-user

Hello Pavel,

I'm moving your question to the cdh-user@cloudera.org mailing list since
it's more related to a specific Hadoop distribution. From the symptoms,
though, it looks like there is some contention (probably in HDFS or
something else) that is causing the Region Servers to become unresponsive,
and this ripples out to the map tasks when they try to fetch data from
HBase.
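
One stopgap while you track that down: the "60000 millis timeout" in your
trace is the default client-side RPC timeout, and you can loosen the
timeout/retry settings per job. A minimal sketch (table name and values are
placeholders; this buys the tasks time but does not fix the contention):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class RpcTimeoutSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // The trace shows the default 60000 ms RPC timeout firing.
            // Raise it and back off between retries so tasks survive
            // longer while the real bottleneck is investigated.
            conf.setInt("hbase.rpc.timeout", 120000);       // default: 60000 ms
            conf.setInt("hbase.client.retries.number", 20); // default: 10
            conf.setLong("hbase.client.pause", 2000);       // default: 1000 ms

            HTable table = new HTable(conf, "person");      // placeholder table
            try {
                Result r = table.get(new Get(Bytes.toBytes("row-key")));
                System.out.println("cells returned: " + r.size());
            } finally {
                table.close();
            }
        }
    }

The same keys can also go into the client-side hbase-site.xml or the MR job
configuration, but the real fix is still finding the source of the HDFS
latency.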

Regards,
Esteban.


--
Cloudera, Inc.



On Tue, Jul 22, 2014 at 4:46 AM, Pavel Mezentsev <pavel@mezentsev.org> wrote:

> Jobs running on this cluster print exceptions like this one:
>
> java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
> Call to ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020 failed on socket
> timeout exception: java.net.SocketTimeoutException: 60000 millis timeout
> while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.218.64.14:38621
> remote=ds-hadoop-wk01p.tcsbank.ru/10.218.64.11:60020]
>
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1569)
>         at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1421)
>         at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:739)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:708)
>         at org.apache.hadoop.hbase.client.HTablePool$PooledHTable.get(HTablePool.java:367)
>         at ru.tcsbank.hbase.HBasePersonDao.getUsersBatch(HBasePersonDao.java:306)
>         at ru.tcsbank.matching.PersonMatcher.performSolrRequest(PersonMatcher.java:153)
>         at ru.tcsbank.matching.PersonMatcher.search(PersonMatcher.java:135)
>         at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:80)
>         at ru.tcsbank.personmatcher.mr.PersonMatcherJob$MapClass.map(PersonMatcherJob.java:65)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
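> The timeout fires inside our batched get through an HTablePool. Simplified
> (placeholder names, not the exact HBasePersonDao code), the failing call
> looks roughly like this:
>
>     import java.util.ArrayList;
>     import java.util.Arrays;
>     import java.util.List;
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.client.Get;
>     import org.apache.hadoop.hbase.client.HTableInterface;
>     import org.apache.hadoop.hbase.client.HTablePool;
>     import org.apache.hadoop.hbase.client.Result;
>     import org.apache.hadoop.hbase.util.Bytes;
>
>     public class GetUsersBatchSketch {
>         public static void main(String[] args) throws Exception {
>             Configuration conf = HBaseConfiguration.create();
>             HTablePool pool = new HTablePool(conf, 10);
>             HTableInterface table = pool.getTable("person"); // placeholder
>             try {
>                 List<Get> gets = new ArrayList<Get>();
>                 for (String key : Arrays.asList("u1", "u2")) { // batch of row keys
>                     gets.add(new Get(Bytes.toBytes(key)));
>                 }
>                 // HTable.get(List<Get>) delegates to HTable.batch(), which
>                 // is where the SocketTimeoutException above is thrown.
>                 Result[] results = table.get(gets);
>                 System.out.println("fetched " + results.length + " rows");
>             } finally {
>                 table.close(); // returns the handle to the pool
>                 pool.close();
>             }
>         }
>     }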
>
>
>
>
> Best regards,
> Mezentsev Pavel
>
>
> 2014-07-22 14:59 GMT+04:00 Pavel Mezentsev <pavel@mezentsev.org>:
>
> > Hello all!
> >
> > We are having trouble with HBase.
> > Our Hadoop cluster has 4 nodes (plus 1 client node).
> > CDH 4.6 + CM 4.7 are installed.
> > Hadoop versions are:
> >  - hadoop-hdfs : 2.0.0+1475
> >  - hadoop-0.20-mapreduce : 2.0.0+1475
> >  - hbase : 0.94.6+132
> > Hadoop and HBase configs are attached.
> >
> > We have several tables in HBase with a total volume of 2 TB.
> > We run MapReduce ETL jobs and analytics queries over them.
> >
> > There are a lot of warnings like:
> > - *The health test result for REGION_SERVER_READ_LATENCY has become bad:
> > The moving average of HDFS read latency is 162 millisecond(s) over the
> > previous 5 minute(s). Critical threshold: 100.*
> > - *The health test result for REGION_SERVER_SYNC_LATENCY has become bad:
> > The moving average of HDFS sync latency is 8.2 second(s) over the previous
> > 5 minute(s). Critical threshold: 5,000.*
> > - *HBase region health: 442 unhealthy regions*
> > - *HDFS_DATA_NODES_HEALTHY has become bad*
> > - *HBase Region Health Canary is running slowly on the cluster*
> >
> > MapReduce jobs over HBase that make random queries to HBase are running
> > very slowly (a job is only 20% complete after 18 hours, versus 100% after
> > 12 hours on a comparable cluster).
> >
> > Please help us find the causes of these alerts and speed up the cluster.
> > Could you give us some advice on what we should do?
> >
> > Cheers,
> > Mezentsev Pavel
> >
> >
>
