hbase-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Scanner timeouts
Date Fri, 28 Oct 2016 15:47:53 GMT
It looks like the job lost its connection to the Spark cluster.

Which mode are you running Spark in: Standalone, YARN or something else? This
looks like a resource manager issue.

I have seen this when running Zeppelin with Spark on HBase.
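
For reference, the OutOfOrderScannerNextException in the quoted trace below is
usually associated with a scanner next() call that outlived its client-side
timeout. Pat reports that raising the timeouts alone did not help, so the
sketch below also lowers the number of rows fetched per next() call. It is only
a sketch under assumptions: it assumes the job reads through TableInputFormat,
the helper name, table name and quorum host are hypothetical, and the values
are illustrative rather than recommendations.

```python
def hbase_scan_conf(table, zk_quorum,
                    scanner_timeout_ms=600000,
                    rpc_timeout_ms=600000,
                    cached_rows=100):
    """Build a string-valued Hadoop configuration dict for an HBase scan.

    hbase_scan_conf is a hypothetical helper; the property keys are the
    standard HBase client / TableInputFormat settings.
    """
    if cached_rows < 1:
        raise ValueError("cached_rows must be at least 1")
    return {
        "hbase.zookeeper.quorum": zk_quorum,
        "hbase.mapreduce.inputtable": table,
        # How long the client waits on one scanner next() call.
        "hbase.client.scanner.timeout.period": str(scanner_timeout_ms),
        # General RPC timeout; usually kept at least as large as the above.
        "hbase.rpc.timeout": str(rpc_timeout_ms),
        # Fewer rows per next() call means each RPC finishes sooner.
        "hbase.mapreduce.scan.cachedrows": str(cached_rows),
    }


# Hypothetical table and host names, for illustration only.
conf = hbase_scan_conf("my_table", "zk1.example.com", cached_rows=50)

# With PySpark, a dict like this can be passed as the conf argument of
# SparkContext.newAPIHadoopRDD together with the TableInputFormat class name.
```

The scanner timeout is also enforced on the region servers, so a client-side
value larger than the server-side one may not take effect.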

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 28 October 2016 at 16:38, Pat Ferrel <pat@occamsmachete.com> wrote:

> I’m getting data from HBase using a large Spark cluster with parallelism
> of near 400. The query fails quite often with the message below. Sometimes
> a retry will work and sometimes the ultimate failure results (below).
>
> If I reduce parallelism in Spark it slows other parts of the algorithm
> unacceptably. I have also experimented with very large RPC/Scanner timeouts
> of many minutes—to no avail.
>
> Any clues about what to look for or what may be set up wrong in my tables?
>
> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> ip-172-16-3-9.eu-central-1.compute.internal):
> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
> OutOfOrderScannerNextException: was there a rpc timeout?
>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
>   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
>   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
>   at
>
