hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Scanner timeouts
Date Fri, 28 Oct 2016 15:50:03 GMT
Mich:
The OutOfOrderScannerNextException indicates a problem reading from HBase.

How did you know the connection to the Spark cluster was lost?
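
For readers who have not seen this exception before, here is a toy model of how a scan can go out of order after a client-side RPC timeout. As I understand it, each scanner next() call carries a sequence number that the region server checks against the one it expects. The Python below is only a simplified sketch of that idea, not HBase code: `ToyRegionScanner`, `client_next` and the `reply_lost` flag are hypothetical names made up for illustration, and the real client and server logic is more involved.

```python
class OutOfOrderScannerNextException(Exception):
    """Toy stand-in for the HBase exception of the same name."""


class ToyRegionScanner:
    """Hypothetical server-side scanner that tracks the next expected call."""

    def __init__(self, rows, batch):
        self.rows = rows
        self.batch = batch
        self.pos = 0
        self.expected_seq = 0

    def next(self, call_seq):
        # Reject any call whose sequence number is not the one expected.
        if call_seq != self.expected_seq:
            raise OutOfOrderScannerNextException(
                f"expected seq {self.expected_seq}, got {call_seq}")
        chunk = self.rows[self.pos:self.pos + self.batch]
        self.pos += len(chunk)
        self.expected_seq += 1
        return chunk


def client_next(scanner, call_seq, reply_lost):
    """One client call. The server always does the work; if the reply is
    'lost' (the client timed out first), the client never sees the rows."""
    chunk = scanner.next(call_seq)
    if reply_lost:
        raise TimeoutError("client gave up waiting for the reply")
    return chunk


scanner = ToyRegionScanner(rows=list(range(10)), batch=4)
first = client_next(scanner, 0, reply_lost=False)

# Second call: the server advances, but the client times out.
try:
    client_next(scanner, 1, reply_lost=True)
except TimeoutError:
    pass

# The client retries with the same sequence number; the server has moved on.
try:
    client_next(scanner, 1, reply_lost=False)
    outcome = "ok"
except OutOfOrderScannerNextException:
    outcome = "out of order"

print(first, outcome)
```

If this model is roughly right, anything that lets each next() call finish well inside the client timeout should help, for example a smaller `hbase.client.scanner.caching` value so that each call returns fewer rows. Treat that as a suggestion to try rather than a confirmed fix for this job.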

Cheers

On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> It looks like the connection to the Spark cluster was lost.
>
> Which mode are you using with Spark: Standalone, YARN, or something else?
> This looks like a resource manager issue.
>
> I have seen this when running Zeppelin with Spark on HBase.
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 28 October 2016 at 16:38, Pat Ferrel <pat@occamsmachete.com> wrote:
>
> > I’m getting data from HBase using a large Spark cluster with parallelism
> > of nearly 400. The query fails quite often with the message below.
> > Sometimes a retry works, and sometimes it ends in the final failure
> > shown below.
> >
> > If I reduce parallelism in Spark, it slows other parts of the algorithm
> > unacceptably. I have also experimented with very large RPC/scanner
> > timeouts of many minutes, to no avail.
> >
> > Any clues about what to look for, or what may be set up wrong in my tables?
> >
> > Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> > most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> > ip-172-16-3-9.eu-central-1.compute.internal):
> > org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
> > OutOfOrderScannerNextException: was there a rpc timeout?
> >   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
> >   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
> >   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
> >   at
> >
>
