hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Scanner timeouts
Date Fri, 28 Oct 2016 16:59:24 GMT
Mich:
bq. on table 'hbase:meta' *at region=hbase:meta,,1.1588230740

What you observed was a different issue.
The above looks like trouble locating region(s) during the scan.
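
As a quick sanity check for that kind of region-location trouble, here is a
minimal sketch (for illustration only; the table name is taken from the stack
trace below) that verifies the client configuration can reach the cluster and
locate regions:

  import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
  import org.apache.hadoop.hbase.client.ConnectionFactory
  import scala.collection.JavaConverters._

  // Reads hbase-site.xml from the classpath; if that file is missing on the
  // driver or executors, lookups against hbase:meta time out as shown below.
  val conf = HBaseConfiguration.create()
  val conn = ConnectionFactory.createConnection(conf)
  val locator = conn.getRegionLocator(TableName.valueOf("MARKETDATAHBASE"))
  locator.getAllRegionLocations.asScala.foreach(println)
  conn.close()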

On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> This is an example I got
>
> warning: there were two deprecation warnings; re-run with -deprecation for
> details
> rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at
> map at <console>:151
> defined class columns
> dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER:
> string]
> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after
> attempts=36, exceptions:
> *Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException:
> callTimeout=60000, callDuration=68411: row
> 'MARKETDATAHBASE,,00000000000000' on table 'hbase:meta' *at
> region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044,
> seqNum=0
>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 28 October 2016 at 17:52, Pat Ferrel <pat@occamsmachete.com> wrote:
>
> > I will check that, but if that is a server startup thing I was not aware I
> > had to send it to the executors. So it’s like a connection timeout from
> > executor code?
> >
> >
> > On Oct 28, 2016, at 9:48 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > How did you change the timeout(s)?
> >
> > bq. timeout is currently set to 60000
> >
> > Did you pass hbase-site.xml using --files to the Spark job?
> >
> > Cheers
> >
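> > As an aside for illustration (assumptions, not part of the original exchange):
> > shipping hbase-site.xml with --files makes its timeouts visible to the
> > executors; a rough programmatic alternative is to set the standard client
> > keys on the Configuration the job builds itself. The values below are arbitrary.
> >
> >   import org.apache.hadoop.hbase.HBaseConfiguration
> >
> >   val hbaseConf = HBaseConfiguration.create()
> >   // client-side timeouts; they must reach the executors, not just the driver
> >   hbaseConf.set("hbase.rpc.timeout", "600000")                   // 10 minutes
> >   hbaseConf.set("hbase.client.scanner.timeout.period", "600000") // 10 minutes
> >   // hand hbaseConf to whatever reads HBase (TableInputFormat, a connector, ...)
> >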
> > On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <pat@occamsmachete.com> wrote:
> >
> > > Using standalone Spark. I don’t recall seeing connection lost errors, but
> > > there are lots of logs. I’ve set the scanner and RPC timeouts to large
> > > numbers on the servers.
> > >
> > > But I also saw in the logs:
> > >
> > >    org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> > > passed since the last invocation, timeout is currently set to 60000
> > >
> > > Not sure where that is coming from. Does the driver machine making queries
> > > need to have the timeout config also?
> > >
> > > And why so large, am I doing something wrong?
> > >
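> > > (A rough sketch for illustration; values are placeholders.) The
> > > "381788ms passed since the last invocation" message means more than the
> > > scanner timeout passed between calls to the region server, usually because
> > > processing a large cached batch takes too long. Besides raising the timeout
> > > on the client config, fetching fewer rows per RPC keeps the calls frequent:
> > >
> > >   import org.apache.hadoop.hbase.client.Scan
> > >
> > >   val scan = new Scan()
> > >   scan.setCaching(100)        // fewer rows per RPC -> more frequent lease renewal
> > >   scan.setCacheBlocks(false)  // full scans usually shouldn't churn the block cache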
> > >
> > > On Oct 28, 2016, at 8:50 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > Mich:
> > > The OutOfOrderScannerNextException indicated a problem with the read from HBase.
> > >
> > > How did you know the connection to the Spark cluster was lost?
> > >
> > > Cheers
> > >
> > > On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <
> > > mich.talebzadeh@gmail.com>
> > > wrote:
> > >
> > >> Looks like it lost the connection to the Spark cluster.
> > >>
> > >> What mode are you using with Spark: Standalone, YARN, or something else?
> > >> This looks like a resource manager issue.
> > >>
> > >> I have seen this when running Zeppelin with Spark on HBase.
> > >>
> > >> HTH
> > >>
> > >> Dr Mich Talebzadeh
> > >>
> > >>
> > >>
> > >> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw*
> > >>
> > >>
> > >>
> > >> http://talebzadehmich.wordpress.com
> > >>
> > >>
> > >> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> > >> loss, damage or destruction of data or any other property which may arise
> > >> from relying on this email's technical content is explicitly disclaimed.
> > >> The author will in no case be liable for any monetary damages arising from
> > >> such loss, damage or destruction.
> > >>
> > >>
> > >>
> > >> On 28 October 2016 at 16:38, Pat Ferrel <pat@occamsmachete.com> wrote:
> > >>
> > >>> I’m getting data from HBase using a large Spark cluster with parallelism
> > >>> of near 400. The query fails quite often with the message below. Sometimes
> > >>> a retry will work and sometimes the ultimate failure results (below).
> > >>>
> > >>> If I reduce parallelism in Spark it slows other parts of the algorithm
> > >>> unacceptably. I have also experimented with very large RPC/Scanner timeouts
> > >>> of many minutes—to no avail.
> > >>>
> > >>> Any clues about what to look for or what may be set up wrong in my tables?
> > >>>
> > >>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4 times,
> > >>> most recent failure: Lost task 44.3 in stage 147.0 (TID 24833,
> > >>> ip-172-16-3-9.eu-central-1.compute.internal): org.apache.hadoop.hbase.DoNotRetryIOException:
> > >>> Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
> > >>>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
> > >>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
> > >>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
> > >>>   at
> > >>>
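> > >>> For illustration (table name and values are placeholders, not from this
> > >>> job): the usual wiring for a Spark read puts the client settings into the
> > >>> Configuration handed to newAPIHadoopRDD, since that is the config every
> > >>> executor's scanner ends up using:
> > >>>
> > >>>   import org.apache.hadoop.hbase.HBaseConfiguration
> > >>>   import org.apache.hadoop.hbase.client.Result
> > >>>   import org.apache.hadoop.hbase.io.ImmutableBytesWritable
> > >>>   import org.apache.hadoop.hbase.mapreduce.TableInputFormat
> > >>>
> > >>>   val hc = HBaseConfiguration.create()
> > >>>   hc.set(TableInputFormat.INPUT_TABLE, "my_table")         // placeholder
> > >>>   hc.set("hbase.client.scanner.timeout.period", "600000")
> > >>>   hc.set("hbase.rpc.timeout", "600000")
> > >>>   hc.set(TableInputFormat.SCAN_CACHEDROWS, "100")          // smaller batches per RPC
> > >>>
> > >>>   // sc is the SparkContext available in spark-shell
> > >>>   val rdd = sc.newAPIHadoopRDD(hc, classOf[TableInputFormat],
> > >>>     classOf[ImmutableBytesWritable], classOf[Result])
> > >>>   println(rdd.count())
> > >>>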
> > >>
> > >
> > >
> >
> >
>
