hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Scanner timeouts
Date Fri, 28 Oct 2016 21:58:25 GMT
That's another way of using hbase.

Watch out for PHOENIX-3333
<https://issues.apache.org/jira/browse/PHOENIX-3333> if you're running
queries with Spark 2.0

On Fri, Oct 28, 2016 at 2:38 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> HBase does not have native secondary indexes, but Phoenix allows you to
> create secondary indexes on HBase tables. The index structure is created in
> HBase itself and is maintained through Phoenix.
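For illustration, a Phoenix secondary index is plain DDL; Phoenix then maintains the underlying HBase index table on every write. The table and column names below are hypothetical, not taken from this thread:

```sql
-- Hypothetical Phoenix table; names are illustrative only.
CREATE TABLE MARKETDATA (
    KEY    VARCHAR PRIMARY KEY,
    TICKER VARCHAR,
    PRICE  DECIMAL
);

-- Phoenix creates an index table in HBase and keeps it in sync on writes;
-- INCLUDE makes the query fully covered so the data table is not touched.
CREATE INDEX IDX_TICKER ON MARKETDATA (TICKER) INCLUDE (PRICE);
```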
>
> HTH
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 28 October 2016 at 19:29, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > bq. with 400 threads hitting HBase at the same time
> >
> > How many regions are serving the 400 threads ?
> > How many region servers do you have ?
> >
> > If the regions are spread relatively evenly across the cluster, the above
> > may not be a big issue.
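One way to answer those two questions is from the HBase shell; this is a sketch (the command exists in the 1.x shell, though its output format varies by version):

```shell
# In the HBase shell (bin/hbase shell): lists each region server with its
# load, including the regions it hosts and their request counts, so you can
# see whether the scanned table's regions are spread evenly.
status 'detailed'
```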
> >
> > On Fri, Oct 28, 2016 at 11:21 AM, Pat Ferrel <pat@occamsmachete.com>
> > wrote:
> >
> > > Ok, will do.
> > >
> > > So the scanner timeout does not by itself indicate that I’ve missed
> > > something in handling the data. If not an index, should I have made a
> > > fast lookup “key”? I ask because the timeout change may work but not be
> > > the optimal solution. The stage that fails is very long compared to other
> > > stages. And with 400 threads hitting HBase at the same time, this seems
> > > like something I may need to restructure, and any advice about that
> > > would be welcome.
> > >
> > > HBase is 1.2.3
> > >
> > >
> > > On Oct 28, 2016, at 10:36 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > For your first question: you need to pass hbase-site.xml, which has the
> > > config parameters affecting client operations, to the Spark executors.
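A sketch of that (master URL, class, jar, and paths are hypothetical): shipping the file with --files puts a copy of hbase-site.xml in each executor's working directory so driver and executors see the same client settings:

```shell
# Hypothetical paths and names; --files distributes hbase-site.xml to the
# working directory of every executor.
spark-submit \
  --master spark://master:7077 \
  --files /etc/hbase/conf/hbase-site.xml \
  --class com.example.MyHBaseJob \
  my-hbase-job.jar
```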
> > >
> > > bq. missed indexing some column
> > >
> > > hbase doesn't have indexing (in the sense of a traditional RDBMS).
> > >
> > > Let's see what happens after hbase-site.xml is passed to executors.
> > >
> > > BTW Can you tell us the release of hbase you're using ?
> > >
> > >
> > >
> > > On Fri, Oct 28, 2016 at 10:22 AM, Pat Ferrel <pat@occamsmachete.com>
> > > wrote:
> > >
> > > > So to clarify: there are some values in hbase/conf/hbase-site.xml that
> > > > are needed by the calling code in the Spark driver and executors, and
> > > > so must be passed using --files to spark-submit? If so I can do this.
> > > >
> > > > But do I have a deeper issue? Is it typical to need a scan like this?
> > > > Have I missed indexing some column maybe?
> > > >
> > > >
> > > > On Oct 28, 2016, at 9:59 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >
> > > > Mich:
> > > > bq. on table 'hbase:meta' *at region=hbase:meta,,1.1588230740
> > > >
> > > > What you observed was a different issue.
> > > > The above looks like trouble with locating region(s) during scan.
> > > >
> > > > On Fri, Oct 28, 2016 at 9:54 AM, Mich Talebzadeh <
> > > > mich.talebzadeh@gmail.com>
> > > > wrote:
> > > >
> > > >> This is an example I got:
> > > >>
> > > >> warning: there were two deprecation warnings; re-run with -deprecation for details
> > > >> rdd1: org.apache.spark.rdd.RDD[(String, String)] = MapPartitionsRDD[77] at map at <console>:151
> > > >> defined class columns
> > > >> dfTICKER: org.apache.spark.sql.Dataset[columns] = [KEY: string, TICKER: string]
> > > >> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=36, exceptions:
> > > >> *Fri Oct 28 13:13:46 BST 2016, null, java.net.SocketTimeoutException: callTimeout=60000, callDuration=68411: row 'MARKETDATAHBASE,,00000000000000' on table 'hbase:meta' *at region=hbase:meta,,1.1588230740, hostname=rhes564,16201,1477246132044, seqNum=0
> > > >>   at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:276)
> > > >>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:210)
> > > >>   at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
> > > >>   at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:210)
> > > >>
> > > >>
> > > >>
> > > >> On 28 October 2016 at 17:52, Pat Ferrel <pat@occamsmachete.com> wrote:
> > > >>
> > > >>> I will check that, but if that is a server startup thing I was not
> > > >>> aware I had to send it to the executors. So it’s like a connection
> > > >>> timeout from executor code?
> > > >>>
> > > >>>
> > > >>> On Oct 28, 2016, at 9:48 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>>
> > > >>> How did you change the timeout(s) ?
> > > >>>
> > > >>> bq. timeout is currently set to 60000
> > > >>>
> > > >>> Did you pass hbase-site.xml using --files to Spark job ?
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>> On Fri, Oct 28, 2016 at 9:27 AM, Pat Ferrel <pat@occamsmachete.com>
> > > >>> wrote:
> > > >>>
> > > >>>> Using standalone Spark. I don’t recall seeing connection lost
> > > >>>> errors, but there are lots of logs. I’ve set the scanner and RPC
> > > >>>> timeouts to large numbers on the servers.
> > > >>>>
> > > >>>> But I also saw in the logs:
> > > >>>>
> > > >>>>   org.apache.hadoop.hbase.client.ScannerTimeoutException: 381788ms
> > > >>>>   passed since the last invocation, timeout is currently set to 60000
> > > >>>>
> > > >>>> Not sure where that is coming from. Does the driver machine making
> > > >>>> queries need to have the timeout config also?
> > > >>>>
> > > >>>> And why so large, am I doing something wrong?
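For reference, the two timeouts in play here are client-side settings, so the hbase-site.xml visible to the Spark driver and executors must carry them too, not only the server copies. A sketch with illustrative values:

```xml
<!-- Illustrative values; these are read by the HBase *client* JVMs
     (the Spark driver and executors), not only by the region servers. -->
<property>
  <name>hbase.client.scanner.timeout.period</name>
  <value>600000</value> <!-- ms allowed between scanner next() calls -->
</property>
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- ms allowed for a single RPC -->
</property>
```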
> > > >>>>
> > > >>>>
> > > >>>> On Oct 28, 2016, at 8:50 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>>>
> > > >>>> Mich:
> > > >>>> The OutOfOrderScannerNextException indicated a problem with reading
> > > >>>> from hbase.
> > > >>>>
> > > >>>> How did you know connection to Spark cluster was lost ?
> > > >>>>
> > > >>>> Cheers
> > > >>>>
> > > >>>> On Fri, Oct 28, 2016 at 8:47 AM, Mich Talebzadeh <
> > > >>>> mich.talebzadeh@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> Looks like it lost the connection to the Spark cluster.
> > > >>>>>
> > > >>>>> What mode are you using with Spark: Standalone, Yarn or another? The
> > > >>>>> issue looks like a resource manager issue.
> > > >>>>>
> > > >>>>> I have seen this when running Zeppelin with Spark on Hbase.
> > > >>>>>
> > > >>>>> HTH
> > > >>>>>
> > > >>>>>
> > > >>>>> On 28 October 2016 at 16:38, Pat Ferrel <pat@occamsmachete.com> wrote:
> > > >>>>>
> > > >>>>>> I’m getting data from HBase using a large Spark cluster with
> > > >>>>>> parallelism of near 400. The query fails quite often with the
> > > >>>>>> message below. Sometimes a retry will work and sometimes the
> > > >>>>>> ultimate failure results (below).
> > > >>>>>>
> > > >>>>>> If I reduce parallelism in Spark it slows other parts of the
> > > >>>>>> algorithm unacceptably. I have also experimented with very large
> > > >>>>>> RPC/Scanner timeouts of many minutes, to no avail.
> > > >>>>>>
> > > >>>>>> Any clues about what to look for or what may be set up wrong in my
> > > >>>>>> tables?
> > > >>>>>>
> > > >>>>>> Job aborted due to stage failure: Task 44 in stage 147.0 failed 4
> > > >>>>>> times, most recent failure: Lost task 44.3 in stage 147.0 (TID
> > > >>>>>> 24833, ip-172-16-3-9.eu-central-1.compute.internal):
> > > >>>>>> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry
> > > >>>>>> of OutOfOrderScannerNextException: was there a rpc timeout?
> > > >>>>>>   at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:403)
> > > >>>>>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:232)
> > > >>>>>>   at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138) at
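A common cause of both the OutOfOrderScannerNextException above and the "passed since the last invocation" ScannerTimeoutException earlier in the thread is scanner caching set so high that the client spends minutes processing one batch of rows before calling next() again, by which time the server has expired the scanner lease. Rather than raising timeouts ever higher, lowering the rows fetched per scanner RPC often resolves it; a sketch with an illustrative value:

```xml
<!-- Illustrative: fewer rows returned per scanner RPC means less client-side
     work between next() calls, so the lease is renewed well before it
     expires. Can also be set per-scan in code via Scan.setCaching(int). -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
</property>
```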
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>>
> > > >>
> > > >
> > > >
> > >
> > >
> >
>
