hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: Hbase regionserver.MultiVersionConcurrencyControl Warning
Date Wed, 10 Aug 2016 17:08:43 GMT
On Wed, Aug 10, 2016 at 1:13 AM, Sterfield <sterfield@gmail.com> wrote:

> Hi,
>
> Thanks for your answer.
>
> I'm currently testing OpenTSDB + HBase, so I'm generating thousands of HTTP
> POST on OpenTSDB in order to write data points (currently up to 300k/s).
> OpenTSDB is only doing increment / append (AFAIK)
>
> How many nodes or is that 300k/s on a single machine?



> If I have understood your answer correctly, some write ops are queued, and
> some younger ops in the queue are "done" while some older are not.
>
>
What Anoop said plus, we'll see the STUCK notice when it is taking a long
time for the MVCC read point to come up to the write point of the currently
ongoing transaction. We will hold the updating thread until the readpoint
is equal or greater than the current transactions write point. We do this
to ensure a client can read its own writes. The MVCC is region wide. If
many ongoing updates, a slightly slower one may drag down other outstanding
transactions completing. The STUCK message goes away after some time? It
happens frequently? A thread dump while this is going on would be
interesting if possible or what else is going on on the server around this
time (see in logs?)


> Few additional questions :
>
>    - Is it a problem regarding the data or is it "safe" ? In other words,
>    the old data not been written yet will be dropped or they will be
> written
>    correctly, just later ?
>

No data is going to be dropped. The STUCK message is just flagging you that
a write is taking a while to complete while we wait on MVCC. You backed up
on disk or another resource or a bunch of writers have all happened to
arrive at one particular region (MVCC is by region)?


>    - If it's not a problem right now, and if the situation persists, can it
>    be a problem later ? (for example, HBase cannot keep up the pace then
>    finally drop some data)
>

HBase will not drop data. It will not be a problem later. It is a
WARN-level message saying writes are stalling some.


>    - How can I debug this and if possible, fix it ?
>
>
See above. Your writes are well distributed. Disks are healthy?



> I have to mention that OpenTSDB is using their own HBase client called
> asynchbase (https://github.com/OpenTSDB/asynchbase). I don't know if it's
> related or not, but just FYI.
>
>
This is serverside internal to the Region. Nothing to do with asynchbase
(other than asynchbase tends to run with less friction so may be writing
faster than our default client).

St.Ack



> Thanks,
>
> Guillaume
>
> 2016-08-10 8:10 GMT+02:00 Anoop John <anoop.hbase@gmail.com>:
>
> > Ya it comes with write workload.  Not like with concurrent reads.
> > Once the write is done (memstore write and WAL write), we mark that
> > MVCC operation corresponding to this as complete and wait for a global
> > read point to advance to atleast this point. (Every write op will have
> > a number corresponding to its mvcc op)
> > As the mvcc op begin, the write entries (which is having this number)
> > added to a Q.  Once write is over and we complete the mvcc op, we will
> > mark the entry corresponding to it (It is in Q now) as completed/done.
> >   So there are other entries also in the Q corresponding to other
> > parallel writes.   Now we will try to move the global read point at
> > least upto this mark.  Means we will have to make sue all the write
> > ops with lesser number than this to get completed.  Until then , there
> > is a wait.   So what is happening here is a later write op is
> > completed and that is still waiting for an older write to get
> > completed.   What sort of write ops u have?   Batch put and/or
> > increment/append?
> >
> > -Anoop-
> >
> >
> > On Tue, Aug 9, 2016 at 6:45 PM, Sterfield <sterfield@gmail.com> wrote:
> > > 2016-08-05 17:52 GMT+02:00 Bryan Beaudreault <bbeaudreault@hubspot.com
> >:
> > >
> > >> I'm also interested in an answer here. We see this from time to time
> in
> > our
> > >> production HBase clusters (non-opentsdb). It seems to be related to
> > >> contention under heavy reads or heavy writes. But it's not clear what
> > the
> > >> impact is here.
> > >>
> > >> On Fri, Aug 5, 2016 at 5:14 AM Sterfield <sterfield@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I'm currently testing Hbase 1.2.1 + OpenTSDB. For that, I'm
> > generating a
> > >> > high load of HTTP PUT on OpenTSDB, which then writes in Hbase.
> > Currently,
> > >> > I'm able to feed 300k data point per seconds, and I'm trying to
> > achieve
> > >> > higher speed.
> > >> >
> > >> > I have also activate JMX on both Master and Region servers in order
> to
> > >> grab
> > >> > a bunch of metrics from Hbase.
> > >> >
> > >> > At some point, I'm noticing a bunch of warnings in a region server
> log
> > >> file
> > >> > :
> > >> >
> > >> > 2016-08-04 05:12:06,537 WARN
> > >> >  [B.defaultRpcServer.handler=21,queue=3,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714778}
> > >> > 2016-08-04 05:12:06,538 WARN
> > >> >  [B.defaultRpcServer.handler=19,queue=1,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714778}
> > >> > 2016-08-04 05:12:06,538 WARN
> > >> >  [B.defaultRpcServer.handler=40,queue=4,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714778}
> > >> > 2016-08-04 05:12:06,548 WARN
> > >> >  [B.defaultRpcServer.handler=0,queue=0,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714778}
> > >> > 2016-08-04 05:12:06,549 WARN
> > >> >  [B.defaultRpcServer.handler=35,queue=5,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714778}
> > >> > 2016-08-04 05:12:06,556 WARN
> > >> >  [B.defaultRpcServer.handler=7,queue=1,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,558 WARN
> > >> >  [B.defaultRpcServer.handler=20,queue=2,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,559 WARN
> > >> >  [B.defaultRpcServer.handler=14,queue=2,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,609 WARN
> > >> >  [B.defaultRpcServer.handler=42,queue=0,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,609 WARN
> > >> >  [B.defaultRpcServer.handler=32,queue=2,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,619 WARN
> > >> >  [B.defaultRpcServer.handler=2,queue=2,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> > 2016-08-04 05:12:06,619 WARN
> > >> >  [B.defaultRpcServer.handler=48,queue=0,port=16020]
> > >> > regionserver.MultiVersionConcurrencyControl: STUCK:
> > >> > MultiVersionConcurrencyControl{readPoint=7714719,
> writePoint=7714779}
> > >> >
> > >> > Can someone shed some lights on this ? I've googling + searching the
> > >> > documentation, but found no clue on this subject.
> > >> >
> > >> > Is it a problem ? If so, how can I fix it ?
> > >> >
> > >> > Thanks a lot for your help,
> > >> >
> > >> > Guillaume
> > >> >
> > >>
> > >
> > > Hello,
> > >
> > > Sorry to bump this thread, but no one knows anything about this issue ?
> > >
> > > Thanks,
> > >
> > > Guillaume
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message