From: Dylan Hutchison
Date: Fri, 22 Apr 2016 13:16:22 -0700
Subject: Re: Checking what a BatchWriter is stuck on; failure during split
To: Accumulo Dev List

You caught me Shawn =)  In my case, the tserver is also the client. I do
evil things like open BatchWriters inside iterators. So if the cache is
indeed the problem, then it would only affect clients that live within the
same JVM as the tserver. This is good news because not every client would
see this problem.
I'm not sure if this covers William's MapReduce case. I still need to write
a test case that reproduces this problem to be totally sure.

Cheers, Dylan

On Fri, Apr 22, 2016 at 9:31 AM, Shawn Walker wrote:
> Dylan,
>
> You pastebin'd a jstack of the tserver, though from the little I can see,
> I'd think the issue was on the client. Do you have a jstack of what it
> looks like?
>
> You say that "every hour the static collection of TabletLocators clears
> itself", but my read is that this statement only applies to the client
> used by the tserver.
>
> --
> Shawn Walker
>
> On Fri, Apr 22, 2016 at 12:19 AM, Dylan Hutchison <
> dhutchis@cs.washington.edu> wrote:
>
> > Hi all,
> >
> > Check out ACCUMULO-4229. I copy the text below to include it in our
> > discussion.
> >
> > This issue came up again during a similar long-lasting BatchWrite. It
> > had the same circumstances: the failures started happening after a
> > split event.
> >
> > I turned on TRACE logs and I think I pinned it down: the TabletLocator
> > cached by a BatchWriter gets out of sync with the static cache of
> > TabletLocators.
> >
> >    1. The TabletServerBatchWriter caches a TabletLocator from the
> >    static collection of TabletLocators when it starts writing. Suppose
> >    it is writing to tablet T1.
> >    2. The TabletServerBatchWriter uses its locally cached TabletLocator
> >    inside its `binMutations` method for its entire lifespan; this cache
> >    is never refreshed or updated to sync up with the static collection
> >    of TabletLocators.
> >    3. Every hour, the static collection of TabletLocators clears
> >    itself. The next call to get a TabletLocator from the static
> >    collection allocates a new TabletLocator. Unfortunately, the
> >    TabletServerBatchWriter does not reflect this change and continues
> >    to use the old, locally cached TabletLocator.
> >    4. Tablet T1 splits into T2 and T3, which closes T1. As such, T1 no
> >    longer exists, and the entries meant for T1 all fail to write at the
> >    tablet server that receives them because T1 is closed.
> >    5. The TabletServerBatchWriter receives the response from the tablet
> >    server that all entries failed to write. It invalidates the cache of
> >    the *new* TabletLocator obtained from the static collection of
> >    TabletLocators. The old TabletLocator that is cached locally does
> >    not get invalidated.
> >    6. The TabletServerBatchWriter re-queues the failed entries and
> >    tries to write them to the same closed tablet T1, because it is
> >    still looking up tablets using the old TabletLocator.
> >
> > This behavior subsumes the circumstances William wrote about in the
> > thread he mentioned:
> > https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3CCAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com%3E
> > The problem would occur as a result of either splits or major
> > compactions. It would only stop the BatchWriter if its entire memory
> > filled up with writes to the same tablet that was closed as a result
> > of a majc or split; otherwise it would just slow down the BatchWriter
> > by failing to write some number of entries with every RPC.
> >
> > There are a few solutions we can think of.
> >
> >    1. Not have the MutationWriter inside the TabletServerBatchWriter
> >    locally cache TabletLocators. I suspect this was done for
> >    performance reasons, so it's probably not a good solution.
> >    2. Have all the MutationWriters clear their caches at the same time
> >    the static TabletLocator cache clears. I like this one. We could
> >    store a reference to the Map that each MutationWriter has inside a
> >    static synchronized WeakHashMap. The only times the weak map needs
> >    to be accessed are:
> >       1. when a MutationWriter is constructed (from constructing a
> >       TabletServerBatchWriter), to add its new local TabletLocator
> >       cache to the weak map;
> >       2. when the static TabletLocator cache is cleared, to also clear
> >       every map in the weak map.
> >    3. Another solution is to make the invalidate calls on the local
> >    TabletLocator cache rather than the global static one. If we go
> >    this route, we should double-check the idea to make sure it does
> >    not impact the correctness of any other pieces of code that use the
> >    cache. I like the previous idea better.
> >
> > The TimeoutTabletLocator does not help when no timeout is set on the
> > BatchWriter (the default behavior).
> >
> > On Tue, Apr 19, 2016 at 8:09 PM, William Slacum wrote:
> >
> > > Good digs, Dylan. I don't think it's too rare to matter. I notice it
> > > often during MR jobs, and there's usually a point where I give up
> > > and just start writing RFiles.
> > >
> > > It could possibly be related to what I saw back in the day with:
> > > https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3CCAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com%3E
> > >
> > > On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser wrote:
> > >
> > > > Nice findings. Sorry I haven't had any cycles to dig into this
> > > > myself.
> > > >
> > > > I look forward to hearing what you find :)
> > > >
> > > > Dylan Hutchison wrote:
> > > >
> > > >> I investigated a bit more and I am pretty sure the problem is
> > > >> that the BatchWriter is not recognizing that the tablet vb<<
> > > >> split into vb;2436< and vb<;2436. It keeps trying to update the
> > > >> closed tablet vb<<. Each update writes 0 mutations and records a
> > > >> failure at the tablet server UpdateSession because vb<< is
> > > >> closed.
> > > >>
> > > >> I'm not sure why this is happening, because the BatchWriter
> > > >> should have invalidated its tablet locator cache upon
> > > >> recognizing a failure. Then it would recognize that the entries
> > > >> it wants to write fall into the new tablets vb;2436< and
> > > >> vb<;2436.
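The failure sequence in the ACCUMULO-4229 write-up above boils down to a writer holding a private reference to an entry that a shared static cache later drops and replaces. Here is a minimal, self-contained sketch of that interaction; the class and method names (`Locator`, `LocatorCache`, `StuckWriter`) are made up for illustration and are not the actual Accumulo code:

```java
import java.util.HashMap;
import java.util.Map;

// Made-up stand-ins for TabletLocator and its static cache.
class Locator {
    boolean valid = true;

    void invalidate() {
        valid = false;
    }
}

class LocatorCache {
    private static final Map<String, Locator> CACHE = new HashMap<>();

    static synchronized Locator get(String table) {
        return CACHE.computeIfAbsent(table, t -> new Locator());
    }

    // Runs on a timer (hourly, per the report): drops every cached locator.
    static synchronized void clearAll() {
        CACHE.clear();
    }
}

class StuckWriter {
    // Step 1: the writer grabs its locator once and never refreshes it.
    private final Locator local = LocatorCache.get("T1");

    void onWriteFailure() {
        // Step 5: invalidation hits the *current* static entry, which is a
        // brand-new Locator if clearAll() ran in between. The stale `local`
        // reference is untouched and keeps routing to the closed tablet.
        LocatorCache.get("T1").invalidate();
    }

    boolean usesStaleLocator() {
        return local != LocatorCache.get("T1");
    }
}

public class StaleLocatorDemo {
    public static void main(String[] args) {
        StuckWriter w = new StuckWriter();
        System.out.println("stale before clear: " + w.usesStaleLocator()); // false
        LocatorCache.clearAll();  // step 3: the hourly clear
        w.onWriteFailure();       // step 5: invalidates the NEW locator only
        System.out.println("stale after clear:  " + w.usesStaleLocator()); // true
    }
}
```

Solution 2 in the write-up would, in effect, register each writer's local cache in a static synchronized WeakHashMap so that the hourly clear could purge the local copies at the same time as the static one.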
> > > >> I think there is a timing bug for this edge case, when a table
> > > >> split occurs during heavy writes.
> > > >>
> > > >> I will write this up if I can reproduce it. Maybe it is too rare
> > > >> to matter.
> > > >>
> > > >> Cheers, Dylan
> > > >>
> > > >> On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison <
> > > >> dhutchis@cs.washington.edu> wrote:
> > > >>
> > > >>> Hi devs,
> > > >>>
> > > >>> I'd like to ask your help in figuring out what is happening to a
> > > >>> BatchWriter. The following gives my reasoning so far.
> > > >>>
> > > >>> In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING
> > > >>> status in its addMutation method. I saw that it is stuck by
> > > >>> jstack'ing the Accumulo client. It's been stuck like this for 16
> > > >>> hours.
> > > >>>
> > > >>> The BatchWriter is supposed to wait when a mutation is added if
> > > >>> no failures have been recorded and either (a) the total memory
> > > >>> used exceeds the maximum allowed for the BatchWriter, or (b) the
> > > >>> BatchWriter is currently flushing. So we conclude that one of
> > > >>> (a) or (b) has occurred and no failures were recorded at the
> > > >>> time when addMutation was called. I think (a) is likely.
> > > >>>
> > > >>> The BatchWriter is supposed to notify itself when (1) a flush
> > > >>> finishes, (2) a constraint violation, authorization failure,
> > > >>> server error, or unknown error occurs, or (3) memory usage
> > > >>> decreases, which happens when entries are successfully sent to
> > > >>> the tablet server. Since the BatchWriter is stuck in WAITING,
> > > >>> none of these conditions are occurring.
> > > >>>
> > > >>> The BatchWriter has 3 write threads (the default number). All
> > > >>> three have status TIMED_WAITING (parked) in jstack. Their stack
> > > >>> traces don't give useful information.
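The wait conditions (a)/(b) and the no-failures rule described above can be stated as a single predicate. This is an illustrative sketch with made-up names; the real TabletServerBatchWriter logic is more involved:

```java
// Illustrative predicate mirroring the wait rule described above.
public class BatchWriterWaitDemo {
    // addMutation blocks while this holds: no failure recorded, and either
    // (a) memory in use exceeds the writer's limit, or (b) a flush is in
    // progress.
    static boolean shouldWait(long memUsed, long maxMem,
                              boolean flushing, boolean somethingFailed) {
        return !somethingFailed && (memUsed > maxMem || flushing);
    }

    public static void main(String[] args) {
        System.out.println(shouldWait(200, 100, false, false)); // (a): true
        System.out.println(shouldWait(50, 100, true, false));   // (b): true
        System.out.println(shouldWait(200, 100, false, true));  // failure recorded: false
        System.out.println(shouldWait(50, 100, false, false));  // neither: false
    }
}
```

Under the split-failure theory in this thread, the rejected entries are re-queued as failed mutations rather than recorded as the kind of hard failure that sets the failure flag, so memory in use never drops below the limit and the predicate stays true indefinitely; none of the notify conditions (1)-(3) ever fires.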
> > > >>> > > > >>> Here's what I can tell from the tserver logs. A new table (and > > tablet) > > > >>> was created successfully. The BatchWriter started writing to this > > > tablet > > > >>> steadily. The logs show that the tablet (vb<<) flushed every 5 > > seconds > > > >>> or > > > >>> so and major compacted at a steady periodic rate. > > > >>> > > > >>> Everything looks good, until vb<< grew large enough that it needed > > > >>> splitting. This occurred about 42 minutes after the BatchWriter > > > started > > > >>> writing entries. The logs show a failure in an UpdateSession that > > > popped > > > >>> up in the middle of the split operation. This failure continues to > > > show > > > >>> for the next 15 hours. > > > >>> > > > >>> I copied the portion of the tserver logs that look relevant to the > > > split > > > >>> below. I highlighted the line reporting the first failure. It > > occurs > > > in > > > >>> between when the split starts and when it finishes. > > > >>> > > > >>> Any idea what could have caused this? I don't know if the failure > is > > > >>> related to the BatchWriter being stuck in WAITING. It seems > > likely. I > > > >>> think it is weird that the 3 write threads are all idle; at least > one > > > of > > > >>> them should be doing something if the thread calling addMutation() > is > > > >>> waiting. > > > >>> > > > >>> Here is a pastebin of the jstack, > > > though I > > > >>> think I wrote the useful parts from it. 
> > > >>> > > > >>> 2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<< closed > > > >>> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split > > > >>> vb;2436< > > > >>> > > [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf, > > > >>> hdfs: > > > >>> //localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf, > > > >>> hdfs://localhost:90 > > > >>> 00/accumulo/tables/vb/default_tablet/M0000a9g.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf] > > > >>> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split > > > >>> vb<;2436 > > > >>> > > [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf, > > > >>> > 
hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf, > > > hdfs > > > >>> ://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf, > > > >>> hdfs://localhost:9 > > > >>> 000/accumulo/tables/vb/default_tablet/M0000a9g.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf, > > > >>> > hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf] > > > >>> 2016-04-17 22:38:06,440 [tserver.TabletServer] DEBUG: ScanSess tid > > > >>> 127.0.0.1:42302 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1] > > > >>> 2016-04-17 22:38:06,472 [tablet.Tablet] TABLET_HIST: vb<< split > > > vb;2436< > > > >>> vb<;2436 > > > >>> 2016-04-17 22:38:06,472 [tablet.Tablet] DEBUG: offline split time : > > > >>> 3.22 > > > >>> secs > > > >>> *2016-04-17 22:38:06,472 [tserver.TabletServer] INFO : Starting > > split: > > > >>> vb<<* > > > >>> 2016-04-17 22:38:06,472 [tserver.TabletServer] DEBUG: UpSess > > > >>> 127.0.0.1:41090 0 in 3.059s, at=[0 0 0.00 1] ft=3.010s(pt=3.010s > > > >>> lt=0.000s ct=0.000s) > > > >>> 2016-04-17 22:38:06,472 [tserver.NativeMap] DEBUG: Allocated native > > map > > > >>> 0x0000000009a20e60 > > > >>> > > > >>> *2016-04-17 22:38:06,473 [tserver.TabletServer] DEBUG: Failures: 1, > > > first > > > >>> extent vb<< successful commits: 0* > > > >>> > > > >>> 2016-04-17 22:38:06,473 [tablet.Tablet] DEBUG: Reloading > constraints > > > for > > > >>> extent: 
vb;2436< > > > >>> 2016-04-17 22:38:06,473 [constraints.ConstraintChecker] DEBUG: > Loaded > > > >>> constraint > > > org.apache.accumulo.core.constraints.DefaultKeySizeConstraint > > > >>> for vb > > > >>> 2016-04-17 22:38:06,474 [tablet.Tablet] TABLET_HIST: vb;2436< > opened > > > >>> 2016-04-17 22:38:06,474 [tserver.NativeMap] DEBUG: Allocated native > > map > > > >>> 0x0000000003edf090 > > > >>> 2016-04-17 22:38:06,474 [tablet.Tablet] DEBUG: Reloading > constraints > > > for > > > >>> extent: vb<;2436 > > > >>> 2016-04-17 22:38:06,474 [constraints.ConstraintChecker] DEBUG: > Loaded > > > >>> constraint > > > org.apache.accumulo.core.constraints.DefaultKeySizeConstraint > > > >>> for vb > > > >>> 2016-04-17 22:38:06,475 [tablet.Tablet] TABLET_HIST: vb<;2436 > opened > > > >>> *2016-04-17 22:38:06,475 [tserver.TabletServer] INFO : Tablet > split: > > > vb<< > > > >>> size0 539900144 size1 539900148 time 8268ms* > > > >>> 2016-04-17 22:38:06,594 [tserver.TabletServer] DEBUG: MultiScanSess > > > >>> 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs > > > tablets:1 > > > >>> ranges:1) > > > >>> 2016-04-17 22:38:06,597 [tserver.TabletServer] DEBUG: MultiScanSess > > > >>> 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs > > > tablets:1 > > > >>> ranges:1) > > > >>> 2016-04-17 22:38:07,568 [tserver.TabletServer] DEBUG: ScanSess tid > > > >>> 127.0.0.1:42307 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1] > > > >>> 2016-04-17 22:38:07,571 [zookeeper.DistributedWorkQueue] DEBUG: > > Looking > > > >>> for work in > > > >>> /accumulo/c7998e9d-129d-4c61-9732-45bb6bcab3f0/bulk_failed_copyq > > > >>> 2016-04-17 22:38:07,940 [tserver.TabletServer] DEBUG: UpSess > > > >>> 127.0.0.1:41090 0 in 0.067s, at=[1 1 1.00 1] ft=0.000s(pt=0.000s > > > >>> lt=0.000s ct=0.000s) > > > >>> 2016-04-17 22:38:07,941 [tserver.TabletServer] DEBUG: Failures: 1, > > > first > > > >>> extent vb<< successful commits: 0 > > > >>> > > > >>> > > > >>> > > > >>> > > > >>> > > 