accumulo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <josh.el...@gmail.com>
Subject Re: Checking what a BatchWriter is stuck on; failure during split
Date Fri, 22 Apr 2016 14:27:49 GMT
Awesome hunting and write-up, Dylan!

Dylan Hutchison wrote:
> Hi all,
>
> Check out ACCUMULO-4229
> <https://issues.apache.org/jira/browse/ACCUMULO-4229>.  I copy the text
> below to include it in our discussion.
>
> This issue came up again during a similar long-lasting BatchWrite.  It had
> the same circumstances: the failures started happening after a split event.
>
> I turned on TRACE logs and I think I pinned it down: the TabletLocator
> cached by a BatchWriter gets out of sync with the static cache of
> TabletLocators.
>
>     1. The TabletServerBatchWriter caches a TabletLocator from the static
>     collection of TabletLocators when it starts writing.  Suppose it is writing
>     to tablet T1.
>     2. The TabletServerBatchWriter uses its locally cached TabletLocator
>     inside its `binMutations` method for its entire lifespan; this cache is
>     never refreshed or updated to sync up with the static collection of
>     TabletLocators.
>     3. Every hour, the static collection of TabletLocators clears itself.
>     The next call to get a TabletLocator from the static collection allocates a
>     new TabletLocator.  Unfortunately, the TabletServerBatchWriter does not
>     reflect this change and continues to use the old, locally cached
>     TabletLocator.
>     4. Tablet T1 splits into T2 and T3, which closes T1.  As such, it no
>     longer exists and the tablet server that receives the entries meant to go
>     to T1 all fail to write because T1 is closed.
>     5. The TabletServerBatchWriter receives the response from the tablet
>     server that all entries failed to write.  It invalidates the cache of the
>     *new* TabletLocator obtained from the static collection of TabletLocators.
>     The old TabletLocator that is cached locally does not get invalidated.
>     6. The TabletServerBatchWriter re-queues the failed entries and tries to
>     write them to the same closed tablet T1, because it is still looking up
>     tablets using the old TabletLocator.
>
> This behavior subsumes the circumstances William wrote about in the thread
> <https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3CCAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com%3E>
> he mentioned.  The problem would occur as a result of either splits or
> major compactions.  It would only stop the BatchWriter if its entire memory
> filled up with writes to the same tablet that was closed as a result of a
> majc or split; otherwise it would just slow down the BatchWriter by failing
> to write some number of entries with every RPC.
>
> There are a few solutions we can think of.
>
>     1. Not have the MutationWriter inside the TabletServerBatchWriter
>     locally cache TabletLocators.  I suspect this was done for performance
>     reasons, so it's probably not a good solution.
>     2. Have all the MutationWriters clear their cache at the same time the
>     static TabletLocator cache clears.  I like this one.  We could store a
>     reference to the Map that each MutationWriter has inside a static
>     synchronized WeakHashMap.  The only time the weak map needs to be accessed
>     is:
>        1. When a MutationWriter is constructed (from constructing a
>        TabletServerBatchWriter), add its new local TabletLocator cache
> to the weak
>        map.
>        2. When the static TabletLocator cache is cleared, also clear every
>        map in the weak map.
>     3. Another solution is to make the invalidate calls on the local
>     TabletLocator cache rather than the global static one.  If we go this route
>     we should double check the idea to make sure it does not impact the
>     correctness of any other pieces of code that use the cache. I like the
>     previous idea better.
>
> The TimeoutTabletLocator does not help when no timeout is set on the
> BatchWriter (the default behavior).
>
>
> On Tue, Apr 19, 2016 at 8:09 PM, William Slacum<wslacum@gmail.com>  wrote:
>
>> Good digs, Dylan. I don't think it's too rare to matter. I notice  often
>> during MR jobs, and there's usually a point where I give up and just start
>> writing RFiles.
>>
>> It could possibly be related to what I saw back in the dayoday with:
>>
>> https://mail-archives.apache.org/mod_mbox/accumulo-user/201406.mbox/%3CCAMz+DuvmmHegOn9EJeHR9H_rRpP50L2QZ53BbdruVO0pirArQw@mail.gmail.com%3E
>>
>> On Tue, Apr 19, 2016 at 6:26 PM, Josh Elser<josh.elser@gmail.com>  wrote:
>>
>>> Nice findings. Sorry I haven't had any cycles to dig into this myself.
>>>
>>> I look forward to hearing what you find :)
>>>
>>>
>>> Dylan Hutchison wrote:
>>>
>>>> I investigated a bit more and I am pretty sure the problem is that the
>>>> BatchWriter is not recognizing that the tablet vb<<   split into vb;2436<
>>>> and
>>>> vb<;2436.  It keeps trying to update the closed tablet vb<<.  Each
>> update
>>>> writes 0 mutations and records a failure at the tablet server
>>>> UpdateSession
>>>> because vb<<   is closed.
>>>>
>>>> I'm not sure why this is happening because the BatchWriter should have
>>>> invalidated its tablet locator cache upon recognizing a failure.  Then
>> it
>>>> would recognize that the entries it wants to write fall into the new
>>>> tablets vb;2436<   and vb<;2436.  I think there is a timing bug for
this
>>>> edge
>>>> case, when a table split occurs during heavy writes.
>>>>
>>>> I will write this up if I can reproduce it.  Maybe it is too rare to
>>>> matter.
>>>>
>>>> Cheers, Dylan
>>>>
>>>> On Mon, Apr 18, 2016 at 2:38 PM, Dylan Hutchison<
>>>> dhutchis@cs.washington.edu
>>>>
>>>>> wrote:
>>>>>
>>>> Hi devs,
>>>>> I'd like to ask your help in figuring out what is happening to a
>>>>> BatchWriter.  The following gives my reasoning so far.
>>>>>
>>>>> In Accumulo 1.7.1, I have a BatchWriter that is stuck in WAITING status
>>>>> in
>>>>> its addMutation method.  I saw that it is stuck by jstack'ing the
>>>>> Accumulo
>>>>> client.  It's been stuck like this for 16 hours.
>>>>>
>>>>> The BatchWriter is supposed to wait when a mutation is added if no
>>>>> failures have recorded and either (a) the total memory used exceeds the
>>>>> maximum allowed for the BatchWriter, or (b) the batchwriter is
>> currently
>>>>> flushed.  So we conclude that one of (a) or (b) have occurred and no
>>>>> failures were recorded, at the time when addMutation was called.  I
>> think
>>>>> (a) is likely.
>>>>>
>>>>> The BatchWriter is supposed to notify itself when either (1) a flush
>>>>> finishes, (2) a constraint violation or authorization failure or server
>>>>> error or unknown error occurs, (3) memory usage decreases, which
>> happens
>>>>> when entries successfully send to the tablet server.  Since the
>>>>> BatchWriter
>>>>> is stuck on WAITING, none of these conditions are occurring.
>>>>>
>>>>> The BatchWriter has 3 write threads (the default number).  All three
>> have
>>>>> status TIMED_WAITING (parked) in jstack.  Their stack traces don't give
>>>>> useful information.
>>>>>
>>>>> Here's what I can tell from the tserver logs.  A new table (and tablet)
>>>>> was created successfully.  The BatchWriter started writing to this
>> tablet
>>>>> steadily.  The logs show that the tablet (vb<<) flushed every 5
seconds
>>>>> or
>>>>> so and major compacted at a steady periodic rate.
>>>>>
>>>>> Everything looks good, until vb<<   grew large enough that it needed
>>>>> splitting.  This occurred about 42 minutes after the BatchWriter
>> started
>>>>> writing entries.  The logs show a failure in an UpdateSession that
>> popped
>>>>> up in the middle of the split operation.  This failure continues to
>> show
>>>>> for the next 15 hours.
>>>>>
>>>>> I copied the portion of the tserver logs that look relevant to the
>> split
>>>>> below.  I highlighted the line reporting the first failure.  It occurs
>> in
>>>>> between when the split starts and when it finishes.
>>>>>
>>>>> Any idea what could have caused this?  I don't know if the failure is
>>>>> related to the BatchWriter being stuck in WAITING.  It seems likely.
 I
>>>>> think it is weird that the 3 write threads are all idle; at least one
>> of
>>>>> them should be doing something if the thread calling addMutation() is
>>>>> waiting.
>>>>>
>>>>> Here is a pastebin of the jstack<http://pastebin.com/TLfdrDwX>,
>> though I
>>>>> think I wrote the useful parts from it.
>>>>>
>>>>> 2016-04-17 22:38:06,436 [tablet.Tablet] TABLET_HIST: vb<<   closed
>>>>> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for low split
>>>>> vb;2436<
>>>>>    [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>>>>> hdfs:
>>>>> //localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>>>>> hdfs://localhost:90
>>>>> 00/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
>>>>> 2016-04-17 22:38:06,439 [tablet.Tablet] DEBUG: Files for high split
>>>>> vb<;2436
>>>>>    [hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00008lh.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C00009iz.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a08.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a4t.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a7m.rf,
>> hdfs
>>>>> ://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8f.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/C0000a8n.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8p.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/F0000a8q.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a93.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000a9b.rf,
>>>>> hdfs://localhost:9
>>>>> 000/accumulo/tables/vb/default_tablet/M0000a9g.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aa7.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000aap.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abe.rf,
>>>>> hdfs://localhost:9000/accumulo/tables/vb/default_tablet/M0000abn.rf]
>>>>> 2016-04-17 22:38:06,440 [tserver.TabletServer] DEBUG: ScanSess tid
>>>>> 127.0.0.1:42302 !0 0 entries in 0.00 secs, nbTimes = [0 0 0.00 1]
>>>>> 2016-04-17 22:38:06,472 [tablet.Tablet] TABLET_HIST: vb<<   split
>> vb;2436<
>>>>> vb<;2436
>>>>> 2016-04-17 22:38:06,472 [tablet.Tablet] DEBUG: offline split time :
>>>>>   3.22
>>>>> secs
>>>>> *2016-04-17 22:38:06,472 [tserver.TabletServer] INFO : Starting split:
>>>>> vb<<*
>>>>> 2016-04-17 22:38:06,472 [tserver.TabletServer] DEBUG: UpSess
>>>>> 127.0.0.1:41090 0 in 3.059s, at=[0 0 0.00 1] ft=3.010s(pt=3.010s
>>>>> lt=0.000s ct=0.000s)
>>>>> 2016-04-17 22:38:06,472 [tserver.NativeMap] DEBUG: Allocated native map
>>>>> 0x0000000009a20e60
>>>>>
>>>>> *2016-04-17 22:38:06,473 [tserver.TabletServer] DEBUG: Failures: 1,
>> first
>>>>> extent vb<<   successful commits: 0*
>>>>>
>>>>> 2016-04-17 22:38:06,473 [tablet.Tablet] DEBUG: Reloading constraints
>> for
>>>>> extent: vb;2436<
>>>>> 2016-04-17 22:38:06,473 [constraints.ConstraintChecker] DEBUG: Loaded
>>>>> constraint
>> org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>>>>> for vb
>>>>> 2016-04-17 22:38:06,474 [tablet.Tablet] TABLET_HIST: vb;2436<   opened
>>>>> 2016-04-17 22:38:06,474 [tserver.NativeMap] DEBUG: Allocated native map
>>>>> 0x0000000003edf090
>>>>> 2016-04-17 22:38:06,474 [tablet.Tablet] DEBUG: Reloading constraints
>> for
>>>>> extent: vb<;2436
>>>>> 2016-04-17 22:38:06,474 [constraints.ConstraintChecker] DEBUG: Loaded
>>>>> constraint
>> org.apache.accumulo.core.constraints.DefaultKeySizeConstraint
>>>>> for vb
>>>>> 2016-04-17 22:38:06,475 [tablet.Tablet] TABLET_HIST: vb<;2436 opened
>>>>> *2016-04-17 22:38:06,475 [tserver.TabletServer] INFO : Tablet split:
>> vb<<
>>>>> size0 539900144 size1 539900148 time 8268ms*
>>>>> 2016-04-17 22:38:06,594 [tserver.TabletServer] DEBUG: MultiScanSess
>>>>> 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs
>> tablets:1
>>>>> ranges:1)
>>>>> 2016-04-17 22:38:06,597 [tserver.TabletServer] DEBUG: MultiScanSess
>>>>> 127.0.0.1:51092 0 entries in 0.00 secs (lookup_time:0.00 secs
>> tablets:1
>>>>> ranges:1)
>>>>> 2016-04-17 22:38:07,568 [tserver.TabletServer] DEBUG: ScanSess tid
>>>>> 127.0.0.1:42307 !0 0 entries in 0.00 secs, nbTimes = [1 1 1.00 1]
>>>>> 2016-04-17 22:38:07,571 [zookeeper.DistributedWorkQueue] DEBUG: Looking
>>>>> for work in
>>>>> /accumulo/c7998e9d-129d-4c61-9732-45bb6bcab3f0/bulk_failed_copyq
>>>>> 2016-04-17 22:38:07,940 [tserver.TabletServer] DEBUG: UpSess
>>>>> 127.0.0.1:41090 0 in 0.067s, at=[1 1 1.00 1] ft=0.000s(pt=0.000s
>>>>> lt=0.000s ct=0.000s)
>>>>> 2016-04-17 22:38:07,941 [tserver.TabletServer] DEBUG: Failures: 1,
>> first
>>>>> extent vb<<   successful commits: 0
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>

Mime
View raw message