accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Values go to a wrong table during recovery.
Date Fri, 20 Feb 2015 19:21:08 GMT
Denis,

After our tests, it's highly unlikely this problem is in the server.  If
you can provide a test client that has ever replicated the problem, please
attach it to the ticket.  Otherwise, we'll close the ticket unless someone
else can reproduce the problem.

-Eric


On Fri, Feb 20, 2015 at 1:46 PM, Keith Turner <keith@deenlo.com> wrote:

> I updated ACCUMULO-3603 w/ details about an experiment I ran.
>
> On Wed, Feb 18, 2015 at 9:44 PM, Eric Newton <eric.newton@gmail.com>
> wrote:
>
>> https://issues.apache.org/jira/browse/ACCUMULO-3603
>>
>> -Eric
>>
>>
>> On Wed, Feb 18, 2015 at 7:12 PM, Denis <denis@camfex.cz> wrote:
>>
>>> On 2/18/15, Christopher <ctubbsii@apache.org> wrote:
>>>
>>> > To rule out some scenarios, is it possible that your clients are
>>> > writing to the wrong tables?
>>> That was the first idea, so I added assert()'s to the code of the
>>> writers a few days ago. No assert was triggered, but some invalid values
>>> still appeared after a new tserver failure.
>>>
>>> > Have you ever seen a failure affecting a table which does
>>> > not exist (like what might happen if there's an off-by-one error in
>>> > the WAL code)? Or affecting the metadata tables?
>>> No.
>>> Also, no tables were created or deleted during the last two months.
>>>
>>> > Can you reproduce this error reliably, or can you share the relevant
>>> > ingest code which can reproduce this failure?
>>>
>>> I will think about how to reproduce it.
>>> What could be special about the code: inserts are performed to a few
>>> (5..8) tables at once (one data table + a few index tables) but no
>>> MultiTableBatchWriter is used. A few BatchWriters (one per table) are
>>> created and flushed consecutively, in the same thread. For Accumulo
>>> 1.4 it was a performance optimization; it worked faster than
>>> MultiTableBatchWriter. Not sure if that is still so for 1.6.1; this code
>>> was not changed after the migration to 1.6.1.
>>> In all cases with invalid values the index tables were affected (one
>>> of the index tables had values typical for another of the index
>>> tables).
>>>
>>> > Also, what kind of tablet server failures are you experiencing when
>>> > this happens?
>>> Spontaneous power-offs. There is something wrong with the power units
>>> so every 2-3 days one of the servers suddenly turns off and reboots.
>>>
>>
>>
>
