accumulo-user mailing list archives

From John Vines <vi...@apache.org>
Subject Re: Values go to a wrong table during recovery.
Date Fri, 20 Feb 2015 19:21:51 GMT
You said that you were operating this on 1.4. Is this the exact same
cluster or is it just the same code you were using? Did you have walogs
lying around when you upgraded? Did you upgrade through 1.5 or straight
from 1.4 to 1.6?

On Fri, Feb 20, 2015 at 1:46 PM, Keith Turner <keith@deenlo.com> wrote:

> I updated ACCUMULO-3603 w/ details about an experiment I ran.
>
> On Wed, Feb 18, 2015 at 9:44 PM, Eric Newton <eric.newton@gmail.com>
> wrote:
>
>> https://issues.apache.org/jira/browse/ACCUMULO-3603
>>
>> -Eric
>>
>>
>> On Wed, Feb 18, 2015 at 7:12 PM, Denis <denis@camfex.cz> wrote:
>>
>>> On 2/18/15, Christopher <ctubbsii@apache.org> wrote:
>>>
>>> > To rule out some scenarios, is it possible that your clients are
>>> > writing to the wrong tables?
>>> That was the first idea, so I added assert()s to the code of the
>>> writers a few days ago. No assert was triggered, but some invalid values
>>> still appear after a new tserver failure.
>>>
>>> > Have you ever seen a failure affecting a table which does
>>> > not exist (like what might happen if there's an off-by-one error in
>>> > the WAL code)? Or affecting the metadata tables?
>>> No.
>>> Also, no tables were created or deleted during the last two months.
>>>
>>> > Can you reproduce this error reliably, or can you share the relevant
>>> > ingest code which can reproduce this failure?
>>>
>>> I will think about how to reproduce it.
>>> What could be special about the code: inserts are performed to a few
>>> (5..8) tables at once (one data table + a few index tables) but no
>>> MultiTableBatchWriter is used. A few BatchWriters (one per table) are
>>> created and flushed sequentially, in the same thread. For Accumulo
>>> 1.4 it was a performance optimization; it worked faster than
>>> MultiTableBatchWriter. Not sure if that is still so for 1.6.1; this code
>>> was not changed after the migration to 1.6.1.
>>> In all cases with invalid values the index tables were affected (one
>>> of the index tables had values typical for another of the index
>>> tables).
>>>
>>> > Also, what kind of tablet server failures are you experiencing when
>>> > this happens?
>>> Spontaneous power-offs. There is something wrong with the power units
>>> so every 2-3 days one of the servers suddenly turns off and reboots.
>>>
>>
>>
>
