accumulo-user mailing list archives

From Eric Newton <eric.new...@gmail.com>
Subject Re: Values go to a wrong table during recovery.
Date Thu, 19 Feb 2015 02:44:36 GMT
https://issues.apache.org/jira/browse/ACCUMULO-3603

-Eric


On Wed, Feb 18, 2015 at 7:12 PM, Denis <denis@camfex.cz> wrote:

> On 2/18/15, Christopher <ctubbsii@apache.org> wrote:
>
> > To rule out some scenarios, is it possible that your clients are writing
> > to the wrong tables?
> That was the first idea, so I added assert()s to the code of the
> writers a few days ago. No assert was triggered, but some invalid values
> still appear after a new tserver failure.
>
> > Have you ever seen a failure affecting a table which does
> > not exist (like what might happen if there's an off-by-one error in the
> > WAL code)? Or affecting the metadata tables?
> No.
> Also, no tables were created or deleted during the last two months.
>
> > Can you reproduce this error reliably, or can you share the relevant
> > ingest code which can reproduce this failure?
>
> I will think about how to reproduce it.
> What may be special about the code: each insert goes to several
> (5..8) tables at once (one data table plus a few index tables), but no
> MultiTableBatchWriter is used. Several BatchWriters (one per table) are
> created and flushed sequentially, in the same thread. For Accumulo
> 1.4 this was a performance optimization; it worked faster than
> MultiTableBatchWriter. I am not sure whether that still holds for 1.6.1;
> this code was not changed after the migration to 1.6.1.
> In all cases with invalid values, the index tables were affected (one
> of the index tables had values typical of another index table).
>
> > Also, what kind of tablet server failures are you experiencing when this
> > happens?
> Spontaneous power-offs. There is something wrong with the power units
> so every 2-3 days one of the servers suddenly turns off and reboots.
>
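
For readers following the thread, the ingest pattern Denis describes (one writer per table, all flushed one after another in the same thread, with a client-side check before each write) might look roughly like the sketch below. This is not his code and it does not use the Accumulo API: TableWriterSketch is a hypothetical stand-in for BatchWriter, and the row-prefix check is an assumed example of what an assert() in the writer could test. The table names and prefixes are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of "one writer per table, flushed sequentially in one thread".
class PerTableIngestSketch {
    // Open one writer per table; the map value is the (assumed) row prefix
    // that every row written to that table must carry.
    static Map<String, TableWriterSketch> openWriters(Map<String, String> prefixByTable) {
        Map<String, TableWriterSketch> writers = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : prefixByTable.entrySet()) {
            writers.put(e.getKey(), new TableWriterSketch(e.getKey(), e.getValue()));
        }
        return writers;
    }

    // Buffer the rows for each table, then flush the writers one after another.
    static void ingest(Map<String, TableWriterSketch> writers,
                       Map<String, List<String>> rowsByTable) {
        for (Map.Entry<String, List<String>> e : rowsByTable.entrySet()) {
            TableWriterSketch writer = writers.get(e.getKey());
            if (writer == null) {
                throw new IllegalStateException("no writer for table " + e.getKey());
            }
            for (String row : e.getValue()) {
                writer.add(row);
            }
        }
        for (TableWriterSketch writer : writers.values()) {
            writer.flush();
        }
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("data", "d:");
        prefixes.put("index_a", "a:");
        prefixes.put("index_b", "b:");
        Map<String, TableWriterSketch> writers = openWriters(prefixes);

        Map<String, List<String>> rows = new LinkedHashMap<>();
        rows.put("data", List.of("d:row1"));
        rows.put("index_a", List.of("a:term1", "a:term2"));
        rows.put("index_b", List.of("b:term1"));
        ingest(writers, rows);

        for (Map.Entry<String, TableWriterSketch> e : writers.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().flushed());
        }
    }
}

// Hypothetical stand-in for a per-table writer; NOT the Accumulo BatchWriter.
class TableWriterSketch {
    private final String table;
    private final String expectedPrefix;
    private final List<String> buffered = new ArrayList<>();
    private final List<String> flushed = new ArrayList<>();

    TableWriterSketch(String table, String expectedPrefix) {
        this.table = table;
        this.expectedPrefix = expectedPrefix;
    }

    // Client-side guard: reject a row that does not belong to this table.
    void add(String row) {
        if (!row.startsWith(expectedPrefix)) {
            throw new IllegalArgumentException(
                "row " + row + " does not belong to table " + table);
        }
        buffered.add(row);
    }

    // Move everything buffered so far into the flushed list.
    void flush() {
        flushed.addAll(buffered);
        buffered.clear();
    }

    List<String> flushed() {
        return flushed;
    }
}
```

A guard like this only covers the client side. If it never fires and misplaced values still show up after a tablet server failure, that would point at something after the write leaves the client, which is consistent with the recovery path being discussed in this thread.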
