accumulo-notifications mailing list archives

From "Eric Newton (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-3603) replay mutations to the wrong table?
Date Thu, 19 Feb 2015 15:08:12 GMT


Eric Newton commented on ACCUMULO-3603:

Further details from the mailing list:
What could be special about the code: inserts are performed to a few
(5..8) tables at once (one data table + a few index tables) but no
MultiTableBatchWriter is used. A few BatchWriters (one per table) are
created and flushed sequentially, in the same thread.

In all cases with invalid values the index tables were affected (one
of the index tables had values typical for another of the index

Also, what kind of tablet server failures are you experiencing when this happens?

Spontaneous power-offs. There is something wrong with the power units
so every 2-3 days one of the servers suddenly turns off and reboots.

> replay mutations to the wrong table?
> ------------------------------------
>                 Key: ACCUMULO-3603
>                 URL:
>             Project: Accumulo
>          Issue Type: Bug
>          Components: tserver
>    Affects Versions: 1.6.1
>            Reporter: Eric Newton
>            Assignee: Eric Newton
>            Priority: Critical
>             Fix For: 1.6.3
> A user writes to the mailing list:
> {quote}
> A few times I noticed that some tables have values they cannot have, and
> those entries have timestamps close to a tabletserver failure time.
> (I mean the wrong format: one table has msgpack values at least 10 bytes
> long and another table has 1-byte values, and after a failure I read
> one or two 1-byte values in the table where I expect to read msgpack.)
> I suspect that during the recovery process, when the WAL is being read, some
> entries are inserted into the wrong table.
> Maybe it is a known bug, as I am still using Accumulo 1.6.1.
> {quote}
> Consider adding multiple tables to the continuous ingest test to reproduce.
> Note that the random walk test certainly does this, and no failures have been observed.
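
A reproduction test would need some way to recognize a misplaced entry. The
symptom the user describes (a value too short for the table it is in, with a
timestamp near a tablet server failure) could be checked roughly as follows.
This is only a sketch in plain Python over entries that have already been
scanned out of a table; it does not use the Accumulo client API. The Entry
type, the function name and the window_ms parameter are hypothetical, and the
10-byte minimum is taken from the user's description of the msgpack table.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for one scanned key/value pair."""
    row: str
    value: bytes
    timestamp_ms: int


def suspicious_entries(entries: Iterable[Entry],
                       min_value_len: int,
                       failure_time_ms: int,
                       window_ms: int) -> List[Entry]:
    """Return entries whose value is shorter than min_value_len and whose
    timestamp lies within window_ms of the known failure time."""
    return [
        e for e in entries
        if len(e.value) < min_value_len
        and abs(e.timestamp_ms - failure_time_ms) <= window_ms
    ]


# Made-up sample data: the table is assumed to hold msgpack values of at
# least 10 bytes, and a tablet server is assumed to have failed at t=5000 ms.
sample = [
    Entry("r1", b"\x93" + b"x" * 11, 1_000),  # plausible msgpack value
    Entry("r2", b"\x01", 5_050),              # 1-byte value near the failure
    Entry("r3", b"\x01", 90_000),             # 1-byte value far from it
]

flagged = suspicious_entries(sample, min_value_len=10,
                             failure_time_ms=5_000, window_ms=1_000)
```

A length check is only a heuristic that happens to fit these two value
formats; a test that ingests into multiple tables could instead tag each
value with the name of the table it was written to and compare on read.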

This message was sent by Atlassian JIRA
