phoenix-dev mailing list archives

From "James Taylor (JIRA)" <>
Subject [jira] [Commented] (PHOENIX-2883) Region close during automatic disabling of index for rebuilding can lead to RS abort
Date Wed, 07 Feb 2018 06:45:00 GMT


James Taylor commented on PHOENIX-2883:

There have been hundreds of fixes to secondary indexes between 4.8 and 4.13. Would it be
possible for you to upgrade?

> Region close during automatic disabling of index for rebuilding can lead to RS abort
> ------------------------------------------------------------------------------------
>                 Key: PHOENIX-2883
>                 URL:
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>            Priority: Major
> (disclaimer: still performing due-diligence on this one)
> I've been helping a user this week with what is thought to be a race condition in secondary
> index updates. This user has a relatively heavy write-based workload with a few tables that
> each have at least one index.
> When the region distribution is changing (concretely, we were doing a rolling restart of
> the cluster without the load balancer disabled in the hopes of retaining as much
> availability as possible), I've seen the following general outline:
> * An index update fails (due to {{ERROR 2008 (INT10)}}: the index metadata cache expired
>   or is just missing)
> * The index is taken offline to be asynchronously rebuilt
> * A flush on the data table's region is queued for quite some time
> * RS is asked to close a region (due to a move, commonly)
> * RS aborts because the memstore for the data table's region is in an inconsistent state
>   (e.g. {{Assertion failed while closing store <region> <colfam> flushableSize expected=0,
>   actual= 193392. Current memstoreSize=-552208. Maybe a coprocessor operation failed and left
>   the memstore in a partially updated state.}})
> Some relevant HBase issues include HBASE-10514 and HBASE-10844.
> Have been talking to [~ayingshu] and [~devaraj] about it, but haven't found anything
> conclusive yet. Will dump findings here.
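
To make the failure mode in the assertion message easier to picture, here is a toy sketch of how a size counter can drift from the data it describes when a failed operation is only partially undone. It is not HBase or Phoenix code: `ToyStore`, `tracked_size`, and the `rollback` behaviour are hypothetical names and an assumed bug chosen purely for illustration, and the real root cause of this issue had not been established at the time of the report.

```python
class ToyStore:
    """Toy model: buffered cells plus a size counter maintained separately."""

    def __init__(self):
        self.cells = []        # sizes (in bytes) of cells currently buffered
        self.tracked_size = 0  # running counter, updated independently of `cells`

    def add(self, cell_size):
        # Normal write path: buffer the cell and bump the counter together.
        self.cells.append(cell_size)
        self.tracked_size += cell_size

    def rollback(self, cell_size, cell_was_added):
        # Assumed bug for illustration: the counter is always decremented,
        # even when the cell never reached the buffer.
        if cell_was_added:
            self.cells.remove(cell_size)
        self.tracked_size -= cell_size

    def flush(self):
        # Flushing empties the buffer and subtracts what was actually flushed.
        flushed = sum(self.cells)
        self.cells.clear()
        self.tracked_size -= flushed
        return flushed

    def close(self):
        # On close, everything is flushed, so the counter should be zero.
        self.flush()
        if self.tracked_size != 0:
            raise AssertionError(
                f"size counter expected=0, actual={self.tracked_size}"
            )


if __name__ == "__main__":
    store = ToyStore()
    store.add(100)
    # A failed operation is "undone" although its cell was never buffered.
    store.rollback(50, cell_was_added=False)
    try:
        store.close()
    except AssertionError as err:
        print("close failed:", err)
```

In this toy model the counter ends up negative after the close attempt, which mirrors the shape of the negative memstoreSize in the log line above without claiming to reproduce its cause.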

This message was sent by Atlassian JIRA
