ignite-issues mailing list archives

From "Maxim Muzafarov (Jira)" <j...@apache.org>
Subject [jira] [Updated] (IGNITE-11252) Docs: Index corruption recovery procedure
Date Mon, 15 Mar 2021 23:33:04 GMT

     [ https://issues.apache.org/jira/browse/IGNITE-11252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Maxim Muzafarov updated IGNITE-11252:
    Fix Version/s:     (was: 2.10)

> Docs: Index corruption recovery procedure
> -----------------------------------------
>                 Key: IGNITE-11252
>                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
>             Project: Ignite
>          Issue Type: Task
>          Components: documentation
>    Affects Versions: 2.7
>            Reporter: Denis A. Magda
>            Assignee: Prachi Garg
>            Priority: Critical
>             Fix For: 2.11
> We need to document a recovery procedure for the case when an index gets corrupted. Refer to this
> thread for details and examples of the exception dumped to the logs when the issue occurs:
> http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html
> # Recovering from index corruption
> ## Applicable if
> It is known that an index of a cache is corrupted, but the main data (partition files
> and WAL) is intact. Show code snippets of possible examples, which can be found via the
> references shared in the dev list discussion.
> ## Steps to recover
> 1. Stop the node
> 2. Delete the index.bin file of each affected cache (path: db/<consistent_id>/cache-<cache_name>/index.bin; for a cache that belongs to a cache group, the directory is cacheGroup-<group_name>)
> 3. Start the node
> - Note: At this point the node is active in the cluster but doesn’t have indexes.
> It still serves SQL queries, but they may run slowly until the indexes are rebuilt.
> Avoid running SQL queries on large tables at this point.
> 4. Wait for the message “Finished indexes rebuilding for cache <cache_name>” in the Ignite log
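Steps 2 and 4 can be scripted. The following is a minimal Python sketch, not an official Ignite tool. It assumes the cache-<cache_name> directory layout quoted in step 2, located under a work directory you pass in. It also assumes a plain-text node log. The helper names (delete_index_files, rebuilt_caches, wait_for_rebuild) are hypothetical. For caches in a cache group, pass dir_prefix="cacheGroup-" and the group names instead. The exact wording of the rebuild message may vary between Ignite versions, so the match only requires the cache name to appear after the quoted phrase on the same log line.

```python
import re
import tempfile
import time
from pathlib import Path

# Phrase quoted in step 4; the exact log format may differ between versions.
REBUILD_PHRASE = "Finished indexes rebuilding for cache"


def delete_index_files(work_dir, consistent_id, cache_names, dir_prefix="cache-"):
    """Step 2 (node stopped): delete index.bin of each affected cache.

    Assumes the layout <work_dir>/db/<consistent_id>/<dir_prefix><name>/index.bin.
    Returns the list of files that were actually removed.
    """
    node_dir = Path(work_dir) / "db" / consistent_id
    removed = []
    for name in cache_names:
        index_file = node_dir / f"{dir_prefix}{name}" / "index.bin"
        if index_file.is_file():
            index_file.unlink()
            removed.append(index_file)
        else:
            print(f"warning: {index_file} not found, skipping")
    return removed


def rebuilt_caches(log_text, cache_names):
    """Return the caches whose rebuild-finished message appears in log_text."""
    done = set()
    for line in log_text.splitlines():
        pos = line.find(REBUILD_PHRASE)
        if pos < 0:
            continue
        rest = line[pos + len(REBUILD_PHRASE):]
        for name in cache_names:
            # Whole-name match, so "Person" is not satisfied by "PersonArchive".
            if re.search(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", rest):
                done.add(name)
    return done


def wait_for_rebuild(log_path, cache_names, timeout_s=3600.0, poll_s=10.0):
    """Step 4: poll the node log until every cache reports a finished rebuild.

    Returns True when all caches are done, False if the timeout expires first.
    """
    pending = set(cache_names)
    deadline = time.monotonic() + timeout_s
    while True:
        log_file = Path(log_path)
        text = log_file.read_text(errors="replace") if log_file.exists() else ""
        pending -= rebuilt_caches(text, pending)
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


if __name__ == "__main__":
    # Self-contained demo against a throwaway directory tree.
    work = Path(tempfile.mkdtemp())
    idx = work / "db" / "node1" / "cache-Person" / "index.bin"
    idx.parent.mkdir(parents=True)
    idx.write_bytes(b"\0")
    print(delete_index_files(work, "node1", ["Person"]))
    log = work / "ignite.log"
    log.write_text("... Finished indexes rebuilding for cache Person\n")
    print(wait_for_rebuild(log, ["Person"], timeout_s=0, poll_s=0))
```

Run delete_index_files only while the node is stopped (step 1). Then start the node and call wait_for_rebuild with the node's log file and the same cache names.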
> # Recovering from persistent storage corruption
> ## Applicable if
> Part of the persistent storage (partition files, checkpoint markers, or WAL) is corrupted
> and cannot be recovered any other way, but healthy copies of all the data exist on
> other nodes.
> ## Steps to recover
> 1. Stop the node
> 2. Delete all persistence files of the node (it is best to clear the Ignite working directory,
> the storage directory, and the WAL and WAL archive directories)
> 3. Make sure consistentId is explicitly set in the configuration of the node
> - If it isn’t, look up the generated consistentId using control.sh (for example, control.sh --baseline lists
> the consistent IDs of baseline nodes) and set it explicitly in the configuration or via IGNITE_CONSISTENT_ID (2.8+ only)
> 4. Start the node
> 5. Wait for the “Finished rebalancing cache” messages for all caches in the Ignite log
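Steps 2, 3 and 5 can likewise be sketched in Python. This is a minimal sketch under several assumptions, and the helper names (clear_persistence, explicit_consistent_id, rebalanced_caches) are hypothetical, not Ignite APIs. First, the caller passes the node's actual work, storage, WAL and WAL archive directories, which depend on the node's configuration. Second, the node is configured with a Spring XML file, where an explicit consistentId appears as a property of the org.apache.ignite.configuration.IgniteConfiguration bean. A value supplied through IGNITE_CONSISTENT_ID is not detected by this check. Third, the rebalancing message is matched on the text quoted in step 5, which may be worded differently in your Ignite version.

```python
import re
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Class name of the Ignite node configuration bean in Spring XML.
IGNITE_CFG_CLASS = "org.apache.ignite.configuration.IgniteConfiguration"
# Message text taken verbatim from step 5; verify it against your version's log.
REBALANCE_PHRASE = "Finished rebalancing cache"


def clear_persistence(dirs):
    """Step 2 (node stopped): recursively delete the given directories.

    Pass the work, storage, WAL and WAL archive directories used by the node.
    Missing directories are skipped; returns the directories actually removed.
    """
    removed = []
    for d in map(Path, dirs):
        resolved = d.resolve()
        if resolved == Path(resolved.anchor):
            raise ValueError(f"refusing to delete a filesystem root: {d}")
        if resolved.is_dir():
            shutil.rmtree(resolved)
            removed.append(resolved)
    return removed


def _local_name(tag):
    """Strip the '{namespace}' prefix ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def explicit_consistent_id(spring_xml):
    """Step 3: return the consistentId set on an IgniteConfiguration bean, or None.

    Handles <property name="consistentId" value="..."/> as well as a nested
    <value>...</value> element.
    """
    root = ET.fromstring(spring_xml)
    for bean in root.iter():
        if _local_name(bean.tag) != "bean" or bean.get("class") != IGNITE_CFG_CLASS:
            continue
        for prop in bean:
            if _local_name(prop.tag) != "property" or prop.get("name") != "consistentId":
                continue
            if prop.get("value"):
                return prop.get("value")
            for child in prop:
                if _local_name(child.tag) == "value" and child.text:
                    return child.text.strip()
    return None


def rebalanced_caches(log_text, cache_names):
    """Step 5: return the caches whose rebalancing-finished message is in log_text."""
    done = set()
    for line in log_text.splitlines():
        pos = line.find(REBALANCE_PHRASE)
        if pos < 0:
            continue
        rest = line[pos + len(REBALANCE_PHRASE):]
        for name in cache_names:
            # Whole-name match, so "Person" is not satisfied by "PersonArchive".
            if re.search(r"(?<![\w-])" + re.escape(name) + r"(?![\w-])", rest):
                done.add(name)
    return done


if __name__ == "__main__":
    cfg = (
        '<beans xmlns="http://www.springframework.org/schema/beans">'
        f'<bean class="{IGNITE_CFG_CLASS}">'
        '<property name="consistentId" value="node-1"/>'
        "</bean></beans>"
    )
    print(explicit_consistent_id(cfg))  # node-1
    scratch = Path(tempfile.mkdtemp())
    (scratch / "db" / "wal" / "archive").mkdir(parents=True)
    print(clear_persistence([scratch / "db"]))
    print(rebalanced_caches("Finished rebalancing cache Person", ["Person", "Org"]))
```

Deleting the directories is irreversible. Double-check every path first, and clear a node only when healthy copies of all its data exist on other nodes, as the applicability condition requires.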

This message was sent by Atlassian Jira
