ignite-issues mailing list archives

From "Denis Magda (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IGNITE-11252) Docs: Index corruption recovery procedure
Date Thu, 07 Feb 2019 19:54:00 GMT
Denis Magda created IGNITE-11252:
------------------------------------

             Summary: Docs: Index corruption recovery procedure
                 Key: IGNITE-11252
                 URL: https://issues.apache.org/jira/browse/IGNITE-11252
             Project: Ignite
          Issue Type: Task
          Components: documentation
    Affects Versions: 2.7
            Reporter: Denis Magda
            Assignee: Prachi Garg
             Fix For: 2.8


We need to document a recovery procedure for index corruption. See this thread for details
and for examples of the exception written to the logs when the issue occurs:
http://apache-ignite-developers.2346864.n4.nabble.com/Ignite-index-corruption-issue-gt-unrecoverable-cluster-td39869.html

# Recovering from an index corruption
## Applicable if
The index of a cache is known to be corrupted, but the main data (partition files and
WAL) is intact. Show code snippets of possible examples; they can be found via the
references shared in the dev list discussion.

## Steps to recover
1. Stop the node
2. Delete index.bin of the affected caches (path is db/<consistent_id>/cache-<cache_name>/index.bin)
3. Start the node
- Note: At this point the node is active in the cluster but does not have indexes.
It still serves SQL queries, but their performance can be low.
Avoid running SQL queries on large tables until the indexes are rebuilt.
4. Wait for the message “Finished indexes rebuilding for cache <cache_name>” in the
Ignite log
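
Steps 2 and 4 could be scripted roughly as follows. This is a minimal sketch, not an official tool. The `storage_root` argument (the directory that contains `db/`), the helper names and the log file location are hypothetical. The log check is a plain substring match on the message quoted in step 4, so the exact wording in a given Ignite version may differ. Run the deletion only while the node is stopped.

```python
from pathlib import Path


def index_file(storage_root, consistent_id, cache_name):
    """Build the index.bin path from the layout given in step 2.

    storage_root is a hypothetical name for the directory that contains db/.
    """
    return Path(storage_root) / "db" / consistent_id / f"cache-{cache_name}" / "index.bin"


def delete_index_files(storage_root, consistent_id, cache_names, dry_run=True):
    """Delete index.bin for each affected cache and return the paths that matched.

    With dry_run=True (the default) nothing is deleted; paths are only reported.
    """
    matched = []
    for name in cache_names:
        path = index_file(storage_root, consistent_id, name)
        if path.is_file():
            matched.append(path)
            if not dry_run:
                path.unlink()
    return matched


def index_rebuild_finished(log_path, cache_name):
    """Return True if some log line has both the rebuild message and the cache name."""
    marker = "Finished indexes rebuilding for cache"  # wording taken from step 4
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return any(marker in line and cache_name in line for line in log)
```

The dry-run default is deliberate: it lets an operator confirm which files would be removed before deleting anything.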

# Recovering from a persistent storage corruption
## Applicable if
A part of the persistent storage (partition files, checkpoint markers or WAL) was corrupted
and there is no other way to recover it, but there are healthy copies of all data on other
nodes.

## Steps to recover
1. Stop the node
2. Delete all persistence files of the node (it is best to clear the Ignite working directory,
the storage directory, and the WAL and WAL archive directories)
3. Make sure consistentId is explicitly set in the configuration of the node
- If it is not, look up the generated consistentId using control.sh and set it explicitly
in the config or via IGNITE_CONSISTENT_ID (2.8+ only)
4. Start the node
5. Wait for the <Finished rebalancing cache> messages for all caches
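
Step 2 could be scripted along these lines. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the caller must supply the actual working, storage, WAL and WAL archive directories of the node, since their locations depend on the node configuration. Run it only while the node is stopped.

```python
import shutil
from pathlib import Path


def clear_persistence(directories, dry_run=True):
    """Remove the contents of each directory, keeping the directory itself.

    Returns the entries that were (or, with dry_run=True, would be) removed.
    Directories that do not exist are skipped.
    """
    removed = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue
        for entry in sorted(root.iterdir()):
            removed.append(entry)
            if dry_run:
                continue
            if entry.is_dir() and not entry.is_symlink():
                shutil.rmtree(entry)  # real subdirectory: remove recursively
            else:
                entry.unlink()  # regular file or symlink
    return removed
```

Calling it first with the default `dry_run=True` lists what would be deleted, which is a cheap safeguard against pointing it at the wrong node's directories.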



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
