hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-15227) HBase Backup Phase 3: Fault tolerance (client/server) support
Date Mon, 03 Jul 2017 04:47:00 GMT

     [ https://issues.apache.org/jira/browse/HBASE-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-15227:
--------------------------
    Priority: Major  (was: Blocker)

> HBase Backup Phase 3: Fault tolerance (client/server) support
> -------------------------------------------------------------
>
>                 Key: HBASE-15227
>                 URL: https://issues.apache.org/jira/browse/HBASE-15227
>             Project: HBase
>          Issue Type: Task
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>              Labels: backup
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15227-v3.patch, HBASE-15277-v1.patch
>
>
> The system must be tolerant of faults:
> # Backup operations MUST be atomic (no partial completion state in the backup system table)
> # The process must detect any type of failure that can result in data loss (partial backup or partial restore)
> # Proper restoration of the system table state and cleanup must be done in case of a failure
> # An additional utility to repair the backup system table and clean up the corresponding file system data must be implemented
> h3. Backup
> h4. General FT framework implementation 
> Before the actual backup operation starts, a snapshot of the backup system table is taken and the system table is updated with the *ACTIVE_SNAPSHOT* flag. The flag is removed upon backup completion.
> In case of *any* server-side failure, the client catches errors/exceptions and handles them as follows:
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name was read from the backup system table beforehand)
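
The snapshot-then-rollback flow described above can be sketched roughly as follows. This is a minimal in-memory model, not the HBase backup API: the class `GuardedBackupSketch`, its fields, and the `copyStep` callback are all hypothetical stand-ins, and a real implementation would operate on HBase snapshots and file system paths instead of Java collections.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the fault-tolerant backup flow. */
class GuardedBackupSketch {
    static final String ACTIVE_SNAPSHOT = "ACTIVE_SNAPSHOT";

    // Stand-ins for the backup system table, destination, temp data and table snapshots.
    final Map<String, String> systemTable = new HashMap<>();
    final List<String> destination = new ArrayList<>();
    final List<String> tempData = new ArrayList<>();
    final List<String> tableSnapshots = new ArrayList<>();

    /** Runs one backup session; returns true on success, false after a rollback. */
    boolean runBackup(String backupId, Runnable copyStep) {
        // Take a snapshot of the system table, then raise the flag.
        Map<String, String> systemTableSnapshot = new HashMap<>(systemTable);
        systemTable.put(ACTIVE_SNAPSHOT, "snapshot_" + backupId);
        try {
            tableSnapshots.add(backupId + "/table_snapshot");
            tempData.add(backupId + "/tmp");
            destination.add(backupId + "/data");
            copyStep.run(); // assumed to throw on a server-side failure
            systemTable.put("backup:" + backupId, "COMPLETE");
            systemTable.remove(ACTIVE_SNAPSHOT); // flag is removed upon completion
            tempData.removeIf(p -> p.startsWith(backupId + "/"));
            tableSnapshots.removeIf(p -> p.startsWith(backupId + "/"));
            return true;
        } catch (RuntimeException e) {
            destination.removeIf(p -> p.startsWith(backupId + "/"));    // 1. clean destination
            tempData.removeIf(p -> p.startsWith(backupId + "/"));       // 2. clean temporary data
            tableSnapshots.removeIf(p -> p.startsWith(backupId + "/")); // 3. delete table snapshots
            systemTable.clear();                                        // 4. restore system table
            systemTable.putAll(systemTableSnapshot);
            // 5. the system table snapshot is a local variable here and is simply dropped
            return false;
        }
    }
}
```

Because the snapshot is taken before the flag is written, restoring from it also clears *ACTIVE_SNAPSHOT*, so a failed session leaves the modeled system table exactly as it was before the session started.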
> In case of *any* client-side failure:
> Before any backup or restore operation runs, we check the backup system table for the *ACTIVE_SNAPSHOT* flag. If the flag is present, the operation aborts with a message that the backup repair tool (see below) must be run.
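
A minimal sketch of that pre-flight check, assuming the system table can be modeled as a key/value map. The class and method names are hypothetical, and the exception message is illustrative only, not the actual tool output.

```java
import java.util.Map;

/** Hypothetical pre-flight check run before any backup or restore operation. */
class ActiveSnapshotCheckSketch {
    static final String ACTIVE_SNAPSHOT = "ACTIVE_SNAPSHOT";

    /** Throws if a previous session left the ACTIVE_SNAPSHOT flag behind. */
    static void checkNoInterruptedSession(Map<String, String> systemTable) {
        if (systemTable.containsKey(ACTIVE_SNAPSHOT)) {
            // A leftover flag means the client died mid-session and never rolled back.
            throw new IllegalStateException(
                "Previous backup session did not complete; run 'backup repair' first");
        }
    }
}
```

The check is cheap because it only looks for the presence of the flag; the actual recovery work is deferred to the repair tool.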
> h4. Backup repair tool
> The command-line tool *backup repair* executes the following steps:
> # Reads the info of the last failed backup session
> # Cleans up the backup destination (removes partial backup data)
> # Cleans up any temporary data
> # Deletes any active snapshots of the tables being backed up (during a full backup we snapshot tables)
> # Restores the backup system table from the snapshot
> # Deletes the backup system table snapshot (the snapshot name was read from the backup system table beforehand)
> h4. Detection of a partial loss of data
> h5. Full backup  
> Export snapshot operation (?).
> We count files and check their sizes before and after the DistCp run.
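
One way to picture the count-and-size check is to build a manifest of relative paths and sizes on both sides and compare them. The sketch below uses `java.nio.file` on a local file system purely for illustration; the real backup runs against HDFS through DistCp, and `CopyVerificationSketch` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Hypothetical verification of a copy by comparing file counts and sizes. */
class CopyVerificationSketch {

    /** Maps each regular file under root (as a relative path) to its size in bytes. */
    static Map<String, Long> manifest(Path root) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                try {
                    sizes.put(root.relativize(p).toString(), Files.size(p));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return sizes;
    }

    /** True only if both trees hold the same files with the same sizes. */
    static boolean copyIsComplete(Path source, Path destination) throws IOException {
        return manifest(source).equals(manifest(destination));
    }
}
```

Comparing sizes catches missing and truncated files but not same-length corruption; a checksum comparison would be needed for that and is outside what the description above states.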
> h5. Incremental backup 
> Conversion of WAL to HFiles, when a WAL file is moved from the active to the archive directory. The code is in place to handle this situation.
> During the DistCp run (same as above).
> h3. Restore
> This operation does not modify the backup system table and is idempotent. No special FT is required.
>  
>      



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
