geode-issues mailing list archives

From "Anthony Baker (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-292) Optimize speed of backup/ restore cache snapshots
Date Fri, 28 Aug 2015 22:09:46 GMT

    [ https://issues.apache.org/jira/browse/GEODE-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14720672#comment-14720672 ]

Anthony Baker commented on GEODE-292:
-------------------------------------

Currently the snapshot files are completely self-contained.  Removing the PDX type definitions
from a snapshot file whose data relies on those types could lead to import failures if the
ops process is not followed correctly.

The simplest thing is probably to add a {{--skip-pdx=true}} flag to the import and export
commands.  If the flag is true, the export command would skip writing the PDX type definitions
to the snapshot file, and the import command would skip reading them.  The ops process would
need to ensure that the first import defines all the PDX types that are used throughout the
import cycle.
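
The ordering constraint can be sketched with a toy model.  This is Python rather than Geode
code, and it does not reflect the real snapshot format or API: the dict-based snapshot and the
names {{export_region}}, {{import_region}} and {{skip_pdx}} are hypothetical, chosen only to
illustrate the idea.

```python
# Toy model of the proposed --skip-pdx behaviour. This is NOT Geode code:
# the dict-based "snapshot" and every function name here are hypothetical.

def export_region(entries, type_registry, skip_pdx=False):
    """Build a snapshot; include the type definitions unless skip_pdx is set."""
    snapshot = {"entries": list(entries)}
    if not skip_pdx:
        snapshot["pdx_types"] = dict(type_registry)
    return snapshot


def import_region(snapshot, type_registry, skip_pdx=False):
    """Load a snapshot against a target type registry and return its entries."""
    if not skip_pdx:
        type_registry.update(snapshot.get("pdx_types", {}))
    for type_id, _value in snapshot["entries"]:
        if type_id not in type_registry:
            raise LookupError(f"PDX type {type_id} is not defined")
    return list(snapshot["entries"])


source_types = {1: "Customer", 2: "Order"}
customers = [(1, "alice")]
orders = [(2, "order-17")]

full = export_region(customers, source_types)              # self-contained
slim = export_region(orders, source_types, skip_pdx=True)  # data only

target_types = {}
import_region(full, target_types)  # the first import defines the types
imported = import_region(slim, target_types, skip_pdx=True)  # now succeeds
```

Importing {{slim}} into an empty registry raises {{LookupError}} in this model, which is the
kind of import failure an incorrect ops process would run into.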

Alternatively, we could extend the snapshot format to include multiple regions within the
same file to optimize I/O performance and minimize duplicate reads and writes of the type
definitions.
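
A rough way to see the saving is to count how many type definitions get written in each
layout.  The sketch below is Python, not Geode code; the function names, the dict layout, and
the figures of 1000 types and 20 regions are all made up for illustration.

```python
# Hypothetical comparison; it only counts how many type definitions are written.

def per_region_files(regions, type_registry):
    """Behaviour described in the issue: every region file repeats all types."""
    return [
        {"region": name, "pdx_types": dict(type_registry), "entries": entries}
        for name, entries in regions.items()
    ]


def combined_file(regions, type_registry):
    """Alternative: one file with a single copy of the types, then all regions."""
    return {"pdx_types": dict(type_registry), "regions": dict(regions)}


types = {i: f"Type{i}" for i in range(1000)}      # illustrative figure
regions = {f"region{i}": [] for i in range(20)}   # illustrative figure

separate = per_region_files(regions, types)
combined = combined_file(regions, types)

written_separately = sum(len(f["pdx_types"]) for f in separate)
written_combined = len(combined["pdx_types"])
print(written_separately, written_combined)  # 20000 1000
```

In this model the per-region layout writes the definitions once per region, so the duplicated
work grows with the product of the region count and the type count.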

This is really only an issue when PDX type explosion occurs due to the JSON converter.


> Optimize speed of backup/ restore cache snapshots
> -------------------------------------------------
>
>                 Key: GEODE-292
>                 URL: https://issues.apache.org/jira/browse/GEODE-292
>             Project: Geode
>          Issue Type: Improvement
>          Components: persistence
>    Affects Versions: 1.0.0-incubating
>            Reporter: Wes Williams
>            Assignee: Wes Williams
>              Labels: backups, performance
>             Fix For: 1.0.0-incubating
>
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Backup/ restore using snapshots takes a very, very long time when:
> 1) There are a lot of PDXTypes, and
> 2) There are a lot of regions.
> Specifically, it takes 35 minutes to restore only 50MB of unstructured JSON from snapshots.
> In contrast, it takes only 3 minutes to reload all the data from scratch.
> PROBLEM
> CacheSnapshot loops all regions and saves all cache PDXTypes in every region.gfd. On
> restore, it reloads all cache PDXTypes again for every region where they only need to be loaded
> once.
> SOLUTION
> This JIRA issue will create an option to save PDXTypes only once and reload them once
> and store only data in the region snapshots. Existing functionality will remain for those
> who want it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
