cassandra-commits mailing list archives

From "Vijay (Updated) (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-3690) Streaming CommitLog backup
Date Sat, 07 Apr 2012 00:59:19 GMT


Vijay updated CASSANDRA-3690:

    Attachment: 0001-CASSANDRA-3690.patch

Hi Jonathan, the attached patch incorporates all the recommended changes except this one:

>>> Maybe we should also have a restore_list_segments command as well, so we can
query s3 (again for instance) directly and have restore_command pull from there, rather than
requiring a local directory?
IMHO it might be better to have a streaming API to list and stream the data in. Otherwise
we have to download to the local FS anyway, so it would be better to download incrementally
and use JMX to restore the files independently (for example, from an external agent). That may be
a simple solution for now. If the user has an NFS mount it works even better: all they
need to do is "ln -s" the location and they are done :)

Please note that I also removed the requirement to turn off recycling for backup (as recommended),
but I left recycling configurable, because it is sometimes good to have unique names in the backup
so we don't overwrite :)
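
A minimal sketch of the "incrementally download" idea above, assuming a hypothetical external agent that sees the archive as a directory (for example an NFS mount). The function name and both directory arguments are illustrative and are not part of the patch. Triggering the restore itself (for example over JMX) is left out, since the exact call is not given here.

```python
import os
import shutil


def sync_new_segments(archive_dir: str, restore_dir: str) -> list:
    """Copy files present in archive_dir but missing from restore_dir.

    Returns the names copied in this pass, in sorted order.
    Both directories are assumptions made for this illustration.
    """
    os.makedirs(restore_dir, exist_ok=True)
    already_there = set(os.listdir(restore_dir))
    copied = []
    for name in sorted(os.listdir(archive_dir)):
        src = os.path.join(archive_dir, name)
        if name in already_there or not os.path.isfile(src):
            continue
        # Copy under a temporary name, then rename, so a half-copied
        # file never appears under its final name.
        tmp = os.path.join(restore_dir, name + ".part")
        shutil.copyfile(src, tmp)
        os.replace(tmp, os.path.join(restore_dir, name))
        copied.append(name)
    return copied
```

Calling the function repeatedly only transfers files that arrived since the previous pass. With an NFS mount the copy can be skipped entirely by symlinking the location, as suggested above.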
> Streaming CommitLog backup
> --------------------------
>                 Key: CASSANDRA-3690
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Vijay
>            Assignee: Vijay
>            Priority: Minor
>             Fix For: 1.1.1
>         Attachments: 0001-CASSANDRA-3690-v2.patch, 0001-CASSANDRA-3690.patch, 0001-Make-commitlog-recycle-configurable.patch,
0002-support-commit-log-listener.patch, 0003-helper-jmx-methods.patch, 0004-external-commitlog-with-sockets.patch,
> Problems with the current SST backups:
> 1) The current backup doesn't allow a point-in-time restore (within an SST).
> 2) The current SST implementation needs the backup to read from the filesystem, which adds
> IO on the disks used for normal operation.
> 3) In 1.0 we removed the per-CF flush interval and size settings that determined when a
> flush is triggered. For use cases with few writes, it becomes increasingly difficult
> to time the backup right.
> 4) Use cases that need external (non-Cassandra) BI need the data at regular
> intervals rather than waiting for longer or unpredictable intervals.
> Disadvantages of the new solution
> 1) Overhead in processing the mutations during the recovery phase.
> 2) More complicated solution than just copying the file to the archive.
> Additional advantages:
> Online and offline restore.
> Close to live incremental backup.
> Note: If the listener agent gets restarted, it is the agent's responsibility to stream
> the missed or incomplete files.
> There are 3 options in the initial implementation:
> 1) Backup -> Once a socket is connected, we switch the commit log and send new
> updates via the socket.
> 2) Stream -> Takes the absolute path of a file, reads the file, and sends
> the updates via the socket.
> 3) Restore -> Receives the serialized bytes and applies the mutations.
> Side note (not related to this patch as such): the agent that will take incremental
> backups is planned to be open sourced soon (name: Priam).
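
To make the socket-based options in the quoted description more concrete, here is a rough sketch of how serialized updates could be framed on a stream and applied on the receiving side. The wire format (a 4-byte big-endian length followed by the payload) and all function names are assumptions for illustration only. They are not taken from the patch and do not reflect Cassandra's commit log format.

```python
import io
import struct

# Hypothetical framing: 4-byte big-endian length, then the serialized bytes.
HEADER = struct.Struct(">I")


def send_update(stream, payload: bytes) -> None:
    """Write one length-prefixed update to a binary stream."""
    stream.write(HEADER.pack(len(payload)))
    stream.write(payload)


def read_updates(stream):
    """Yield payloads until the stream ends; fail on a truncated frame."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # clean end of stream
        if len(header) < HEADER.size:
            raise EOFError("truncated frame header")
        (length,) = HEADER.unpack(header)
        payload = stream.read(length)
        if len(payload) < length:
            raise EOFError("truncated frame payload")
        yield payload


def restore(stream, apply_mutation) -> int:
    """Apply every received payload; return how many were applied."""
    count = 0
    for payload in read_updates(stream):
        apply_mutation(payload)
        count += 1
    return count


if __name__ == "__main__":
    # An in-memory buffer stands in for the socket in this demo.
    wire = io.BytesIO()
    for update in (b"mutation-1", b"mutation-2"):
        send_update(wire, update)
    wire.seek(0)
    applied = []
    print(restore(wire, applied.append), applied)
```

Detecting a truncated frame matters for the restart case noted above: an agent that sees an incomplete update knows which file it has to stream again.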

This message is automatically generated by JIRA.