cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-10421) Potential issue with LogTransaction as it only checks in a single directory for files
Date Thu, 08 Oct 2015 00:28:26 GMT


Stefania commented on CASSANDRA-10421:

bq. Is there an advantage to writing only the commit record to all the files? Seems conceptually
easier for them to all be the same log and since it is low traffic there is no performance
motivation. Was there a discussion somewhere else where it seemed like people might delete
the file? Are we really ok with losing atomicity if they don't lose the whole disk?

There was a brief discussion on IRC between [~krummas] and [~benedict] but the conclusion
wasn't clear to me. Here is why so far I chose not to replicate the entire content:

* when we add a new log file lazily, we need to duplicate the content of other existing files
(not a problem but something extra to do and that can go wrong)
* every time we write an sstable record we risk not being able to write this record to a file
when we've already succeeded writing to another file, at the moment this is necessary only
for the last record and on startup we can handle missing file records in some files, so long
as they don't conflict
* a record could become quite long due to a long relative path (it shouldn't matter other
than for readability of the txn file)
* at startup we then have the problem of reconciling sstable records not just the final ones
and handle the case of missing sstable files that according to the txn should be there (if
a disk is removed or files deleted)

I didn't want to tackle all this until I was sure it is necessary, conceptually it seemed
cleaner to have each txn file only handle its own local sstable files but if there are true
safety issues with this (Benedict seemed to think so on IRC) then I am happy to change it.

bq. Can you just have it do nothing if it is called multiple times?. Maybe save a headache
down the road.

{{readRecords}} used to be called independently, now it's only called from {{verify}} so I
guess the assertion could go. Currently we should not attempt to read a file twice so I don't
see why silently doing nothing is better than raising an assertion.

bq. Why is this check not necessary anymore?

I've changed {{record.isValid()}}, called just above. I could not hit that code during unit
tests even with invalid records and it occurred to me that it was because error message and
record type are both set all the time.

bq. Not part of your current work, but relying on modified time for files seems suspect to
me, file contents should have the modified time so copying and other operations don't change

There was a very long discussion about this on CASSANDRA-7066; it starts more or less with
the last paragraph of [this comment|].
We delete files from oldest to newest and the update time is the maximum update time so it
should always match with that's on disk. The reason is that we don't want to delete files
by mistake, as in users coping files from backup without removing a partial txn log that happened
to have obsoleted the very same files. There is a comment in {{deleteRecord()}} but it doesn't
seem complete so we should probably add more comments about this.

> Potential issue with LogTransaction as it only checks in a single directory for files
> -------------------------------------------------------------------------------------
>                 Key: CASSANDRA-10421
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Eriksson
>            Assignee: Stefania
>            Priority: Blocker
>             Fix For: 3.0.0 rc2
> When creating a new LogTransaction we try to create the new logfile in the same directory
as the one we are writing to, but as we use {{[directories.getDirectoryForNewSSTables()|]}}
this might end up in "any" of the configured data directories. If it does, we will not be
able to clean up leftovers as we check for files in the same directory as the logfile was
> cc [~Stefania]

This message was sent by Atlassian JIRA

View raw message