incubator-cassandra-user mailing list archives

From Preston Chang <zhangyf2...@gmail.com>
Subject Re: sync commitlog in batch mode lose data
Date Fri, 03 Jun 2011 09:32:49 GMT
Thank you very much, Peter!

After I disabled the disk cache and changed the cache write mode from
write-back to write-through, I saw the result I was hoping for.

It seems that fsync() only synced the data to the disk cache, not to the
storage devices, while the cache was in write-back mode.

But I have another question: if I disable the disk cache but leave the
cache write mode as write-back, how does sync work? Is the data still written
to the cache? This question may be outside the scope of this list.

Thank you all!

2011/6/3 Peter Schuller <peter.schuller@infidyne.com>

> > I disabled the disk cache of the RAID controller; unfortunately it still
> > lost some data.
>
> Disabling caching shouldn't be necessary so much as ensuring that all
> layers honor write barriers properly. A battery backed cache that
> survives a power outage need not be disabled (and usually if you have
> battery backed caching you don't want to since it has a considerable
> performance impact).
>
> To re-address your original post: Yes, given QUORUM @ RF=2 (meaning
> that QUORUM is equivalent to ALL), any *successful* write is supposed
> to be guaranteed to be visible by a subsequent read. In this case that
> holds even for reads at CL.ONE, since RF was 2 and QUORUM was equivalent
> to ALL.
>
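
The arithmetic behind that point can be sketched in a few lines of Python. This is only an illustration, not Cassandra code: the function names quorum and read_sees_write are made up here, and the overlap rule (replicas written + replicas read > RF) is the usual condition for a read to be guaranteed to touch at least one replica that has the write.

```python
def quorum(rf):
    """Replicas needed for QUORUM at replication factor rf."""
    return rf // 2 + 1


def read_sees_write(write_replicas, read_replicas, rf):
    """True if every read set must overlap every write set."""
    return write_replicas + read_replicas > rf


rf = 2
w = quorum(rf)  # 2 replicas, i.e. the same as ALL when RF=2

print(w == rf)                    # True: QUORUM equals ALL at RF=2
print(read_sees_write(w, 1, rf))  # True: a read at ONE still overlaps
```

With RF=3 the same functions give a QUORUM of 2, and a write at QUORUM followed by a read at ONE no longer has a guaranteed overlap.
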
> If this is not what you're seeing, likely causes are either (a) a
> problem with your test, (b) a cassandra bug, or (c) a kernel/hardware
> misconfiguration or bug that causes fsync() to be broken with respect
> to power outages.
>
> In order to eliminate (a), can you share the actual test? Even if (a)
> looks good, you'd be surprised as to how often (c) can be the case.
>
> If you are satisfied that the test is correct, one way to eliminate
> Cassandra as a cause for the problem may be to restart your server by
> a reset instead of cutting power, so that power supply never
> disappears from your storage device. If you are no longer able to
> reproduce the problem, it would indicate that fsync() is at least
> causing I/O to reach a device (exit the operating system). If it still
> fails, you're none the wiser.
>
> Whether or not you're running with battery backed cache, one test you
> can do is run this (on a system which is otherwise idle):
>
>   http://distfiles.scode.org/mlref/fsynctime.py
>
> The first argument is a filename which will be created/over-written.
> It will then start printing the number of milliseconds each fsync()
> takes. If you do not have battery backed caching, you should be seeing
> numbers in the 5-25 ms range depending on circumstances. If you see
> very low values, that indicates that fsync() is not working and the
> writes are not forced to persistent storage.
>
> (If battery backed caching exists, you will legitimately get very low
> values without it indicating anything is wrong.)
>
>
> --
> / Peter Schuller
>
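
For anyone who cannot fetch the script linked above, here is a minimal sketch of the same kind of test. It is not the contents of fsynctime.py, only an approximation under my own assumptions: the function name time_fsyncs, the 512-byte payload and the default of 20 iterations are all made up for illustration. It writes a small block, calls fsync(), and reports how long each fsync() took in milliseconds.

```python
import os
import sys
import time


def time_fsyncs(path, count=20, payload=b"x" * 512):
    """Write `payload` `count` times, fsync after each write,
    and return the fsync durations in milliseconds."""
    durations_ms = []
    # Create or overwrite the target file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # ask the OS to push the data to the device
            durations_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return durations_ms


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: file to create/overwrite on the disk under test.
    for ms in time_fsyncs(sys.argv[1]):
        print("%.2f ms" % ms)
```

Run it with a filename on the disk under test as the first argument. As described above, without battery backed caching, values far below the 5-25 ms range suggest that fsync() is returning before the data reaches persistent storage.
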



-- 
by Preston Chang
