Thank you very much, Peter!

After I disabled the disk cache and changed the cache write mode from write-back to write-through, I saw the result I was hoping for.

It seems that fsync() only synced the data to the disk cache, not to the storage devices, while the cache was in write-back mode.

But I have another question: if I disable the disk cache but leave the cache write mode as write-back, how does sync work? Does it still write the data into the cache? This question may be outside the scope of the discussion here.

Thank you all!

2011/6/3 Peter Schuller
> I disable the disk cache of RAID controller,  unfortunately it still lost
> some data.

Disabling caching shouldn't be necessary so much as ensuring that all
layers honor write barriers properly. A battery-backed cache that
survives a power outage need not be disabled (and usually, if you have
battery-backed caching, you don't want to, since disabling it has a
considerable performance impact).

To re-address your original post: Yes, given QUORUM @ RF=2 (meaning
that QUORUM is equivalent to ALL), any *successful* write is supposed
to be guaranteed to be visible to a subsequent read. In this case that
holds even for reads at CL.ONE, since RF was 2 and QUORUM was
equivalent to ALL.
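The arithmetic behind that statement can be sketched as follows. This is
only an illustration: the helper names quorum() and read_sees_write() are
made up for this example and are not Cassandra APIs. It assumes the usual
definition of a quorum as a strict majority of replicas, and the usual
overlap rule that a read is guaranteed to see a write when the number of
replicas written plus the number of replicas read exceeds RF.

```python
def quorum(rf: int) -> int:
    """Replicas needed for a quorum: a strict majority of rf."""
    return rf // 2 + 1


def read_sees_write(write_replicas: int, read_replicas: int, rf: int) -> bool:
    """True when any read set must overlap any write set."""
    return write_replicas + read_replicas > rf


rf = 2
w = quorum(rf)                      # 2 replicas, i.e. all of them at RF=2
print(w == rf)                      # QUORUM is equivalent to ALL
print(read_sees_write(w, 1, rf))    # a read at CL.ONE still overlaps
```

At RF=3 the same helpers give a quorum of 2, which is no longer
equivalent to ALL, so a read at CL.ONE would not be guaranteed to overlap.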

If this is not what you're seeing, likely causes are either (a) a
problem with your test, (b) a Cassandra bug, or (c) a kernel/hardware
misconfiguration or bug that causes fsync() to be broken with respect
to power outages.

In order to eliminate (a), can you share the actual test? Even if (a)
looks good, you'd be surprised as to how often (c) can be the case.

If you are satisfied that the test is correct, one way to eliminate
Cassandra as a cause for the problem may be to restart your server by
a reset instead of cutting power, so that power supply never
disappears from your storage device. If you are no longer able to
reproduce the problem, it would indicate that fsync() is at least
causing I/O to reach a device (exit the operating system). If it still
fails, you're none the wiser.

If you're running without battery backed cache, or with battery backed
cache, one test you can do is run this (on a system which is otherwise

The first argument is a filename which will be created/over-written.
It will then start printing the number of milliseconds each fsync()
takes. If you do not have battery backed caching, you should be seeing
numbers in the 5-25 ms range depending on circumstances. If you see
very low values, that indicates that fsync() is not working and the
writes are not forced to persistent storage.

(If battery-backed caching exists, you will legitimately get very low
values without it indicating anything is wrong.)
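The script referred to above is not reproduced in this message. A minimal
sketch of that kind of test might look like the following. It is not the
original script: the function name time_fsyncs(), the 4 KiB write size and
the fixed iteration count are all assumptions made for illustration. It
only uses os.open, os.write and os.fsync from the Python standard library.

```python
import os
import sys
import time


def time_fsyncs(path, count=10, block=b"x" * 4096):
    """Append `block` and fsync `count` times; return each fsync time in ms."""
    durations = []
    # O_CREAT | O_TRUNC: the file is created, or over-written if it exists.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for _ in range(count):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # ask the OS to force the data to the device
            durations.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
    return durations


if __name__ == "__main__" and len(sys.argv) > 1:
    # First argument: the file to create/over-write.
    for ms in time_fsyncs(sys.argv[1]):
        print(f"{ms:.2f}")
```

Run it on the filesystem that holds the data you care about, on an
otherwise quiet machine, and compare the printed values against the
ranges described above.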

/ Peter Schuller

by Preston Chang