couchdb-dev mailing list archives

From "Randall Leeds" <randall.le...@gmail.com>
Subject Re: Faster updates, optional ACID
Date Mon, 05 Jan 2009 21:33:33 GMT
IIRC I saw something about OS X and this flush issue before. It came up when
I was talking to an FS person at Apple.

I believe the explanation was that hardware manufacturers often lie about
the flush call to gain better benchmark scores. The way Apple ensures that
the full sync is actually complete is to stuff the disk's buffer with
nonsense after the write, so that the real data has been pushed out.
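
For reference, here is a minimal sketch of the kind of "full flush" call under discussion. It is written in Python purely for illustration, not Erlang, and it is not what CouchDB or the Erlang VM actually does. `durable_write` is a hypothetical helper name. The sketch assumes that the platform's `fcntl` module exposes `F_FULLFSYNC` (as on OS X) and otherwise falls back to a plain fsync, which, per this thread, may not be enough on hardware that misreports flushes.

```python
import fcntl
import os


def durable_write(path, data):
    """Write bytes to path, then ask the OS to push them to stable storage.

    Returns the name of the flush mechanism that was used.
    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop.
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]

        # Prefer F_FULLFSYNC where the platform defines it (OS X).
        if hasattr(fcntl, "F_FULLFSYNC"):
            try:
                fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
                return "F_FULLFSYNC"
            except OSError:
                # Some filesystems reject the request; fall back below.
                pass

        # Plain fsync: flushes OS buffers, but the drive's own write
        # cache may still hold the data on some hardware.
        os.fsync(fd)
        return "fsync"
    finally:
        os.close(fd)
```

The return value only reports which call was issued. It cannot prove that the drive really committed the data, which is exactly the problem described above.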

Something similar could no doubt be done on Linux, but I don't know that
the OS exposes it in any way yet. I'm not familiar enough with the
appropriate Linux syscalls, but perhaps you could patch the Erlang VM to do
something similar to OS X: query the size of the write buffer on the
hardware and write that much garbage in order to flush it.

IMO this is an ugly software hack to patch a hardware problem. If you
_absolutely need_ full flushing, you should figure out which hard drive
manufacturers don't produce this sort of flawed fsync behavior, or get
something like a battery-backed RAID. Unfortunately, we're stuck in a hard
place as a result of silly, competitive benchmark races by hard disk
manufacturers.

-Randall

On Mon, Jan 5, 2009 at 15:04, Damien Katz <damien@apache.org> wrote:

>
> On Jan 5, 2009, at 2:51 PM, Geir Magnusson Jr. wrote:
>
>
>> On Jan 5, 2009, at 2:32 PM, Damien Katz wrote:
>>
>>
>>> If necessary and possible, we'll patch the Erlang VM.
>>>
>>
>> That seems like a bad idea to me - I'd think you'd want to stay out of the
>> VM business.
>>
>
> No, I mean send patches to the maintainers of Erlang to fix any problems on
> their supported platforms.  Just like the F_FULLFSYNC patch.
>
>
>>
>>> But if a platform doesn't support proper flushing, then it's not a
>>> platform that can support an ACID database.
>>>
>>
>> We're not communicating well here.
>>
>> "proper flushing" depends on what you want to do - if you need your data
>> to be in confirmed permanent storage so that it can survive a crash or power
>> cut, then w/o special configuration (e.g. battery-backed RAID),
>> I don't think that you're going to get assurance on linux.
>>
>> Do you see what I'm saying?
>>
>>
> Yes I see what you are saying. Can you show that Linux doesn't actually
> safely push the bits to disk in popular distros? If that's the case, then we
> need to find the APIs that actually work and call them, and if they don't
> work, we don't support Linux.
>
>
>>>> why not make it a config option, so that the db admin can choose the
>>>> durability level in general, and let clients that know they are talking to
>>>> couch override w/ a header?
>>>>
>>>>
>>> Definitely, I think commit options should be settable per-database. But
>>> for now I just wanted to address the slowdown, especially for
>>> replication and the tests, to keep everyone productive. More commit features
>>> and options are lower-priority work for now; I was just addressing the most
>>> serious slowdown.
>>>
>>
>> That makes sense, but IMO you papered over the root problem.
>> It's good to keep people working, but I think the issue deserves a look.
>>  I don't know erlang, or I would look myself.
>>
>
> What issue? Why do you think this is Erlang specific?
>
>
>
>> geir
>>
>
>
