couchdb-dev mailing list archives

From chongqing xiao <cqx...@gmail.com>
Subject Re: question about how write_header works
Date Thu, 23 Sep 2010 16:00:12 GMT
Hi, Paul:

Thanks for the clarification.

I am not sure why it is designed this way, but here is one approach I
think might work better:

Instead of appending the header to the data file, why not just move
the header to a separate file? The header file can be implemented as
before, with 2 duplicate header blocks to protect it
from corruption. For performance reasons, the header file can be cached
(say, using a memory-mapped file).

The reason I like this approach better is that for the application I
am interested in (archiving data from a relational database), the saved
data never changes. So if no space is wasted on old headers,
there is no need to compact the database file.

Chong

On Thu, Sep 23, 2010 at 8:44 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> It's not necessarily appended each time data is written. There are
> optimizations to batch as many writes to the database together as
> possible as well as delayed commits which will write the header out
> every N seconds.
>
> Remember that *any* write to the database is going to look like wasted
> space. Even document deletes make the database file grow larger.
>
> When a header is written, it contains checksums of its contents and
> when reading we check that nothing has changed. There's an fsync
> before and after writing the header, which also helps to ensure that
> writes succeed.
>
> As to the header2 or header1 problem, if header2 appears to be
> corrupted or is otherwise discarded, the header search just continues
> through the file looking for the next valid header. In this case that
> would mean that newData2 would not be considered valid data and
> would be ignored.
>
> HTH,
> Paul Davis
>
> On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao <cqxiao@gmail.com> wrote:
>> Hi, Adam:
>>
>> Thanks for the answer.
>>
>> If that is how it works, that seems to create a lot of wasted space,
>> assuming a new header has to be appended each time new data is saved.
>>
>> Also, assume this is the data layout:
>>
>> newData1   ->start
>> header1
>> newData2
>> header2      -> end
>>
>> If header2 is partially written, I am assuming newData2 will also be
>> discarded. If that is the case, is there a special flag
>> in header1 so the code can skip newData2 and find header1?
>>
>> I am very interested in couchdb and I think it might be a very good
>> choice for archiving relational data with some minor changes.
>>
>> Thanks
>> Chong
>>
>> On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>>> Hi Chong, that's exactly right.  Regards,
>>>
>>> Adam
>>>
>>> On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
>>>
>>>> Hi,
>>>>
>>>> Could anyone explain how write_header (or the header) works in couchdb?
>>>>
>>>> When appending a new header, I am assuming the new header will be
>>>> appended to the end of the DB file and the old header will be kept
>>>> around?
>>>>
>>>> If that is the case, what will happen if the header is partially
>>>> written? I am assuming the code will loop back and find the previous
>>>> old header and recover from there?
>>>>
>>>> Thanks
>>>>
>>>> Chong
>>>
>>>
>>
>
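The append-only scheme described in this thread (a checksummed header appended after the data, an fsync before and after it, and a search back through the file for the newest valid header) can be illustrated with a minimal Python sketch. It assumes a made-up on-disk format: each header starts at a 4096-byte boundary with a hypothetical `HDR1` marker, a payload length and a SHA-256 digest, and recovery walks block boundaries backward from the end of the file. This is not CouchDB's actual `couch_file` format, and it also assumes that data bytes never happen to form a valid-looking header at a block boundary; `append_data`, `commit_header` and `find_last_header` are illustrative names.

```python
import hashlib
import os
import struct

BLOCK = 4096                      # hypothetical header alignment
MAGIC = b"HDR1"                   # hypothetical header marker
HEAD = struct.Struct(">4sI32s")   # marker, payload length, SHA-256 of payload


def append_data(f, payload: bytes) -> int:
    """Append data at the end of the file; return its offset."""
    pos = f.seek(0, os.SEEK_END)
    f.write(payload)
    return pos


def commit_header(f, payload: bytes) -> int:
    """Append a checksummed header at the next block boundary."""
    f.flush()
    os.fsync(f.fileno())              # data first: a header must never point at unsynced data
    end = f.seek(0, os.SEEK_END)
    pos = -(-end // BLOCK) * BLOCK    # round up to a block boundary
    f.write(b"\0" * (pos - end))      # zero-pad up to the boundary
    f.write(HEAD.pack(MAGIC, len(payload), hashlib.sha256(payload).digest()))
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())              # header is durable before the commit is reported
    return pos


def find_last_header(f):
    """Walk block boundaries backward; return (offset, payload) of the newest intact header."""
    pos = (f.seek(0, os.SEEK_END) // BLOCK) * BLOCK
    while pos >= 0:
        f.seek(pos)
        raw = f.read(HEAD.size)
        if len(raw) == HEAD.size:
            magic, length, digest = HEAD.unpack(raw)
            if magic == MAGIC:
                payload = f.read(length)
                if len(payload) == length and hashlib.sha256(payload).digest() == digest:
                    return pos, payload
        pos -= BLOCK
    return None
```

With the layout from Chong's question, a torn `header2` leaves `newData2` on disk, but no intact header refers to it, so recovery from `header1` never sees it; in this sketch no special flag in `header1` is needed. Calling `append_data` several times before a single `commit_header` is a crude form of the batching Paul mentions: one header, and one pair of fsyncs, per commit rather than per write.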
