couchdb-dev mailing list archives

From Robert Newson <>
Subject Re: question about how write_header works
Date Thu, 23 Sep 2010 16:25:18 GMT
The idea also doesn't account for the waste in obsolete b+tree nodes.
Basically, it's more complicated than that.

Compaction is unavoidable with an append-only strategy. One idea I've
pitched (and frankly stolen from Berkeley JE) is for the database file
to be a series of files instead of a single one. If we track the used
space in each file, we can compact any file that drops below a
threshold (by copying the extant data to the new tail and deleting the
old file). This is still compaction but it's no longer a wholesale
rewrite of the database.

All that said, with enough databases and some scheduling, the current
scheme is still pretty good.
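The multi-file idea above can be sketched roughly as follows. This is a toy in-memory model, not CouchDB code: the class names, the 50% threshold, and the bookkeeping are all illustrative. Each "segment" tracks how many of its bytes are still live, and any non-tail segment whose live ratio falls below the threshold has its extant records copied to the tail and is then dropped.

```python
# Toy model of a segmented append-only store with per-segment garbage
# accounting. All names and the threshold are illustrative assumptions.

COMPACT_THRESHOLD = 0.5  # compact a segment when <50% of its bytes are live

class Segment:
    def __init__(self):
        self.records = {}   # record id -> payload (stands in for on-disk data)
        self.live = 0       # bytes still referenced
        self.total = 0      # bytes ever appended to this segment

    def append(self, rid, payload):
        self.records[rid] = payload
        self.live += len(payload)
        self.total += len(payload)

    def obsolete(self, rid):
        # An overwritten record's old copy becomes garbage in its segment.
        payload = self.records.pop(rid)
        self.live -= len(payload)

class SegmentedDb:
    def __init__(self):
        self.segments = [Segment()]
        self.index = {}  # record id -> segment holding its live copy

    def write(self, rid, payload):
        if rid in self.index:
            self.index[rid].obsolete(rid)
        tail = self.segments[-1]
        tail.append(rid, payload)
        self.index[rid] = tail

    def roll(self):
        """Start a new tail segment (a real system would roll by size)."""
        self.segments.append(Segment())

    def compact(self):
        """Copy live data out of sparse non-tail segments, then drop them."""
        tail = self.segments[-1]
        keep = []
        for seg in self.segments[:-1]:
            if seg.total and seg.live / seg.total < COMPACT_THRESHOLD:
                for rid, payload in seg.records.items():
                    tail.append(rid, payload)
                    self.index[rid] = tail
            else:
                keep.append(seg)
        self.segments = keep + [tail]
```

The point of the design is visible in `compact`: only segments that have accumulated enough garbage are touched, so compaction cost is proportional to the live data in sparse segments rather than to the whole database.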


On Thu, Sep 23, 2010 at 5:11 PM, Paul Davis <> wrote:
> On Thu, Sep 23, 2010 at 12:00 PM, chongqing xiao <> wrote:
>> Hi, Paul:
>> Thanks for the clarification.
>> I am not sure why it is designed this way, but here is one approach I
>> think might work better: instead of appending the header to the data
>> file, why not move the header to a separate file? The header file can
>> be implemented as before - two duplicate header blocks to keep it
>> corruption free. For performance reasons, the header file can be
>> cached (say, using a memory-mapped file).
>> The reason I like this approach better is that for the application I
>> am interested in - archiving data from a relational database - the
>> saved data never changes. So if there is no wasted space from the old
>> headers, there is no need to compact the database file.
>> Chong
> Writing the header to the data file means that the header is where the
> data is. I.e., if the header is there and intact, we can be reasonably
> sure that the data the header refers to is also there (barring weirdo
> filesystems like XFS). Using a second file descriptor per database is
> a 100% increase in the number of file descriptors, which would very
> much affect people who have lots of active databases on a single node.
> I'm sure there are other reasons, but I've not had anything to eat
> yet.
> Paul
>> On Thu, Sep 23, 2010 at 8:44 AM, Paul Davis <> wrote:
>>> It's not necessarily appended each time data is written. There are
>>> optimizations to batch as many writes to the database together as
>>> possible, as well as delayed commits, which write the header out
>>> every N seconds.
>>> Remember that *any* write to the database is going to look like wasted
>>> space. Even document deletes make the database file grow larger.
>>> When a header is written, it contains checksums of its contents, and
>>> when reading we check that nothing has changed. There's an fsync
>>> before and after writing the header, which also helps ensure that
>>> writes succeed.
>>> As to the header2-or-header1 problem: if header2 appears to be
>>> corrupted or is otherwise discarded, the header search just continues
>>> through the file looking for the next valid header. In this case that
>>> would mean that newData2 would not be considered valid data and would
>>> be ignored.
>>> HTH,
>>> Paul Davis
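The header discipline Paul describes can be sketched as follows. This is an illustrative Python model, not CouchDB's actual Erlang code, and the on-disk layout (the `HDR1` marker, the length/CRC fields) is invented for the example: the header carries a checksum of its contents, and the file is fsync'd both before and after the header is appended, so a header only becomes durable after the data it points to.

```python
# Illustrative sketch of checksummed headers with fsync barriers.
# MAGIC and the length/CRC layout are assumptions, not CouchDB's format.
import os
import struct
import zlib

MAGIC = b"HDR1"  # hypothetical marker used to locate headers in the file

def write_header(f, payload: bytes):
    f.flush()
    os.fsync(f.fileno())          # 1. make the data before the header durable
    checksum = zlib.crc32(payload)
    f.write(MAGIC + struct.pack(">II", len(payload), checksum) + payload)
    f.flush()
    os.fsync(f.fileno())          # 2. make the header itself durable

def header_ok(blob: bytes) -> bool:
    """Validate a candidate header read back from disk."""
    if len(blob) < 12 or blob[:4] != MAGIC:
        return False
    length, checksum = struct.unpack(">II", blob[4:12])
    payload = blob[12:12 + length]
    return len(payload) == length and zlib.crc32(payload) == checksum
```

The two fsyncs order the writes: a reader that finds a header whose checksum validates can be reasonably sure the data written before it reached disk first.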
>>> On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao <> wrote:
>>>> Hi, Adam:
>>>> Thanks for the answer.
>>>> If that is how it works, that seems to create a lot of wasted space,
>>>> assuming a new header has to be appended each time new data is saved.
>>>> Also, assuming the data layout is
>>>> newData1   ->start
>>>> header1
>>>> newData2
>>>> header2      -> end
>>>> If header2 is partially written, I am assuming newData2 will also be
>>>> discarded. If that is the case, I am assuming there is a special flag
>>>> in header1 so the code can skip newData2 and fall back to header1?
>>>> I am very interested in couchdb and I think it might be a very good
>>>> choice for archiving relational data with some minor changes.
>>>> Thanks
>>>> Chong
>>>> On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski <> wrote:
>>>>> Hi Chong, that's exactly right.  Regards,
>>>>> Adam
>>>>> On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
>>>>>> Hi,
>>>>>> Could anyone explain how write_header (or the header) works in CouchDB?
>>>>>> When appending a new header, I am assuming the new header will be
>>>>>> appended to the end of the DB file and the old header will be kept
>>>>>> around?
>>>>>> If that is the case, what will happen if the header is partially
>>>>>> written? I am assuming the code will loop back and find the previous
>>>>>> old header and recover from there?
>>>>>> Thanks
>>>>>> Chong
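The recovery behaviour discussed in the thread - a torn final header is rejected by its checksum and the scan falls back to the previous valid one, so newData2 is simply ignored - can be sketched self-containedly. The on-disk format here (the `HDR1` marker, length/CRC fields) is invented for illustration and is not CouchDB's actual layout.

```python
# Sketch of torn-header recovery: scan for header candidates and keep the
# last one whose checksum validates. Format is an illustrative assumption.
import struct
import zlib

MAGIC = b"HDR1"

def pack_header(payload: bytes) -> bytes:
    return MAGIC + struct.pack(">II", len(payload), zlib.crc32(payload)) + payload

def last_valid_header(data: bytes):
    """Return the payload of the last intact header in the file, or None."""
    best = None
    pos = data.find(MAGIC)
    while pos != -1:
        blob = data[pos + 4:]
        if len(blob) >= 8:
            length, checksum = struct.unpack(">II", blob[:8])
            payload = blob[8:8 + length]
            if len(payload) == length and zlib.crc32(payload) == checksum:
                best = payload
        pos = data.find(MAGIC, pos + 1)
    return best
```

With the newData1 / header1 / newData2 / header2 layout from the thread, a partially written header2 fails its checksum, so `last_valid_header` returns header1's payload - and newData2, reachable only through header2, is never treated as valid data. No special flag in header1 is needed; the checksum on each header does the work.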
