Mailing-List: contact dev-help@couchdb.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@couchdb.apache.org
MIME-Version: 1.0
In-Reply-To: <AANLkTik4kRYbSPpDfkbqDt9crqqMPnx1iYb7yjSE7cxN@mail.gmail.com>
References: <AANLkTi=nD5ZuWaSD=i5Rk_rG_HKr1JSYXkofrRfd8RiH@mail.gmail.com>
 <EE73E1CD-D72A-4926-9D1B-6228BAF41627@apache.org>
 <AANLkTik4kRYbSPpDfkbqDt9crqqMPnx1iYb7yjSE7cxN@mail.gmail.com>
From: Paul Davis <paul.joseph.davis@gmail.com>
Date: Thu, 23 Sep 2010 09:44:28 -0400
Message-ID: <AANLkTinUmuSuVvZTg6S9PywotuO-idDHa9GcOONSmdV8@mail.gmail.com>
Subject: Re: question about how write_header works
To: dev@couchdb.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

It's not necessarily appended each time data is written. There are
optimizations that batch as many writes to the database together as
possible, as well as delayed commits, which write the header out
every N seconds.
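
To make the delayed-commit idea concrete, here is a toy sketch in Python
(not CouchDB's Erlang). The class name, the in-memory list standing in for
the file, and the check-on-next-write policy are all assumptions made for
illustration; a real implementation would more likely use a timer.

```python
import time


class DelayedCommitWriter:
    """Toy append-only log that writes a header at most once per interval."""

    def __init__(self, interval_seconds=1.0, clock=time.monotonic):
        self.interval = interval_seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self.log = []               # stands in for the database file
        self.dirty = False          # True when data was written after the last header
        self.last_commit = clock()

    def write(self, doc):
        # Data is always appended immediately; the header is not.
        self.log.append(("data", doc))
        self.dirty = True
        self.maybe_commit()

    def maybe_commit(self):
        # Only write a header if there is uncommitted data and N seconds passed.
        if self.dirty and self.clock() - self.last_commit >= self.interval:
            self.commit()

    def commit(self):
        # The header records how many entries precede it (hypothetical content).
        self.log.append(("header", len(self.log)))
        self.dirty = False
        self.last_commit = self.clock()
```

With an interval of one second, many writes inside the same second share a
single header instead of each getting their own.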

Remember that *any* write to the database is going to look like wasted
space. Even document deletes make the database file grow larger.

When a header is written, it contains checksums of its contents, and
when reading we check that nothing has changed. There's an fsync
before and after writing the header, which also helps to ensure that
writes succeed.
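
A minimal sketch of that write path, in Python rather than CouchDB's Erlang.
The record layout, the MAGIC marker, the function names, and the choice of
MD5 are assumptions for illustration, not CouchDB's actual on-disk format.

```python
import hashlib
import os
import struct

MAGIC = b"HDR1"  # hypothetical marker, not CouchDB's real format


def encode_header(payload: bytes) -> bytes:
    """Build a record: marker, payload length, MD5 of payload, payload."""
    digest = hashlib.md5(payload).digest()  # 16 bytes
    return MAGIC + struct.pack(">I", len(payload)) + digest + payload


def read_header(record: bytes):
    """Return the payload if the record is intact, otherwise None."""
    if len(record) < 24 or record[:4] != MAGIC:
        return None
    (length,) = struct.unpack(">I", record[4:8])
    digest = record[8:24]
    payload = record[24:24 + length]
    if len(payload) != length or hashlib.md5(payload).digest() != digest:
        return None  # truncated or corrupted
    return payload


def write_header(f, payload: bytes) -> None:
    """Append a checksummed header, with an fsync before and after."""
    f.flush()
    os.fsync(f.fileno())   # data the header refers to reaches disk first
    f.seek(0, os.SEEK_END)
    f.write(encode_header(payload))
    f.flush()
    os.fsync(f.fileno())   # then the header itself reaches disk
```

The first fsync means a header never describes data that is not yet on
disk; the checksum lets a reader reject a header that was only partly
written.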

As to the header2 or header1 problem, if header2 appears to be
corrupted or is otherwise discarded, the header search just continues
backward through the file looking for the next valid header. In this
case that would mean that newData2 would not be considered valid data
and would be ignored.
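
Roughly, the search could look like the Python sketch below. It scans an
in-memory byte string byte by byte for a made-up MAGIC marker; the record
layout, names, and MD5 checksum are assumptions for illustration and do not
reflect how CouchDB actually lays out or locates headers.

```python
import hashlib
import struct

MAGIC = b"HDR1"  # hypothetical marker, not CouchDB's real format


def encode_header(payload: bytes) -> bytes:
    """Build a record: marker, payload length, MD5 of payload, payload."""
    digest = hashlib.md5(payload).digest()
    return MAGIC + struct.pack(">I", len(payload)) + digest + payload


def find_last_valid_header(data: bytes):
    """Scan backward from the end of the file contents.

    Returns (offset, payload) for the newest intact header, or None.
    """
    end = len(data)
    while True:
        pos = data.rfind(MAGIC, 0, end)
        if pos < 0:
            return None  # no valid header anywhere
        end = pos  # the next search only looks before this candidate
        length_bytes = data[pos + 4:pos + 8]
        if len(length_bytes) < 4:
            continue  # torn before the length field was complete
        (length,) = struct.unpack(">I", length_bytes)
        digest = data[pos + 8:pos + 24]
        payload = data[pos + 24:pos + 24 + length]
        if len(payload) == length and hashlib.md5(payload).digest() == digest:
            return pos, payload
        # Checksum mismatch or truncation: keep looking further back.


# Layout from the question: newData1, header1, newData2, header2.
file_bytes = (b"newData1" + encode_header(b"h1")
              + b"newData2" + encode_header(b"h2"))
torn = file_bytes[:-1]  # header2 only partially written
```

Here `find_last_valid_header(torn)` lands on header1, so everything after
it (newData2 and the partial header2) is simply never referenced.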

HTH,
Paul Davis

On Wed, Sep 22, 2010 at 11:51 PM, chongqing xiao <cqxiao@gmail.com> wrote:
> Hi, Adam:
>
> Thanks for the answer.
>
> If that is how it works, that seems to create a lot of wasted space,
> assuming a new header has to be appended each time new data is saved.
>
> Also, assuming here is the data layout
>
> newData1   -> start
> header1
> newData2
> header2    -> end
>
> If header 2 is partially written, I am assuming newData will also be
> discarded. If that is the case, I am assuming there is a special flag
> in header 1 so the code can skip newData2 and find header1?
>
> I am very interested in couchdb and I think it might be a very good
> choice for archiving relational data with some minor changes.
>
> Thanks
> Chong
>
> On Wed, Sep 22, 2010 at 10:36 PM, Adam Kocoloski <kocolosk@apache.org> wrote:
>> Hi Chong, that's exactly right. Regards,
>>
>> Adam
>>
>> On Sep 22, 2010, at 10:18 PM, chongqing xiao wrote:
>>
>>> Hi,
>>>
>>> Could anyone explain how write_header (or header) works in CouchDB?
>>>
>>> When appending new header, I am assuming the new header will be
>>> appended to the end of the DB file and the old header will be kept
>>> around?
>>>
>>> If that is the case, what will happen if the header is partially
>>> written? I am assuming the code will loop back and find the previous
>>> old header and recover from there?
>>>
>>> Thanks
>>>
>>> Chong
>>
>>
>