couchdb-user mailing list archives

From Filipe David Manana <fdman...@apache.org>
Subject Re: Some CouchDB internals questions?
Date Thu, 17 Mar 2011 08:56:42 GMT
On Wed, Mar 16, 2011 at 4:53 PM, Zdravko Gligic <zgligic@gmail.com> wrote:
> WOW !
>
> So, how long might it take for this not only to become part of CouchDB
> core but then also to get implemented by all of the other CouchDB
> dialects such as CouchBase and BigCouch, etc.?

Hopefully it shouldn't take long to land in Apache CouchDB's trunk, since
so far none of the existing features/components suffer a performance
drop, and it doesn't change any API either. The change itself is
relatively simple as well.

For BigCouch, I can't tell; you should ask the Cloudant people (Adam, Robert, etc.).

>
> And as dumb as it might sound ;) why was this not done (: the right
> way :) from the very beginning ;?)

I have no idea. I've only been in the community for a little more than a year.
I would assume that initially developers were more worried about
correctness and elegant APIs, which is perfectly reasonable and sane -
performance should come after.

regards

>
> On Wed, Mar 16, 2011 at 10:02 AM, Filipe David Manana
> <fdmanana@apache.org> wrote:
>> Zdravko,
>>
>> Yesterday a performance related ticket was created:
>>
>> https://issues.apache.org/jira/browse/COUCHDB-1092
>>
>> Apart from the performance improvements, it also significantly
>> reduces database sizes (by a factor of about 2 to about 10).
>> So you might be interested in following/reading it.
>>
>> On Tue, Mar 15, 2011 at 7:32 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>> On Tue, Mar 15, 2011 at 2:53 PM, Zdravko Gligic <zgligic@gmail.com> wrote:
>>>>> Have you compacted your db and views?
>>>>
>>>> Yes
>>>>
>>>>> There's unfortunately no direct way to calculate an upper threshold; it
>>>>> really depends on your method for inserting as well as how often you
>>>>> compact.
>>>>
>>>> Once both (docs and view) are compacted, is the resulting size at all
>>>> dependent on how the docs and/or views were created in the first place
>>>> (one at a time or in bulk or whatever) ?
>>>>
>>>
>>> I think to get the absolute minimum post-compaction size you need to
>>> compact twice. I haven't done extensive testing on this, but as I
>>> recall the basic logic was that the first pass can end up writing
>>> docs in a somewhat random order, depending on how they were
>>> inserted.
>>>
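
For anyone who wants to try the compaction steps discussed above, here is a
minimal sketch in Python. It assumes a CouchDB node at
http://127.0.0.1:5984, a database named "mydb" and a design document named
"myviews"; all three are placeholders, not values from this thread. It only
builds the POST requests for database compaction (/{db}/_compact) and view
compaction (/{db}/_compact/{ddoc}); sending them is left to the small
send_all() helper, which is never called here.

```python
import urllib.request


def compaction_requests(base_url, db, design_docs=()):
    """Build POST requests that ask CouchDB to compact a db and its views.

    base_url, db and design_docs are caller-supplied placeholders.
    """
    base = base_url.rstrip("/")
    urls = [f"{base}/{db}/_compact"]
    urls += [f"{base}/{db}/_compact/{ddoc}" for ddoc in design_docs]
    return [
        urllib.request.Request(
            url,
            data=b"",
            method="POST",
            # CouchDB expects a JSON content type on compaction requests.
            headers={"Content-Type": "application/json"},
        )
        for url in urls
    ]


def send_all(requests):
    """Send each request and print the HTTP status (needs a running server)."""
    for req in requests:
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, resp.status)


# Hypothetical server, database and design document names.
reqs = compaction_requests("http://127.0.0.1:5984", "mydb", ["myviews"])
```

Compaction runs in the background, so to compact twice you would likely
want to wait until the "compact_running" field of GET /mydb is false before
sending the second request. Depending on the server setup, admin
credentials may also be needed.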
>>>>> This is due to the tail append storage which will orphan data
>>>>> in the file as it writes new records to the various internal data
>>>>> structures.
>>>>
>>>> My 1,500 docs are taking up almost 15 meg (roughly 1/4-1k docs with 2
>>>> views + 1 view with doc re-emit) and I believe were around 50meg
>>>> before compactions.
>>>>
>>>
>>> More importantly, what was the datasize post-compaction though? If
>>> your main db is 15Meg, and you have a view that re-emits the doc, I'd
>>> expect you to have a total size of at least 30Meg. Depending on what
>>> you're emitting in the other two views getting closer to that 50 isn't
>>> hugely out of the question.
>>>
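
To illustrate why a file keeps growing until it is compacted, here is a toy
model in Python of the tail-append behaviour described above. It is only a
sketch: the class and its methods are invented for illustration, it tracks
document bodies only, and it ignores the internal tree structures that the
real file format also rewrites on each update.

```python
class AppendOnlyStore:
    """Toy append-only file: an update never overwrites older bytes."""

    def __init__(self):
        self.log = []     # every record ever written, in write order
        self.latest = {}  # key -> index in self.log of the live record

    def put(self, key, value):
        # Append the new version; the previous one becomes orphaned.
        self.log.append((key, value))
        self.latest[key] = len(self.log) - 1

    def file_size(self):
        # Bytes on "disk", including orphaned older versions.
        return sum(len(value) for _, value in self.log)

    def live_size(self):
        # Bytes still reachable, i.e. only the latest version per key.
        return sum(len(self.log[i][1]) for i in self.latest.values())

    def compact(self):
        # Copy only live records into a fresh store, in key order.
        fresh = AppendOnlyStore()
        for key in sorted(self.latest):
            fresh.put(key, self.log[self.latest[key]][1])
        return fresh


store = AppendOnlyStore()
for _ in range(3):              # three revisions of each document
    for doc_id in ("a", "b"):
        store.put(doc_id, b"x" * 100)

compacted = store.compact()
print(store.file_size(), store.live_size(), compacted.file_size())
```

In this model the uncompacted file is three times larger than the live
data, and compaction brings it back down to the live size.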
>>
>>
>>
>> --
>> Filipe David Manana,
>> fdmanana@gmail.com, fdmanana@apache.org
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>>
>



-- 
Filipe David Manana,
fdmanana@gmail.com, fdmanana@apache.org

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
