couchdb-user mailing list archives

From Anand Chitipothu <anandol...@gmail.com>
Subject Re: CouchDB becoming unusable as Database/Views increase in size.
Date Tue, 21 Dec 2010 15:28:49 GMT
2010/12/21 Bob Clary <bob@bclary.com>:
> Hi all,
>
> I've been using CouchDB to track the results of testing Firefox and have
> found that as the database and view sizes have increased CouchDB is becoming
> less and less viable as a solution going forward. I don't wish to switch to
> a different database at this time but may not have a choice.
>
> Let me say that I have looked at Jira and found others with similar issues
> although issues have mostly been resolved as invalid or already fixed. I do
> admit that I have a hard time navigating Jira, so it is entirely possible
> I've missed already filed issues. I am not sending this email in a
> threatening fashion that I've seen many times in bugzilla where a user says
> "Fix this or I'm leaving!", but in a plea for some help in finding, filing
> or fixing the appropriate Jira issues which need attention.
>
> My database currently has a compacted size of about 37G and contains a bit
> over 9 million documents. You can see examples of the view documents in the
> error log I attached to <https://issues.apache.org/jira/browse/COUCHDB-970>.
>
> I am currently using CouchDB 1.0.1 on a CentOS 5 64-bit VM with 2 CPUs and
> 4G RAM, running Erlang R14B and configured to use the 64-bit js-devel libraries. I
> temporarily tried to use CouchDB 1.0.x to pick up the fix for
> <https://issues.apache.org/jira/browse/COUCHDB-926> which was causing me
> problems but had to revert to 1.0.1 due to crashes upon view compaction
> completion.
>
> Currently, my main issues are:
>
> Slow View generation: Recreating views from scratch is very slow. It can
> take me over 24 hours to recreate some of the larger views. Combined with
> the need to immediately compact them (see Large Initial View sizes)
> recreating views can take my application offline for users for more than a
> day. Trying to switch to 1.0.x and back and having to regenerate views after
> out of space conditions has led to my application being unavailable for most
> of a week.
>
> Large Initial View sizes: Several of my views are initially created with
> sizes which are 10-20 times the size of the compacted view. For example, I
> have one view which when initially created can take 95G but when compacted
> uses less than 5G. This has caused several out of disk space conditions when
> I've had to regenerate views for the database. I know commodity disks are
> relatively cheap these days, but due to my current hosting environment I am
> using relatively expensive networked storage. Asking for sufficient storage
> for my expected database size was difficult enough, but asking for 10 or
> more times that amount just to deal with temporary explosive view sizes is
> probably a non-starter.
>
> CouchDB 1.0.x: My attempt to use the 1.0.x branch failed because CouchDB
> crashed immediately upon view compaction completion, which caused the
> views to begin indexing from scratch.
>
> I would appreciate it if you would let me know if some of these are known
> issues which have already been filed in Jira or if it would be helpful to
> file new issues and what additional information I can provide to help get
> these issues resolved.
>
> I can also help in making newer releases of SpiderMonkey 1.7 available and
> to help get SpiderMonkey 1.8 and later released if that will help the
> JavaScript performance issues CouchDB may be facing.
>
> bc

I faced the same situation. I noticed that having a reduce function
makes view generation very slow. Try to define the view without a
reduce function if possible. You can also try moving the reduce
logic into a list function.
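
A list function itself would be JavaScript inside a design document. As a
rough illustration of the same idea (keep the view map-only and aggregate
afterwards), here is a Python sketch that sums the rows of a map-only view
in the client instead. This is a different technique from a list function,
not a replacement for it. The function name is hypothetical, and it assumes
rows have the usual {"key": ..., "value": ...} shape with numeric values.

```python
import json
from collections import defaultdict


def sum_rows_by_key(rows):
    """Sum the "value" of map-only view rows, grouped by "key".

    Hypothetical helper: `rows` is assumed to be the "rows" array of a
    CouchDB view response. Keys may be JSON arrays, which are not hashable
    in Python, so each key is serialized to a JSON string first.
    """
    totals = defaultdict(int)
    for row in rows:
        totals[json.dumps(row["key"], sort_keys=True)] += row["value"]
    return dict(totals)
```

The trade-off is that every row has to travel to the client, so this only
makes sense when the queried key range is reasonably small.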

To improve the speed of view generation, try building the database
from scratch: load 1M or 2M docs into the db, build the view, and
compact the view, then repeat with the next batch. These intermediate
compactions improve view generation speed because they reduce the
number of disk reads.
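
The staged load described above could be sketched roughly as follows. This
is only an outline under assumptions: the function names, the `send`
callable, and the polling interval are made up for illustration, and the
design-document `_info` response is assumed to expose
view_index.compact_running. In practice each batch would probably also be
split into smaller `_bulk_docs` requests.

```python
import json
import time
import urllib.request


def make_http_send(base_url):
    """Return a send(method, path, body) function for a CouchDB base URL."""
    def send(method, path, body=None):
        data = None if body is None else json.dumps(body).encode("utf-8")
        req = urllib.request.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return send


def batches(docs, size):
    """Yield lists of at most `size` docs from any iterable."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_in_stages(send, db, ddoc, view, docs, batch_size, poll_seconds=5.0):
    """Load docs batch by batch, building and compacting the view each time."""
    for batch in batches(docs, batch_size):
        # 1. Load the next batch of documents.
        send("POST", f"/{db}/_bulk_docs", {"docs": batch})
        # 2. Query the view; without stale=ok this waits for the index update.
        send("GET", f"/{db}/_design/{ddoc}/_view/{view}?limit=0", None)
        # 3. Start view compaction for the design document.
        send("POST", f"/{db}/_compact/{ddoc}", None)
        # 4. Compaction runs in the background, so poll until it finishes
        #    (assumes _info reports view_index.compact_running).
        while send("GET", f"/{db}/_design/{ddoc}/_info",
                   None)["view_index"]["compact_running"]:
            time.sleep(poll_seconds)
```

With a real server this might be called as
load_in_stages(make_http_send("http://localhost:5984"), "mydb", "myddoc",
"myview", docs, batch_size=1_000_000), where the URL and names are
placeholders.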

Anand
