couchdb-user mailing list archives

From Erik Pearson <e...@defunweb.com>
Subject Re: view index build performance improvements coming soon?
Date Sat, 20 Oct 2012 16:54:30 GMT
Hi Alex,

Thanks for the great write up.

After posting the question yesterday, I went ahead and installed the latest
couchdb (1.3) from github, and rewrote the map/reduce functions in Erlang.
I'd rather write them in Erlang anyway :)

Erlang native view performance is much better! It is indeed several times
faster than the JavaScript views in 1.2. That is great progress. Is 1.3
(GitHub master) considered to be stable?

One measure I tried that made no difference was separating the view
functions into separate design documents. They build in separate Erlang
processes, but the overall rate of building the index is the same (roughly
1000 changes/sec) as with all views in one design doc. Perhaps because
Erlang is already saturating my CPUs with just one view rebuild, or perhaps
because of other bottlenecks like disk access?
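
To put that rate in perspective, here is a rough back-of-envelope sketch. It assumes the ~1000 changes/sec figure stays constant regardless of database size, which is only an assumption; the function name is made up for illustration.

```python
# Rough estimate of view build time at a fixed indexing rate.
# Assumption: the ~1000 changes/sec rate holds regardless of database size.

def build_seconds(doc_count: int, changes_per_sec: float = 1000.0) -> float:
    """Seconds needed to index doc_count documents at the given rate."""
    if changes_per_sec <= 0:
        raise ValueError("changes_per_sec must be positive")
    return doc_count / changes_per_sec


DOCS_PER_YEAR = 1_000_000  # growth rate mentioned in the original question

five_years = build_seconds(5 * DOCS_PER_YEAR)
one_month = build_seconds(DOCS_PER_YEAR // 12)

print(f"5 years of data: {five_years / 60:.0f} minutes")  # 83 minutes
print(f"1 month of data: {one_month:.0f} seconds")        # 83 seconds
```

So at this rate a full rebuild over five years of data takes well over an hour per index, while a single month's worth takes a minute or two.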

Now we just need a few convenience functions to make writing Erlang views
less painful... but I'm going to write a separate post on that shortly.

I've read about the upcoming integration of bigcouch, and that is indeed
exciting and reassuring.


Thanks,
Erik.

On Sat, Oct 20, 2012 at 4:59 AM, Alexander Shorin <kxepal@gmail.com> wrote:

> Hi Erik!
>
> The common practice for all databases (SQL, NoSQL) that serve fast
> growing data is partitioning[1]: splitting the data into one partition
> per datetime period. Depending on how fast the data grows, this period
> may be a year, a month or even a day. To apply this practice to CouchDB,
> you split the data across databases with the period in their names, e.g.:
>
> world_logs/2012/10
> world_logs/2012/09
> world_logs/2012/08
> world_logs/2012/07
> ...
>
> Note the slashes in the names. With this trick CouchDB will create a
> directory hierarchy for these databases on the filesystem:
> + world_logs/
> | ---- + 2012/
> | ---- | ---- + 07.couch
> | ---- | ---- + 08.couch
> | ---- | ---- + 09.couch
> | ---- | ---- + 10.couch
>
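
A minimal sketch of the naming scheme above, assuming a hypothetical `world_logs` prefix and a server at `http://localhost:5984` (both placeholders; the helper names are made up for illustration). One thing to watch: when a database name contains slashes, they have to be sent as `%2F` in the request URL.

```python
from datetime import date
from urllib.parse import quote


def monthly_db_name(prefix: str, day: date) -> str:
    """Database name for the month containing `day`, e.g. world_logs/2012/10."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def db_url(server: str, db_name: str) -> str:
    """URL of a database; slashes in the name are percent-encoded as %2F."""
    return f"{server.rstrip('/')}/{quote(db_name, safe='')}"


name = monthly_db_name("world_logs", date(2012, 10, 20))
print(name)                                   # world_logs/2012/10
print(db_url("http://localhost:5984", name))  # http://localhost:5984/world_logs%2F2012%2F10
```
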
> So if your data grows by 1M docs per year, splitting it by month
> creates 12 databases with ~100K documents each. The big difference from
> one big database is that the "old" data already has a computed view
> index; if you add a new view, you don't need to wait until all the data
> is indexed. You'll get results much faster, since the index is built
> only for the small chunk you are currently interested in.
>
> Also, you could still simultaneously keep one big database with all the
> data, which imports data from these small databases through replication.
>
> That's about how to optimize data to make views run faster. You could
> also try switching from the JavaScript query server to the Erlang[2] one.
> The Erlang query server is native and doesn't suffer from stdio and JSON
> serialization/deserialization overhead. For me it speeds up indexing
> by about 3-4 times, depending on the complexity of the map function.
>
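
For illustration, a sketch of what a design document with a native Erlang map function might look like, built here as a Python dict. It assumes the Erlang query server has been enabled as described in [2]; the `logs` design doc, the `by_timestamp` view and the `timestamp` field are hypothetical names.

```python
import json

# Erlang source for a map function. Native Erlang views receive the document
# as {Doc}, a property list with binary keys, and call Emit(Key, Value).
# The "timestamp" field is a hypothetical example field.
MAP_BY_TIMESTAMP = """
fun({Doc}) ->
    case proplists:get_value(<<"timestamp">>, Doc) of
        undefined -> ok;
        Ts -> Emit(Ts, null)
    end
end.
"""


def erlang_design_doc(name: str, views: dict) -> dict:
    """Build a design document whose views are written in Erlang."""
    return {
        "_id": f"_design/{name}",
        "language": "erlang",  # selects the Erlang query server for this doc
        "views": {view: {"map": src.strip()} for view, src in views.items()},
    }


ddoc = erlang_design_doc("logs", {"by_timestamp": MAP_BY_TIMESTAMP})
print(json.dumps(ddoc, indent=2))
```

The resulting JSON would be saved to the database like any other design document.
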
> P.S. There is good news for you: the 1.3 release will have a new
> query server engine (already in the master branch) that feels to me a
> bit faster than the one in 1.2.
>
> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>
> --
> ,,,^..^,,,
>
>
> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <erik@defunweb.com> wrote:
> > Hi,
> >
> > I'm wondering if there are any write performance improvements on the
> > horizon? Although day to day read queries are great, and modest updates
> > are fine, bulk updates and index rebuilding is pretty painful. I know
> > performance tips are a broad enough topic without focusing it down.
> > Since I need to deal with multiple databases which will grow at about a
> > million documents per year, I'm in a bit of pain even testing the
> > database with significant depth of data (e.g. 5 years).
> >
> > I'd be happy to provide my use case and experience, but thought I'd cut
> > my usually verbose missives down to the bare question.
> >
> > Thanks,
> > Erik.
>
