incubator-couchdb-user mailing list archives

From Robert Newson <rnew...@apache.org>
Subject Re: view index build performance improvements coming soon?
Date Sun, 21 Oct 2012 22:02:20 GMT
Native views are much faster, for sure. Just be aware that they are
not sandboxed: you must trust everyone who writes design documents,
because native view code can execute any system command.

On 20 October 2012 12:54, Erik Pearson <erik@defunweb.com> wrote:
> Hi Alex,
>
> Thanks for the great write up.
>
> After posting the question yesterday, I went ahead and installed the latest
> couchdb (1.3) from github, and rewrote the map/reduce functions in Erlang.
> I'd rather write them in Erlang anyway :)
>
> Erlang native view performance is much better! It indeed is several times
> faster compared to Javascript views in 1.2. That is great progress. Is 1.3
> (github master) considered to be stable?
>
> One measure I tried that makes no difference is separating view functions
> into separate documents. They build in separate Erlang processes, but the
> overall rate of building the index is the same (roughly 1000 changes/sec)
> as all views in one design doc. Perhaps because Erlang is already
> saturating my cpus with just one view rebuild, or perhaps because of other
> bottlenecks like disk access?
>
> Now we just need a few convenience functions to make writing Erlang views
> less painful... but I'm going to write a separate post on that shortly.
>
> I've read about the upcoming integration of bigcouch, and that is indeed
> exciting and reassuring.
>
>
> Thanks,
> Erik.
>
> On Sat, Oct 20, 2012 at 4:59 AM, Alexander Shorin <kxepal@gmail.com> wrote:
>
>> Hi Erik!
>>
>> The common practice for all databases (SQL, NoSQL) that serve fast
>> growing data is partitioning[1]: splitting the data into one partition
>> per time period. Depending on how fast the data grows, this period
>> may be a year, a month or even a day. To apply this practice to CouchDB
>> you split the data across databases with the period in their names, e.g.:
>>
>> world_logs/2012/10
>> world_logs/2012/09
>> world_logs/2012/08
>> world_logs/2012/07
>> ...
>>
>> Note the slashes in the names. With this trick CouchDB will create a
>> directory hierarchy for these databases on the filesystem:
>> + world_logs/
>> | ---- + 2012/
>> | ---- | ---- + 07.couch
>> | ---- | ---- + 08.couch
>> | ---- | ---- + 09.couch
>> | ---- | ---- + 10.couch
>>
>> So if your data grows by 1M docs per year, splitting it by month will
>> create 12 databases with ~100K documents each. The big difference from
>> one big database is that the "old" data already has a computed view
>> index; if you add a new view you don't need to wait until all the data
>> is indexed - you'll get results much faster, since the index only has
>> to be built for the small chunk you are currently interested in.
>>
>> Also, you can still keep one big database with all the data at the
>> same time, fed from these small databases through replication.
>>
>> That covers how to organize the data to make views build faster. You
>> could also try switching from the JavaScript query server to the
>> Erlang[2] one. The Erlang query server is native and doesn't suffer
>> from stdio and JSON serialization/deserialization overhead. In my
>> experience it speeds up indexing by about 3-4 times, depending on the
>> complexity of the map function.
>>
>> P.S. There is good news for you: the 1.3 release will have a new
>> query server engine (already in the master branch) that, in my
>> impression, is a bit faster than the one in 1.2.
>>
>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>
>> --
>> ,,,^..^,,,
>>
>>
>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <erik@defunweb.com> wrote:
>> > Hi,
>> >
>> > I'm wondering if there are any write performance improvements on the
>> > horizon? Although day to day read queries are great, and modest updates
>> > are fine, bulk updates and index rebuilding is pretty painful. I know
>> > performance tips are a broad enough topic without focusing it down.
>> > Since I need to deal with multiple databases which will grow at about a
>> > million documents per year, I'm in a bit of pain even testing the
>> > database with significant depth of data (e.g. 5 years).
>> >
>> > I'd be happy to provide my use case and experience, but thought I'd cut
>> > my usually verbose missives down to the bare question.
>> >
>> > Thanks,
>> > Erik.
>>
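
The monthly partitioning scheme Alexander describes in the thread can be
sketched roughly as follows. This is only an illustration, not code from
any of the posters: the helper names and the server address
http://localhost:5984 are assumptions. The one CouchDB-specific detail
relied on is that a slash inside a database name has to be sent as %2F
in the request URL.

```python
from datetime import date
from urllib.parse import quote


def partition_db_name(prefix, day):
    """Name of the monthly partition for `day`, e.g. 'world_logs/2012/10'."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def partition_db_url(server, prefix, day):
    """URL of the partition database; slashes in the name are encoded as %2F."""
    name = partition_db_name(prefix, day)
    return f"{server.rstrip('/')}/{quote(name, safe='')}"


def recent_partitions(prefix, day, count):
    """Names of the `count` most recent monthly partitions, newest first."""
    names = []
    year, month = day.year, day.month
    for _ in range(count):
        names.append(partition_db_name(prefix, date(year, month, 1)))
        month -= 1
        if month == 0:
            # Step back across the year boundary.
            year, month = year - 1, 12
    return names


if __name__ == "__main__":
    today = date(2012, 10, 21)
    # Hypothetical local server; adjust to your own installation.
    print(partition_db_url("http://localhost:5984", "world_logs", today))
    for name in recent_partitions("world_logs", today, 4):
        print(name)
```

With names generated this way, a new view only has to be indexed for the
partitions that are actually queried, which is the point made in the
thread.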
