incubator-couchdb-user mailing list archives

From Gabriel De Oliveira Barbosa <manobi.olive...@gmail.com>
Subject Re: view index build performance improvements coming soon?
Date Sat, 20 Oct 2012 14:06:33 GMT
This topic is also interesting to me.

How can I read this data? Do I have to implement this logic in my application, or does CouchDB
understand what I'm looking for and redirect me to the right database?
And what if I have to query data across two or more databases?

Thanks

Sent from my iPad

On 20/10/2012, at 08:59, Alexander Shorin <kxepal@gmail.com> wrote:

> Hi Erik!
> 
> The common practice for any database (SQL or NoSQL) that serves
> fast-growing data is partitioning[1]: splitting the data into one
> partition per time period. Depending on how fast the data grows, this
> period may be a year, a month, or even a day. To apply this practice to
> CouchDB, split the data across databases with the period in their names, e.g.:
> 
> world_logs/2012/10
> world_logs/2012/09
> world_logs/2012/08
> world_logs/2012/07
> ...
> 
> Note the slashes in the names. With this trick CouchDB will create a
> directory hierarchy for these databases on the filesystem:
> + world_logs/
> | ---- + 2012/
> | ---- | ---- + 07.couch
> | ---- | ---- + 08.couch
> | ---- | ---- + 09.couch
> | ---- | ---- + 10.couch
> 
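A minimal sketch of how an application might derive the monthly database name for a document's date, assuming the `world_logs/YYYY/MM` scheme above. The function names and the `http://localhost:5984` server address are illustrative assumptions, not something from this thread. The one CouchDB-specific detail is that a slash inside a database name has to be percent-encoded as `%2F` when the name appears in a request path.

```python
from datetime import date
from urllib.parse import quote


def partition_name(day: date, prefix: str = "world_logs") -> str:
    """Return the monthly database name for a date, e.g. world_logs/2012/10."""
    return f"{prefix}/{day.year:04d}/{day.month:02d}"


def partition_url(day: date, server: str = "http://localhost:5984") -> str:
    """Return the database URL (server address is an assumed default)."""
    # safe="" makes quote() encode "/" as %2F, which the request path needs.
    return f"{server}/{quote(partition_name(day), safe='')}"


print(partition_name(date(2012, 10, 20)))  # world_logs/2012/10
print(partition_url(date(2012, 10, 20)))   # http://localhost:5984/world_logs%2F2012%2F10
```
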
> So if your data grows by 1M docs per year, splitting it by month
> creates 12 databases of roughly 83K documents each. The big difference
> from one big database is that "old" data already has a computed view
> index; if you add a new view, you don't need to wait until all the data
> is indexed. You'll get results much faster, since the index only has to
> be built for the small chunk you are currently interested in.
> 
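With this layout the application itself has to decide which databases a query touches. A sketch of that step, assuming the monthly `world_logs/YYYY/MM` naming above (the helper name `partitions_between` is hypothetical): it lists the partitions that cover a date range, so only those need to be queried and have their view indexes built.

```python
from datetime import date


def partitions_between(start: date, end: date, prefix: str = "world_logs") -> list[str]:
    """List the monthly database names covering start..end, inclusive."""
    names = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month that contains `end`.
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}/{year:04d}/{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


print(partitions_between(date(2012, 11, 5), date(2013, 1, 20)))
```

The application would then query the same view in each listed database and merge the rows itself.
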
> Also, you could still keep one big database with all the data
> alongside, filled from these small databases through replication.
> 
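A sketch of how such a replication could be triggered through CouchDB's `_replicate` endpoint, which takes a JSON body with `source` and `target`. The target name `world_logs_all` and the server address are assumptions made for illustration. The request is only built, not sent, so the snippet runs without a server.

```python
import json
from urllib.request import Request


def replication_request(source: str, target: str,
                        server: str = "http://localhost:5984") -> Request:
    """Build a POST /_replicate request copying `source` into `target`."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(
        f"{server}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "world_logs_all" is a hypothetical name for the combined database.
req = replication_request("world_logs/2012/10", "world_logs_all")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would actually send it to a running server.
```
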
> That's how to organize the data so views build faster. You could also
> try switching from the JavaScript query server to the Erlang[2] one.
> The Erlang query server is native and doesn't suffer from stdio and
> JSON serialization/deserialization overhead. In my experience it speeds
> up indexing by about 3-4 times, depending on the complexity of the map
> function.
> 
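A sketch of what a design document using the Erlang query server might look like, built here as a plain Python dict. The view name, design document name, and the `level` field are made up for illustration. It assumes native Erlang views have already been enabled in the server configuration as described on the wiki page in [2].

```python
import json

# Assumed server setup (local.ini), per the wiki page referenced as [2]:
#   [native_query_servers]
#   erlang = {couch_native_process, start_link, []}

# Erlang map function: emits the (hypothetical) "level" field of each doc.
ERLANG_MAP = """
fun({Doc}) ->
    Level = proplists:get_value(<<"level">>, Doc, null),
    Emit(Level, 1)
end.
"""


def erlang_design_doc(name: str, view: str, map_source: str) -> dict:
    """Build a design document whose views run on the Erlang query server."""
    return {
        "_id": f"_design/{name}",
        "language": "erlang",
        "views": {view: {"map": map_source.strip()}},
    }


print(json.dumps(erlang_design_doc("stats", "by_level", ERLANG_MAP), indent=2))
```

The resulting JSON would be saved into each partition database like any other design document.
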
> P.S. There is good news for you: the 1.3 release will include a new
> query server engine (already in the master branch) that feels to me a
> bit faster than the one in 1.2.
> 
> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
> 
> --
> ,,,^..^,,,
> 
> 
> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <erik@defunweb.com> wrote:
>> Hi,
>> 
>> I'm wondering if there are any write performance improvements on the
>> horizon? Although day-to-day read queries are great, and modest updates are
>> fine, bulk updates and index rebuilding are pretty painful. I know
>> performance tips are a broad enough topic without focusing it down. Since I
>> need to deal with multiple databases which will grow at about a million
>> documents per year, I'm in a bit of pain even testing the database with
>> significant depth of data (e.g. 5 years).
>> 
>> I'd be happy to provide my use case and experience, but thought I'd cut my
>> usually verbose missives down to the bare question.
>> 
>> Thanks,
>> Erik.
