couchdb-user mailing list archives

From Alexander Shorin <kxe...@gmail.com>
Subject Re: view index build performance improvements coming soon?
Date Sat, 20 Oct 2012 14:29:56 GMT
On Sat, Oct 20, 2012 at 6:06 PM, Gabriel De Oliveira Barbosa
<manobi.oliveira@gmail.com> wrote:
> This topic is also interesting for me.
>
> How can I read this data? Do I have to implement this logic in my application, or does CouchDB understand what I'm querying and redirect me to the right database?
> And what if I have to query data across two or more databases?

This can easily be done with a proxy run as an os_daemon[1]; the only
thing you have to write is the logic for sharding and for routing
requests to the correct shard - that mostly depends on the problem
you're solving. You can also keep a symlink with a static name
pointing at the actual database file - CouchDB is able to follow
symlinks, which lets you switch the current database shard more or
less transparently.
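A minimal sketch of the symlink trick (the file names and the temporary directory below are illustrative, not taken from a real CouchDB data directory):

```shell
# Hedged sketch: "current.couch" always points at the active shard;
# rotating the symlink switches shards without clients changing the name.
dir=$(mktemp -d)
touch "$dir/logs_2012_09.couch" "$dir/logs_2012_10.couch"  # stand-ins for real shard files
ln -sfn "$dir/logs_2012_09.couch" "$dir/current.couch"     # point at the old shard
ln -sfn "$dir/logs_2012_10.couch" "$dir/current.couch"     # rotate to the new shard
readlink "$dir/current.couch"                              # shows the active shard
rm -r "$dir"
```

In a real deployment the symlink would live in CouchDB's data directory next to the shard files it points at.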

But as Robert mentioned, the BigCouch merge should simplify these
things, and more besides.

[1]: http://davispj.com/2010/09/26/new-couchdb-externals-api.html

--
,,,^..^,,,


On Sat, Oct 20, 2012 at 6:20 PM, Robert Newson <rnewson@apache.org> wrote:
> don't forget that database sharding (and therefore view sharding) is
> coming in the release after 1.3 when we merge BigCouch. View shards
> build in parallel.
>
> B.
>
>
> On 20 October 2012 10:06, Gabriel De Oliveira Barbosa
> <manobi.oliveira@gmail.com> wrote:
>> This topic is also interesting for me.
>>
>> How can I read this data? Do I have to implement this logic in my application, or does CouchDB understand what I'm querying and redirect me to the right database?
>> And what if I have to query data across two or more databases?
>>
>> Thanks
>>
>> Sent from my iPad
>>
>> On 20/10/2012, at 08:59, Alexander Shorin <kxepal@gmail.com> wrote:
>>
>>> Hi Erik!
>>>
>>> The common practice for all databases (SQL and NoSQL) that serve
>>> fast-growing data is partitioning[1] - splitting the data into one
>>> partition per datetime period. Depending on how fast the data grows,
>>> this period may be a year, a month, or even a day. Applying this
>>> practice to CouchDB, you split the data into databases with the
>>> period in their names, e.g.:
>>>
>>> world_logs/2012/10
>>> world_logs/2012/09
>>> world_logs/2012/08
>>> world_logs/2012/07
>>> ...
>>>
>>> Note the slashes in the names. With this trick, CouchDB will create
>>> a directory hierarchy for these databases on the filesystem:
>>> + world_logs/
>>> | ---- + 2012/
>>> | ---- | ---- + 07.couch
>>> | ---- | ---- + 08.couch
>>> | ---- | ---- + 09.couch
>>> | ---- | ---- + 10.couch
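When talking to such databases over HTTP, the slash in the name has to be URL-encoded as %2F. A small sketch of creating the monthly databases above (the curl line is commented out because it assumes a CouchDB listening on localhost:5984; the host and months are illustrative):

```shell
# Hedged sketch: create one database per month. Slashes in CouchDB
# database names must be URL-encoded as %2F in the request path.
for month in 07 08 09 10; do
  db="world_logs/2012/${month}"
  encoded=$(printf '%s' "$db" | sed 's|/|%2F|g')
  echo "PUT http://localhost:5984/${encoded}"
  # curl -X PUT "http://localhost:5984/${encoded}"  # run against a live server
done
```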
>>>
>>> So if your data grows by 1M docs per year, splitting it by month
>>> creates 12 databases of ~100K documents each. The big difference from
>>> one big database is that the "old" data already has its view index
>>> computed; if you add a new view, you don't need to wait for all the
>>> data to be indexed - you'll get results much faster, since the index
>>> only has to be built for the small chunk you're currently interested in.
>>>
>>> Also, you can still simultaneously keep one big database with all the
>>> data, importing from these small databases through replication.
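That import can be driven through the _replicate endpoint, one request per monthly shard. A hedged sketch (the target database name is made up, and the curl line is commented out because it needs a running CouchDB):

```shell
# Hedged sketch: pull one monthly shard into a combined database.
# "world_logs_all" is an illustrative target name, not from the thread.
body='{"source":"world_logs/2012/10","target":"world_logs_all","create_target":true}'
echo "$body"
# curl -X POST http://localhost:5984/_replicate \
#      -H 'Content-Type: application/json' -d "$body"  # needs a live server
```

Repeating this for each monthly database keeps the big database current without re-indexing the shards.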
>>>
>>> That's how to organize data so that views build faster. You could
>>> also try switching from the JavaScript query server to the Erlang
>>> one[2]. The Erlang query server is native and doesn't suffer from the
>>> stdio and JSON serialization/deserialization overhead. In my
>>> experience it speeds up indexing by about 3-4x, depending on the
>>> complexity of the map function.
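The Erlang query server is selected per design document via its language field. A hedged sketch of such a design doc (the design-doc name, view name, and map function are made up for illustration; the curl line assumes a live CouchDB):

```shell
# Hedged sketch of a design document using the Erlang query server.
# The map function emits each document's _id with value 1.
ddoc='{"language":"erlang","views":{"by_id":{"map":"fun({Doc}) -> Id = couch_util:get_value(<<\"_id\">>, Doc), Emit(Id, 1) end."}}}'
echo "$ddoc"
# curl -X PUT "http://localhost:5984/world_logs%2F2012%2F10/_design/logs" \
#      -d "$ddoc"  # needs a live server with Erlang views enabled
```

Note that the Erlang query server has to be enabled in the server configuration first, as described in [2].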
>>>
>>> P.S. There is good news for you: the 1.3 release will ship a new
>>> query server engine (already in the master branch) that, in my
>>> experience, is a bit faster than the one in 1.2.
>>>
>>> [1]: http://en.wikipedia.org/wiki/Partition_%28database%29
>>> [2]: http://wiki.apache.org/couchdb/EnableErlangViews
>>>
>>> --
>>> ,,,^..^,,,
>>>
>>>
>>> On Sat, Oct 20, 2012 at 4:08 AM, Erik Pearson <erik@defunweb.com> wrote:
>>>> Hi,
>>>>
>>>> I'm wondering if there are any write performance improvements on the
>>>> horizon? Although day-to-day read queries are great, and modest updates are
>>>> fine, bulk updates and index rebuilding are pretty painful. I know
>>>> performance tips are a broad enough topic without focusing it down. Since I
>>>> need to deal with multiple databases which will grow at about a million
>>>> documents per year, I'm in a bit of pain even testing the database with
>>>> significant depth of data (e.g. 5 years).
>>>>
>>>> I'd be happy to provide my use case and experience, but thought I'd cut my
>>>> usually verbose missives down to the bare question.
>>>>
>>>> Thanks,
>>>> Erik.
