couchdb-dev mailing list archives

From Hans J Schroeder ...@cloudno.de>
Subject Re: database design question: concurrent writes
Date Thu, 13 Dec 2012 18:37:48 GMT

On Dec 13, 2012, at 6:41 PM, Robert Newson <rnewson@apache.org> wrote:

> Two databases can have a different number of shards, different numbers
> of replicas of each shard, different locations for those shards, we
> might move shards over time as we add or remove nodes from the
> cluster. I don't see how you can do any of that without a document
> describing the layout.
> 
> B.
> 
> On 13 December 2012 17:30, Benoit Chesneau <bchesneau@gmail.com> wrote:
>> On Thu, Dec 13, 2012 at 5:46 PM, Robert Newson <rnewson@apache.org> wrote:
>> 
>>> Views are also sharded.
>>> 
>>> It's common for a node to host multiple shards of the same database,
>>> so we already have this 'concurrent writes' notion, if I've
>>> interpreted it correctly.
>>> 
>>> 
>> Well, my questions are more about why a mapping is used, and how to keep the
>> backup of one database easy for a user, possibly without relying on a
>> mapping stored aside.
>> 
>> The other question is why bigcouch chose that design vs the one I propose.
>> 
>> @Hans since dbs are sharded and views are built per db, view indexing is
>> also concurrent.
>> 
>> - benoît
>> 
>> 
>> 
>>> On 13 December 2012 16:36, Hans J Schroeder <hs@cloudno.de> wrote:
>>>> 
>>>> On Dec 13, 2012, at 5:06 PM, Benoit Chesneau <bchesneau@gmail.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> 
>>>>> This morning I was back reading a lot of fundamentals about databases
>>>>> and such, and was asking myself how we could increase the number of
>>>>> concurrent writes.
>>>>> 
>>>>> These days the theory is that it will be solved by sharding the
>>>>> databases into multiple database files and merging the results of the
>>>>> queries. Since the databases will be sharded, the writes on the same db
>>>>> will be concurrent. A map of the shards will be kept aside. All of this
>>>>> thanks to the introduction of bigcouch.
>>>>> 
>>>>> The question I have is why don't we already do that? I.e. balancing data
>>>>> across different files in one db? For example, the db folder could be
>>>>> 
>>>>> database/XY.couch
>>>>> 
>>>>> where XY are the first letters of an id or content hash or any
>>>>> consistent hashing method.
>>>>> 
>>>>> I am currently asking myself this question because I am wondering how
>>>>> backup will work when couchdb is used as a single node. How to back up
>>>>> only one db without having to query for the mapping and such? How to
>>>>> keep it simple?
>>>>> 
>>>>> Related to that, why did bigcouch use that design? Why map shards in a
>>>>> db database instead of having some kind of natural balancing on the fs
>>>>> and having a consistent hashing algorithm used to balance on different
>>>>> machines/vms as well?
>>>>> 
>>>>> 
>>>>> - benoît
>>>> 
>>>> 
>>>> Hi,
>>>> 
>>>> That's like "horizontal partitioning" in conventional databases and I
>>>> think it's a great idea. Having a writer process for each partition will
>>>> make it scale.
>>>> 
>>>> Does Bigcouch have anything for the view files too or are they just
>>>> sharding the backing files?
>>>> 
>>>> - Hans
>>> 

Thanks for the clarification about the views. 

It's all about what we want to have. For concurrent writes, a simple shuffling like the one
Benoit has described would be an efficient solution. For configurable clusters, a mapping
store of some kind is needed.
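
To make the two options concrete, here is a rough Python sketch, not CouchDB or BigCouch code. The function names, the MD5 choice, the two-character prefix, and the layout dictionary are all assumptions made up for illustration. The first function derives a shard file such as database/XY.couch purely from the document id, so no map is needed. The second looks the shard up in an explicit layout document, which is what lets shard counts, replica counts, and node placement differ per database.

```python
import hashlib


def shard_file(db_name: str, doc_id: str, prefix_len: int = 2) -> str:
    """Map-free placement: the file name is the first hex characters of a
    hash of the document id (hypothetical scheme, MD5 chosen arbitrarily)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return f"{db_name}/{digest[:prefix_len]}.couch"


# Hypothetical layout document for one database in a configurable cluster.
# Node names and shard counts are illustrative only.
LAYOUT = {
    "shard_count": 4,
    "shards": {
        0: ["node1", "node2"],
        1: ["node2", "node3"],
        2: ["node3", "node1"],
        3: ["node1", "node3"],
    },
}


def shard_nodes(layout: dict, doc_id: str) -> tuple[int, list[str]]:
    """Mapped placement: hash the id to a shard index, then consult the
    layout document to find which nodes hold replicas of that shard."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % layout["shard_count"]
    return index, layout["shards"][index]


if __name__ == "__main__":
    print(shard_file("database", "some-doc-id"))
    print(shard_nodes(LAYOUT, "some-doc-id"))
```

With the first approach, backing up one db on a single node is just copying the files under its folder. With the second, the layout document has to be read (and backed up) as well, but shards can be moved by editing it.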

- Hans