couchdb-dev mailing list archives

From Benoit Chesneau <bchesn...@gmail.com>
Subject Re: mem3 and forced db fragmentation?
Date Tue, 04 Mar 2014 05:45:03 GMT
On Tue, Mar 4, 2014 at 2:43 AM, Adam Kocoloski <kocolosk@apache.org> wrote:
> Hi Benoit, it's important to be precise.  I tried to lock down what you meant by a non-fragmented
> database but I'm afraid we're not there yet.  In this reply you say
>
>> A fragmented or sharded database is the clustered one
>
> which I'd read as saying that as a Q=1,N=1 database still counts as a fragmented or sharded
> database, but I don't think that's what you meant.  I'm honestly not sure which specific databases
> you want to move out of shards/ and into the top-level.

Well, some people won't use the cluster facility at all. A lot of
individuals and companies don't have much data and will instead back
up a database on a daily or weekly basis to a separate disk. They use
CouchDB for its other features (replication, for example). In that
case, and this is what some do right now, they just take the .couch
file named after the database and move it to a disk. Some also rely on
the append-only format of the db file to share the data offline. In
these cases, storing a db under its name is useful for practical
reasons (humans don't handle hexadecimal numbers easily). Right now
all shards exist in the same namespace and can only be found on the
file system by looking in the shards db. I think what I really want is
to be able to either still have the db at the top level, or to store
the shards of a db in their own namespace, like
<dbname>/{shard0...N}. Looking at

https://github.com/cloudant/mem3/blob/master/src/mem3_util.erl#L50

this is not what you do right now, and in fact I don't see any reason
for it, since the data are isolated by db anyway. But maybe it's
needed for another cluster feature? Storing each db under its own name
as a namespace is probably the only thing that could be done anyway,
since sharding is currently the only way to add write concurrency and
to compute the views in parallel.
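
To make the two layouts concrete, here is a rough Python sketch. It is
only an illustration: the function names are hypothetical, the
"range first" layout is an assumption about how shards are grouped
under shards/ today (the real names may differ, for example by
carrying a suffix), and the per-db layout is just one reading of the
<dbname>/{shard0...N} idea, with a Q=1 database kept at the top level.

```python
def shard_ranges(q):
    """Split the 32-bit hash space into q contiguous, equal-width ranges."""
    if q < 1 or (1 << 32) % q != 0:
        raise ValueError("q must be a positive divisor of 2**32")
    step = (1 << 32) // q
    return [(i * step, (i + 1) * step - 1) for i in range(q)]


def range_first_paths(dbname, q):
    """Assumed current layout: every db shares the same hex range directories."""
    return [
        f"shards/{begin:08x}-{end:08x}/{dbname}.couch"
        for begin, end in shard_ranges(q)
    ]


def per_db_paths(dbname, q):
    """Proposed layout (one reading): each db owns a directory for its shards."""
    if q == 1:
        # A single-shard db stays at the top level under its own name.
        return [f"{dbname}.couch"]
    return [f"{dbname}/shard{i}.couch" for i in range(q)]
```

With the second layout, backing up or sharing one database means
copying one file or one directory named after the db, with no lookup
of hexadecimal ranges.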


>
>> Also I am wondering right now how the transition from the current
>> couchdb to the new one can be done. It would be much easier imo to
>> have 1 HTTP layer and flag the "special databases" to not expose them
>> in the HTTP API except for the admins.
>
> Now that's an interesting discussion.  We've mulled it over internally at Cloudant a
> bit in the past but never really came to a conclusion.  I don't find the :5984 / :5986 split
> elegant, but I'll also say that getting the API for local vs. clustered databases exactly
> right is not at the top of my priority list.  In fact I think it may be the sort of thing
> that we could defer until after the first release that includes clustering capabilities.
>

Understood.

- benoit
