incubator-couchdb-user mailing list archives

From "Jonathan Ginter" <jonathan.gin...@coradiant.com>
Subject RE: Largest CouchDB dbs?
Date Mon, 03 Nov 2008 14:42:56 GMT
My apologies for the overloaded use of the term "incubation".  I realize
it has a special meaning for Apache projects.  My bad.

Thanks for all of the quick responses.  It's a sign of a well-run project.
I will keep my eye on the progress of CouchDB.  Hopefully, it will
rapidly reach the scalability point that I am looking for.

Jonathan

-----Original Message-----
From: Jan Lehnardt [mailto:jan@apache.org] 
Sent: Monday, November 03, 2008 8:50 AM
To: couchdb-user@incubator.apache.org
Subject: Re: Largest CouchDB dbs?


On Nov 3, 2008, at 14:40, Jonathan Ginter wrote:

> From what I have read, it sounds like the project is not yet ready to
> scale this large, but there are plans in place to do so (faster view
> parsers, partitioning, etc).  Is there a rough target for this work?
> We have a roadmap for upcoming projects and I need to know whether
> CouchDB can be considered for the short term (i.e., within the next
> 4 - 6 months) or whether we will have to give it more time to incubate
> and come back to it later on in the longer term.

No ETA, but feel free to sponsor development :) The two biggest boosts
for view generation are (as you correctly identified) JSON serialisation
on the Erlang end and actually making use of MapReduce's parallel nature.
At the moment, view creation is single-threaded and limited to a single
core on your system.

Just to avoid potential misunderstanding: Incubation is the process of
becoming an Apache project. It has nothing to do with the software
development roadmap.

Cheers
Jan
--


>
>
> Jonathan
>
> -----Original Message-----
> From: Damien Katz [mailto:damien@apache.org]
> Sent: Monday, November 03, 2008 6:00 AM
> To: couchdb-user@incubator.apache.org
> Subject: Re: Largest CouchDB dbs?
>
>
> On Nov 3, 2008, at 4:38 AM, Jan Lehnardt wrote:
>
>>
>> On Nov 3, 2008, at 05:53, Jonathan Ginter wrote:
>>
>>> I have a similar issue.  I am interested in using CouchDB to host a
>>> 200+ GB database that will receive well over 200 million documents
>>> per day.  Moreover, the data must roll out - i.e., constant
>>> background purging - and also support UI queries.  And this is just
>>> a starting point to match the abilities of the relational database
>>> we are already running.  I will want the DB to scale up from there.
>>>
>>> If there is no hope of the CouchDB being able to handle all of that
>>> - regardless of how many machines we deploy - I would like to know
>>> that now before I look any further into this project.
>>
>>> Does anyone have a reasonable idea about whether CouchDB will be
>>> capable of such massive scalability or how many machines it would
>>> take to scale that large?
>>
>> This sounds like a scenario that CouchDB will ultimately be able to
>> handle nicely. I don't think we can give out any guarantees about when
>> and how this will be the case. Maintaining a 200+GB data set would
>> require quite some hand-wiring at the moment.
>>
>>
>>> I would appreciate any feedback that anyone might have on this.
>>
>> I think Damien can chime in here :) Damien?
>>
>
> This is definitely well within what couchdb should be able to do once
> partitioning is in place. I'm not really working on this yet, but
> there are a lot of people and companies interested in seeing the
> partitioning work done. So maybe some progress will be made soon.
>
> -Damien
>

