incubator-couchdb-user mailing list archives

From Daniel Gonzalez <gonva...@gonvaled.com>
Subject Re: Creating a database with lots of documents and updating a view
Date Thu, 15 Mar 2012 13:21:56 GMT
Hi CGS,

As you can see from the other threads that I have opened, the problem seems
to be threefold:

   1. The bigger the database gets, the more disk space CouchDB needs *per
   document*.
   2. The bigger the database gets, the more time it takes to insert new
   documents.
   3. The bigger the database gets, the longer it takes to generate the
   views. The dependency is not linear.
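To make the insert-time growth in point 2 visible, one could time batched
inserts against the standard `_bulk_docs` endpoint. This is only a sketch:
the host, database name, and batch size below are assumptions, not anything
we have settled on.

```python
import json
import time
from urllib import request

COUCH_URL = "http://localhost:5984"   # assumed local CouchDB instance
DB_NAME = "bigdb"                     # hypothetical database name
BATCH_SIZE = 1000                     # batch size is a tuning guess

def chunks(docs, size):
    """Split an iterable of docs into lists of at most `size` docs."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def bulk_insert(docs):
    """POST each batch to _bulk_docs, timing every batch so the
    slowdown as the database grows becomes visible."""
    for i, batch in enumerate(chunks(docs, BATCH_SIZE)):
        body = json.dumps({"docs": batch}).encode()
        req = request.Request(
            f"{COUCH_URL}/{DB_NAME}/_bulk_docs",
            data=body,
            headers={"Content-Type": "application/json"},
        )
        start = time.time()
        request.urlopen(req)
        print(f"batch {i}: {len(batch)} docs in {time.time() - start:.2f}s")
```

Plotting the per-batch times over the whole load would show whether the
growth is closer to logarithmic (B-tree behaviour) or worse.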

We are talking about a database which will be created once (around 20
million documents in all) and then updated regularly, to the tune of
500 thousand documents per day, mostly additions and edits (very few
deletes, if any).

How are people structuring big data with couchdb?
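For what it's worth, the partitioning idea CGS describes below could be
routed deterministically rather than by pure round robin, so that edits to
the same document always land in the same shard database. A minimal sketch,
assuming a hypothetical shard count and naming scheme:

```python
import hashlib

N_SHARDS = 16  # number of shard databases is a guess; tune to the workload

def shard_for(doc_id):
    """Map a document id to one of N_SHARDS databases deterministically,
    so later edits to the same doc go back to the same database."""
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    return f"bigdb_{h % N_SHARDS:02d}"  # hypothetical naming scheme
```

Queries would then fan out to each shard's view and merge the results in the
application; the trade-off is exactly the extra application complexity
mentioned below, in exchange for smaller, faster-rebuilding view indexes.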

Thanks,
Daniel

On Thu, Mar 15, 2012 at 12:13 PM, CGS <cgsmcmlxxv@gmail.com> wrote:

> I understand your reluctance and I agree with you, but it seems that the
> bottleneck is in building the view, and I see no way to speed that up
> other than building it in parallel. That's why I suggested that option.
> Of course, if you use more computing elements it will build faster. But
> now a question: do you need to rebuild the view often? That is, do you
> expect a high rate of document insertion going forward, or is it a
> one-time load after which the rate is relatively low? If it is a one-time
> shot, it may be worth waiting once to have the index built. Otherwise, if
> you expect high-frequency insertion, a round robin (or a more evolved
> load-distribution algorithm) may be a good choice to give each database
> the time it needs to make the insertion and update the view. I know that
> means more work for you, but it also builds a more robust application.
>
> This option is just for the sake of discussing different solutions to your
> problem.
>
> Regards,
> CGS
>
>
>
> On Thu, Mar 15, 2012 at 10:48 AM, Daniel Gonzalez <gonvaled@gonvaled.com
> >wrote:
>
> > Hi CGS,
> >
> > I suppose I could come up with a partition strategy that distributes
> > the load: I could do what you suggest, or I could also use separate
> > hosts. But all this would transfer complexity to my application, and I
> > want to avoid that.
> >
> > Thanks
> > Daniel
> >
> > On Thu, Mar 15, 2012 at 12:41 AM, CGS <cgsmcmlxxv@gmail.com> wrote:
> > > Hi,
> > >
> > > Sorry for interfering, Daniel, but do you really need all those
> > > documents in a single database? I mean, is it mandatory to have them
> > > there? If not, you can split that 3M-document database into 10-100
> > > smaller databases, each with its own view, and concatenate the
> > > results afterwards using JS or whatever else, if you have an
> > > application which needs the combined result (or, if you can use
> > > pagination, instruct your view to return only a certain number of
> > > results per displayed page). In this way you get a certain level of
> > > parallelism which may speed up the overall process.
> > >
> > > Just a 2c idea.
> > >
> > > CGS
> > >
> > >
> > >
> > >
> > >
> > > On Thu, Mar 15, 2012 at 12:13 AM, Matthieu Rakotojaona <
> > > matthieu.rakotojaona@gmail.com> wrote:
> > >
> > >> On Wed, Mar 14, 2012 at 10:19 PM, Christopher Sebastian
> > >> <csebastian3@gmail.com> wrote:
> > >> > I am also new to CouchDB, but I don't believe the information from
> > >> > Matthieu Rakotojaona is correct. It is my understanding that pretty
> > >> > much everything in CouchDB (including views) uses incremental
> > >> > updates. So adding new documents to the database does NOT cause all
> > >> > view leaves to be traversed -- the view is updated incrementally.
> > >> >
> > >> > Is this correct?
> > >> >
> > >> > ~Christopher Sebastian
> > >>
> > >> I might have spoken a little too fast. Indeed, when you add new
> > >> documents, they are automatically passed to the map function (or
> > >> rather they will be when the next query comes), which doesn't
> > >> traverse the already-indexed part of the db.
> > >>
> > >> But when (and if) you have a reduce function, all the intermediate
> > >> rereduce results will have to be updated, right?
> > >>
> > >> --
> > >> Matthieu RAKOTOJAONA
> > >>
> >
>
