Hi,
If your documents contain many "rows", it's probably better to store each "row" as a separate
document and collate them with views. If you use attachments you can't (currently) build an index
on the data in the attachment, IIRC. You'll want to test with a subset of the data and get
a reasonable idea of how it'll behave as the data grows before making a final decision.
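
As a rough sketch of that split (not tested against your data): I'm assuming Data is an
array, as you describe, and the names splitMetricDoc, collateMap, type, metric_id, index
and row are made up for illustration. The emit() stub is only there so the map function
can be tried outside CouchDB, where emit() is provided for you.

```javascript
// Hypothetical helper: turn one large metric document into a small metadata
// document plus one small document per entry in doc.Data.
function splitMetricDoc(doc) {
  // Copy every field except the bulky Data list. _id and _rev are kept, so
  // saving this document would replace the original large one.
  var meta = {};
  Object.keys(doc).forEach(function (key) {
    if (key !== "Data") {
      meta[key] = doc[key];
    }
  });
  meta.type = "metric"; // assumed marker field

  // One document per row, pointing back at the metadata document.
  var rows = (doc.Data || []).map(function (row, i) {
    return {
      _id: doc._id + ":row:" + i,
      type: "metric_row", // assumed marker field
      metric_id: doc._id,
      Name: doc.Name,
      index: i,
      row: row
    };
  });

  return [meta].concat(rows);
}

// Stand-in for CouchDB's built-in emit(), for local experiments only.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Map function sketch: the metadata document sorts first under its name,
// followed by its rows in order. Each call now handles one small document.
function collateMap(doc) {
  if (doc.type === "metric") {
    emit([doc.Name, 0], null);
  } else if (doc.type === "metric_row") {
    emit([doc.Name, 1, doc.index], doc.row);
  }
}
```

Querying such a view with startkey=["Some metric"] and endkey=["Some metric", {}] should
return the metadata row followed by all of its data rows.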
Cheers
Simon
On Monday, 13 February 2012 at 16:03, mike iannacone wrote:
> Thanks for the response. Looking around a bit more, it does seem like
> our documents are larger than most people are using. Is there any
> general guideline or rule of thumb as to how large documents should
> be?
>
> For some background, this is full of public health metrics and related
> data, which we're compiling from several different sources. Each
> document basically corresponds to one metric from one source. Many of
> these were imported from csv files, so mapping one csv file to one
> document made some sense for us. The documents each contain various
> metadata (the source, the years, possibly some statistical info, etc),
> and then a list of individual data objects. It might make sense to
> split this up, so that each document only contains the metadata, and
> has an attachment with the actual data. Does that sound like a good
> approach, or am I on the wrong track with that?
>
> Mike
>
> On Mon, Feb 13, 2012 at 9:36 AM, Steve Foulkes <sfoulkes@fnal.gov> wrote:
> > Hi,
> >
> >
> > On 2/10/12 8:58 PM, mike iannacone wrote:
> > >
> > > Hi, I've been running into some rather strange errors when running my
> > > view code in certain cases. It seems to run fine until the size of
> > > the database grows beyond a certain point, at which point I get
> > > timeouts. The confusing part is that this size where it starts
> > > failing is quite low, around 1773 documents, totaling 402MB.
> > >
> > > environment:
> > > This is my development server, running CouchDB 1.1.1, built using the
> > > build-couchdb tool as the wiki recommended, on a completely new Ubuntu
> > > install. (I reinstalled it a few hours ago, thinking it might be some
> > > kind of environment problem.)
> > >
> > > overall process shown in the logs:
> > >
> > > *load a subset of documents, and confirm the views work
> > >
> > > *load most of the remaining documents, views work
> > > (this was done from the futon client, running on another machine.
> > > It sees the connection time out, but view index builds ok anyway, and
> > > completes a few minutes after the client has given up. When the
> > > client requests the view afterwards, it works fine, and fast now that
> > > the index is done.)
> > >
> > > *upload another 18 documents (the largest ones, ranging from 10M to
> > > 22M); view failed with "OS Process timed out."
> > > The log of everything described up to this point is included.
> > >
> >
> >
> >
> > The large documents are the problem. The view process is taking too long to
> > process them and is timing out. You can increase the timeout in the
> > configuration, which is accessible from Futon; it's under "couchdb" and
> > called "os_process_timeout" (the value is in milliseconds).
> >
> > Steve
> >
> > >
> > > This seems strange as it gave this error only now, when it took so
> > > long previously. At any rate, I increased the os_process_timeout
> > > value to 10 minutes, and attempted it again, and it still timed out
> > > after only a few seconds. (this is shown in the second log file,
> > > although it is essentially the same as the first.)
> > >
> > >
> > > the actual view functions are shown in the log, but for convenience they
> > > are:
> > > "indicator_summary": {
> > > "map": "function(doc) {\n if(doc.Data){\n var temp =
> > > {};\n temp.Name = doc.Name;\n temp.Description =
> > > doc.Description;\n temp.Sources = doc.Sources;\n temp.SourceURL
> > > = doc.SourceURL;\n temp.Years = doc.Years;\n temp.National =
> > > doc.National;\n temp.LocaleLevels = doc.LocaleLevels;\n
> > > temp.Demographics = doc.Demographics;\n temp.Unit = doc.Unit;\n
> > > temp.UnitLabel = doc.UnitLabel;\n temp.DataType = doc.DataType;\n
> > > temp.Category = doc.Category;\n temp.TopCorrelated =
> > > doc.TopCorrelated;\n emit(doc.Name, temp);\n }\n}"
> > > },
> > > "indicator_detail": {
> > > "map": "function(doc) {\n if(doc.Data&& doc.Years){\n
> > >
> > > for(var i=0; i<doc.Years.length; i++){\n for(var j=0;
> > > j<doc.LocaleLevels.length; j++){\n var temp = {};\n
> > > temp.Name = doc.Name;\n temp.Description = doc.Description;\n
> > > temp.Sources = doc.Sources;\n temp.SourceURL =
> > > doc.SourceURL;\n /*for(var k=0; k<doc.National.length; k++){\n
> > > if(doc.National[k][doc.Years[i]]){\n temp.National
> > > = doc.National[k][doc.Years[i]];\n }\n }*/\n
> > > temp.Demographics = doc.Demographics;\n temp.Unit = doc.Unit;\n
> > > temp.UnitLabel = doc.UnitLabel;\n temp.DataType =
> > > doc.DataType;\n temp.Category = doc.Category;\n
> > > temp.Data = doc.Data;\n temp.TopCorrelated =
> > > doc.TopCorrelated;\n emit([doc.Name, doc.Years[i]], temp);\n
> > > }\n }\n }\n}"
> > > }
> > >
> > >
> > > Besides this, I've tried replicating to a second machine and, on that
> > > one, adjusting several values, with no real progress: increased Erlang
> > > heartbeat timeout, increased Erlang heap size, increased SpiderMonkey
> > > stack size. These all either made no difference or caused other
> > > errors. I admit I was kind of guessing when changing those, so it's
> > > entirely possible that I was completely on the wrong track with those.
> > > At any rate, the logs I included (and the current state of that dev
> > > machine) are with everything set to its default values, except for that
> > > 10 minute os_process_timeout value I mentioned above.
> > >
> > > Any help would be fantastic, as I'm completely out of ideas at this
> > > point. I'd of course be glad to provide any additional info that
> > > might be useful to you.
> > >
> > > Thanks!
> > > Mike
> > >
> >
> >
>
>
>