incubator-couchdb-user mailing list archives

From "Brad King" <brk...@gmail.com>
Subject Re: view index build time
Date Wed, 02 Jul 2008 22:00:43 GMT
I've got R12B. We've also got the CouchDB 0.8.0-incubating version.
I'm just curious what my expectations should be for view creation
times. I was also wondering if anyone had tried putting the design
folder on a different disk to improve I/O.

On Wed, Jul 2, 2008 at 2:18 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> One thing that got me a while back was the version of Erlang I was
> using. If you're not on one of the most recent Erlang versions, R12B or
> some such, you might try upgrading that bit to see if it fixes things.
>
> Paul
>
> On Wed, Jul 2, 2008 at 1:58 PM, Brad King <brking@gmail.com> wrote:
>> I created a view with emit(doc.entityobject.sku, null) to only emit
>> the doc ids. After trying attachments, I nuked the DB and started
>> over, going back to having the documents inline. This is ok, but
>> again, the index build time of about 25 minutes for this view against
>> 300K or so docs seems long. What are you seeing as typical for
>> creating your views against a much larger set? What do your docs look
>> like? Thanks.
>>
>>
>> On Wed, Jul 2, 2008 at 10:50 AM, Jan Lehnardt <jan@apache.org> wrote:
>>>
>>> On Jul 2, 2008, at 16:17, Brad King wrote:
>>>
>>>> Just to post some results here of working with around 300K docs. I
>>>> changed the view to emit only the doc ID and index time went down to
>>>> about 25 minutes vs. an hour for the same dataset.
>>>>
>>>> I then converted the largest text field to an attachment and things
>>>> went downhill from there. I deleted the db and started the upload,
>>>> but repeatedly got random 500 server errors with no real way to know
>>>> what is happening or why. Also the DB size as reported by Futon seemed
>>>> to fluctuate wildly as I was adding documents. And I mean wildly like
>>>> anywhere from 1.2G then back down to 144M. Weird. I don't get a very
>>>> warm fuzzy feeling about the stability of using attachments right now.
>>>> Ideally, I don't want to use them anyway, I'd prefer to have the
>>>> fields all inline and have the database handle these docs as-is. I
>>>> don't see these as huge documents (2 to 5K) as compared to what I
>>>> would store in something like Berkeley DB XML, just for comparison's
>>>> sake, so I'm hoping it's a goal of the project to handle these
>>>> effectively, even when several million documents are added.
>>>
>>> This doesn't sound right at all. Can you make sure you use the
>>> very latest SVN version or the 0.8 release and completely
>>> new databases? Also, just to clarify, do you emit the doc into
>>> the view payload, as in emit(doc._id, doc), or are you just doing
>>> emit(null, null) to only get the docIds that matter to you and
>>> then fetch the documents later? I have had the latter setup running
>>> without any problems across ~2 million documents in a database.
>>>
>>>
>>>> As always, thanks for the help.
>>>
>>> Thanks for the problem report.
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Jul 1, 2008 at 9:26 AM, Brad King <brking@gmail.com> wrote:
>>>>>
>>>>> Thanks for the tips. I'll start scaling back the data I'm returning
>>>>> and see if it improves. The largest field is an html description of an
>>>>> inventory item, which seems like a good candidate for a binary
>>>>> attachment, but I need to be able to do full text searches on this
>>>>> data eventually (hopefully with the Lucene integration) so I'll
>>>>> probably try just not including the document data in the views first.
>>>>> We've had some success with Lucene independent of couchdb, so I'm
>>>>> pleased you guys are integrating this.
>>>>>
>>>>> On Sat, Jun 21, 2008 at 8:39 AM, Damien Katz <damienkatz@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Part of the problem is you are storing copies of the documents into the
>>>>>> btree. If the documents are big, it takes longer to compute on them, and if
>>>>>> the results (emit(...)) are big or numerous, then you'll be spending most of
>>>>>> your time in I/O.
>>>>>>
>>>>>> My advice is to not emit the document into the view, and if you can, get the
>>>>>> documents smaller in general. If the data can be stored as a binary
>>>>>> attachment, then that too will give you a performance improvement.
>>>>>>
>>>>>> -Damien
>>>>>>
>>>>>> On Jun 20, 2008, at 4:51 PM, Brad King wrote:
>>>>>>
>>>>>>> Thanks, yes, it's currently at 357M and growing!
>>>>>>>
>>>>>>> On Fri, Jun 20, 2008 at 4:49 PM, Chris Anderson <jchris@grabb.it>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Brad,
>>>>>>>>
>>>>>>>> You can look at
>>>>>>>>
>>>>>>>> ls -lha /usr/local/var/lib/couchdb/.my-dbname_design/
>>>>>>>>
>>>>>>>> to see the view size growing...
>>>>>>>>
>>>>>>>> It won't tell you when it's done but it will give you hope that the
>>>>>>>> progress is happening.
>>>>>>>>
>>>>>>>> Chris
>>>>>>>>
>>>>>>>> On Fri, Jun 20, 2008 at 1:45 PM, Brad King <brking@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> I have about 350K documents in a database, typically around 5K each. I
>>>>>>>>> created and saved a view which simply looks at one field in the
>>>>>>>>> document. I called the view for the first time with a key that should
>>>>>>>>> only match one document, and it's been awaiting a response for about 45
>>>>>>>>> minutes now.
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>> "sku": {
>>>>>>>>>   "map": "function(doc) { emit(doc.entityobject.SKU, doc); }"
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Is this typical, or is there some optimizing to be done on either my
>>>>>>>>> view or the server? I'm also running on a VM so this may have some
>>>>>>>>> effects, but smaller databases seem to be performing pretty well.
>>>>>>>>> Insert times to set this up were actually really good I thought, at
>>>>>>>>> 4000 to 5000 documents per minute running from my laptop.
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Chris Anderson
>>>>>>>> http://jchris.mfdz.com
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
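
For anyone reading this thread later, here is a rough sketch of the point Damien and Jan make about not emitting the whole document into the view. It is not CouchDB code: runMap, makeDocs and approxSize are hypothetical helpers, emit is passed in as an argument rather than being the global that the real view server provides, and the sample documents (a lowercase sku field plus a roughly 5K description) are invented to resemble the ones described above. It only compares how much data each style of map function would hand to the index.

```javascript
// Hypothetical stand-in for the view server: runs a map function over
// every document and collects the rows it emits.
function runMap(mapFn, docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key: key, value: value });
  for (const doc of docs) {
    mapFn(doc, emit);
  }
  return rows;
}

// The two styles discussed in the thread. In a real design document the
// function takes only doc and calls a global emit.
const mapWithDoc = (doc, emit) => emit(doc.entityobject.sku, doc);
const mapKeyOnly = (doc, emit) => emit(doc.entityobject.sku, null);

// Invented sample data: each document carries one large text field.
function makeDocs(count, descriptionLength) {
  const docs = [];
  for (let i = 0; i < count; i++) {
    docs.push({
      _id: "item-" + i,
      entityobject: {
        sku: "SKU-" + i,
        description: "x".repeat(descriptionLength),
      },
    });
  }
  return docs;
}

// Approximate size of the emitted rows as the length of their JSON text.
function approxSize(rows) {
  return rows.reduce((total, row) => total + JSON.stringify(row).length, 0);
}

const docs = makeDocs(1000, 5000);
const withDocRows = runMap(mapWithDoc, docs);
const keyOnlyRows = runMap(mapKeyOnly, docs);
const withDocSize = approxSize(withDocRows);
const keyOnlySize = approxSize(keyOnlyRows);

console.log("emit(sku, doc):  " + withDocSize + " characters");
console.log("emit(sku, null): " + keyOnlySize + " characters");
```

Both variants produce one row per document, so lookups by sku work the same way; the difference is only in how much has to be written into the view index. The sketch says nothing about actual build times, which depend on the server, the Erlang version and the disk.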
