incubator-couchdb-user mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: view index build time
Date Wed, 02 Jul 2008 22:08:57 GMT
I'd have to go back and double check, but off the top of my head 25
min for 300K docs seems about like what I was getting. I.e., not orders
of magnitude slower or anything.

Not sure about moving the design folder to a different disk. You might
check iostat while indexing, although I think I saw someone either on
this list or in IRC reporting that the erlang->javascript and
javascript->erlang translations were what was slowing everything down.
Although I could've made that conversation up in a dream.

HTH,
Paul

On Wed, Jul 2, 2008 at 6:00 PM, Brad King <brking@gmail.com> wrote:
> I've got R12B. We've also got the couchdb 0.8.0-incubating version.
> I'm just curious what my expectations should be for view creation
> times. Also was wondering if anyone had tried putting the design
> folder on different disk to improve I/O.
>
> On Wed, Jul 2, 2008 at 2:18 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> One thing that got me awhile back was the version of erlang I was
>> using. If you're not on one of the most recent erlang versions R12B or
>> some such, you might try upgrading that bit to see if it fixes things.
>>
>> Paul
>>
>> On Wed, Jul 2, 2008 at 1:58 PM, Brad King <brking@gmail.com> wrote:
>>> I created a view with emit(doc.entityobject.sku, null) to only emit
>>> the doc ids. After trying attachments, I nuked the DB and started
>>> over, going back to having the documents inline. This is ok, but
>>> again, the index build time of about 25 minutes for this view against
>>> 300K or so docs seems long. What are you seeing as typical for
>>> creating your views against a much larger set? What do your docs look
>>> like? Thanks.
>>>
>>>
>>> On Wed, Jul 2, 2008 at 10:50 AM, Jan Lehnardt <jan@apache.org> wrote:
>>>>
>>>> On Jul 2, 2008, at 16:17, Brad King wrote:
>>>>
>>>>> Just to post some results here of working with around 300K docs. I
>>>>> changed the view to emit only the doc ID and index time went down to
>>>>> about 25 minutes vs. an hour for the same dataset.
>>>>>
>>>>> I then converted the largest text field to an attachment and things
>>>>> went downhill from there. I deleted the db and started the upload,
>>>>> but repeatedly got random 500 server errors with no real way to know
>>>>> what is happening or why. Also the DB size as reported by Futon seemed
>>>>> to fluctuate wildly as I was adding documents. And I mean wildly like
>>>>> anywhere from 1.2G then back down to 144M. Weird. I don't get a very
>>>>> warm fuzzy feeling about the stability of using attachments right now.
>>>>> Ideally, I don't want to use them anyway, I'd prefer to have the
>>>>> fields all inline and have the database handle these docs as-is. I
>>>>> don't see these as huge documents (2 to 5K) as compared to what I
>>>>> would store in something like Berkeley DB XML, just for comparison's
>>>>> sake, so I'm hoping it's a goal of the project to handle these
>>>>> effectively, even when several million documents are added.
>>>>
>>>> This doesn't sound right at all. Can you make sure you use the
>>>> very latest SVN version or the 0.8 release and completely
>>>> new databases? Also, just to clarify, do you emit the doc into
>>>> the view payload, as in emit(doc._id, doc); or are you just doing
>>>> emit(null, null); to only get the docIds that matter to you and
>>>> then fetch the documents later? I have had the latter setup running
>>>> without any problems across ~2 million documents in a database.
>>>>
>>>>
>>>>> As always, thanks for the help.
>>>>
>>>> Thanks for the problem report.
>>>>
>>>> Cheers
>>>> Jan
>>>> --
>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 1, 2008 at 9:26 AM, Brad King <brking@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for the tips. I'll start scaling back the data I'm returning
>>>>>> and see if it improves. The largest field is an HTML description of an
>>>>>> inventory item, which seems like a good candidate for a binary
>>>>>> attachment, but I need to be able to do full text searches on this
>>>>>> data eventually (hopefully with the Lucene integration) so I'll
>>>>>> probably try just not including the document data in the views first.
>>>>>> We've had some success with Lucene independent of couchdb, so I'm
>>>>>> pleased you guys are integrating this.
>>>>>>
>>>>>> On Sat, Jun 21, 2008 at 8:39 AM, Damien Katz <damienkatz@gmail.com> wrote:
>>>>>>>
>>>>>>> Part of the problem is you are storing copies of the documents into
>>>>>>> the btree. If the documents are big, it takes longer to compute on
>>>>>>> them, and if the results (emit(...)) are big or numerous, then you'll
>>>>>>> be spending most of your time in I/O.
>>>>>>>
>>>>>>> My advice is to not emit the document into the view, and if you can,
>>>>>>> get the documents smaller in general. If the data can be stored as a
>>>>>>> binary attachment, then that too will give you a performance
>>>>>>> improvement.
>>>>>>>
>>>>>>> -Damien
>>>>>>>
>>>>>>> On Jun 20, 2008, at 4:51 PM, Brad King wrote:
>>>>>>>
>>>>>>>> Thanks, yes it's currently at 357M and growing!
>>>>>>>>
>>>>>>>> On Fri, Jun 20, 2008 at 4:49 PM, Chris Anderson <jchris@grabb.it> wrote:
>>>>>>>>>
>>>>>>>>> Brad,
>>>>>>>>>
>>>>>>>>> You can look at
>>>>>>>>>
>>>>>>>>> ls -lha /usr/local/var/lib/couchdb/.my-dbname_design/
>>>>>>>>>
>>>>>>>>> to see the view size growing...
>>>>>>>>>
>>>>>>>>> It won't tell you when it's done but it will give you hope that
>>>>>>>>> progress is happening.
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>> On Fri, Jun 20, 2008 at 1:45 PM, Brad King <brking@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I have about 350K documents in a database, typically around 5K
>>>>>>>>>> each. I created and saved a view which simply looks at one field
>>>>>>>>>> in the document. I called the view for the first time with a key
>>>>>>>>>> that should only match one document, and it's been awaiting a
>>>>>>>>>> response for about 45 minutes now.
>>>>>>>>>>
>>>>>>>>>> {
>>>>>>>>>> "sku": {
>>>>>>>>>>   "map": "function(doc) { emit(doc.entityobject.SKU, doc); }"
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Is this typical, or is there some optimizing to be done on either
>>>>>>>>>> my view or the server? I'm also running on a VM so this may have
>>>>>>>>>> some effects, but smaller databases seem to be performing pretty
>>>>>>>>>> well. Insert times to set this up were actually really good I
>>>>>>>>>> thought, at 4000 to 5000 documents per minute running from my
>>>>>>>>>> laptop.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Chris Anderson
>>>>>>>>> http://jchris.mfdz.com
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
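
For anyone reading this thread later, here is a rough sketch of why the advice above (don't emit the doc into the view) shrinks the index. It is not from any of the posters. The emit() below is only a stand-in for the one CouchDB's view server provides, the sample documents are made up, and the byte counts are serialized JSON sizes rather than real on-disk btree sizes, so treat the numbers as an illustration of the ratio only.

```javascript
// Stand-in for the emit() that CouchDB's view server provides (assumption:
// each call adds one row holding the key, the value, and the source doc id).
let rows = [];
let currentDocId = null;
function emit(key, value) {
  rows.push({ id: currentDocId, key: key, value: value });
}

// Map function from the original question: copies the whole doc into the view.
function mapWithDoc(doc) { emit(doc.entityobject.SKU, doc); }

// Leaner variant discussed in the thread: emit only the key, fetch docs later.
function mapKeyOnly(doc) { emit(doc.entityobject.SKU, null); }

// Run a map function over docs and return the serialized size of its rows.
function indexBytes(mapFn, docs) {
  rows = [];
  for (const doc of docs) {
    currentDocId = doc._id;
    mapFn(doc);
  }
  return Buffer.byteLength(JSON.stringify(rows), "utf8");
}

// Made-up sample data: 1000 docs, each with a roughly 5K description field.
const sampleDocs = [];
for (let i = 0; i < 1000; i++) {
  sampleDocs.push({
    _id: "item-" + i,
    entityobject: { SKU: "SKU" + i, description: "x".repeat(5000) },
  });
}

const withDocBytes = indexBytes(mapWithDoc, sampleDocs);
const keyOnlyBytes = indexBytes(mapKeyOnly, sampleDocs);
console.log("emit(sku, doc):  " + withDocBytes + " bytes");
console.log("emit(sku, null): " + keyOnlyBytes + " bytes");
```

Run under Node.js, the first figure should come out far larger than the second, since every row in the first variant carries a full copy of its document. That is the extra I/O Damien describes.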
