incubator-couchdb-user mailing list archives

From Tracy Flynn <couc...@thisonejustforme.com>
Subject Volume Test - 2 million documents
Date Wed, 13 Oct 2010 00:16:13 GMT

Thanks for all the previous help.

For both parts, each document contains about 30 fields of metadata plus primary content of
roughly 5K to 10K.
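For concreteness, each document looks roughly like this (the field names here are invented
for illustration, not the real schema):

    {
      "_id": "article-000001",
      "customer_id": "acme",
      "published_at": "2010-10-12T00:00:00Z",
      ... roughly 30 metadata fields in all ...,
      "content": "<the 5K-10K primary content>"
    }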

The goal is to prove out the feasibility of moving all our syndication services to a common
platform that provides rapid customization for customer-specific syndication feeds.

Part 1
--------

I've already done a successful proof-of-concept with 100K documents.

No optimization. 

Environment: my laptop (a recent, loaded MacBook Pro - 2.93 GHz Intel Core 2 Duo, 8 GB memory).

A couple of things I noticed:

- The 100K-doc load took about 1 hour
- Creating a single view with 'emit([single key], doc)' took about 1 hour
- The log indicated view checkpoints every 30 sequence numbers or so.

Part 2
-------

I'm about to do a volume test of about 2 million documents.

Primary load
----------------

I will be loading in batches of about 1,000 documents.
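
Assuming each batch goes through the standard _bulk_docs endpoint (and 'syndication' is just
a placeholder database name), a single batch POST would look something like this:

    POST /syndication/_bulk_docs HTTP/1.1
    Content-Type: application/json

    {"docs": [ { ...doc 1... }, { ...doc 2... }, ..., { ...doc 1000... } ]}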

Three separate Unix servers on a local network:

- One for the CouchDB instance
- One for the feeder process
- One for the database

View definition
------------------

I have two views defined, without any reduce functions.
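
As a sketch (the design document name, view names, and keys are placeholders, not the real
definitions), the design document looks something like:

    {
      "_id": "_design/feeds",
      "language": "javascript",
      "views": {
        "by_customer": { "map": "function(doc) { emit([doc.customer_id], doc); }" },
        "by_date":     { "map": "function(doc) { emit([doc.published_at], doc); }" }
      }
    }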

Questions for Part 2
-------------------------

First, any thoughts or hints on my larger benchmark (Part 2)?

Is it naive to hope to speed up the initial view build by using map functions of the form
'emit([key], null)' and then using include_docs=true on queries?
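
In other words, something like this (the view name 'by_key' and the key field are
placeholders):

    // map function: emit only the key, not the document body
    function(doc) {
      emit([doc.some_key], null);
    }

and then pull the documents back at query time with:

    GET /syndication/_design/feeds/_view/by_key?include_docs=true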

Is there any way to control the checkpointing of views when a view is built for the first
time? I'm guessing I'm looking at many hours to build a single view over 2 million documents.


Any help would be appreciated.

Regards,

Tracy

