incubator-couchdb-user mailing list archives

From Jarrod Roberson <>
Subject Re: Any better solution for my case?
Date Mon, 17 May 2010 00:58:54 GMT
> > How so? I have tested it with 100k's of documents and didn't see any
> > performance problems.
> With this approach the operation turns into:
> 1. GET the criteria document,
> 2. pass it to the list function,
> 3. the list function filters each key by applying rules, which sounds like
> a temp view for each request,
> instead of just querying a view.
> Simply put, I don't need "ad hoc" queries; that is not my case. I need "ad
> hoc" views. All criteria are known at the moment of indexation.
> It's much easier to just generate code for the map function, but not so
> elegant.

you don't understand my approach: the list function doesn't apply any rules,
it just merges the duplicate documents. It is exactly the same thing that an
RDBMS or, say, Lucene would do.

your view function would look something like this. I don't know what
pages_count might need to be queried on, so I skipped it; exercise for the
reader :-) I am sure you could apply the same logic to pages_count that I
applied to tags.

    function (doc) {
        for (var t in doc.tags)
            emit(['tag', doc.tags[t]], null);
    }

of course you could do doctype checking if you have multiple types of
documents, and check that each property actually exists, but I didn't
want to muddy the water with boilerplate code.
Then you use the generic merge list function I wrote, which reduces all
the results down to one unique set of ids/documents.
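For reference, the core of that merge is just deduplication by document id. A minimal standalone sketch of the logic (the real list function would wrap this in CouchDB's getRow()/send() loop; the function and field names here are my own):

```javascript
// Merge view rows down to one unique set of ids/documents.
// A doc that matched several criteria emits several rows; they collapse to one.
function mergeRows(rows) {
    var seen = {};
    var unique = [];
    for (var i = 0; i < rows.length; i++) {
        var row = rows[i];
        if (!seen[row.id]) {
            seen[row.id] = true;
            unique.push(row);
        }
    }
    return unique;
}

// Example: doc1 matches both ['tag','red'] and ['tag','sale'].
var rows = [
    { id: 'doc1', key: ['tag', 'red'] },
    { id: 'doc2', key: ['tag', 'red'] },
    { id: 'doc1', key: ['tag', 'sale'] }
];
console.log(mergeRows(rows).length); // 2
```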

something like this

curl -X POST
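Filled in, that POST might look something like the following. The database, design-doc, and list/view names are placeholders of my own; POSTing a `{"keys": [...]}` body is CouchDB's standard multi-key fetch, and include_docs=true gets the docs back with the rows:

```shell
# Hypothetical names: database "mydb", design doc "search",
# the "merge" list function running over the "by_criteria" view.
curl -X POST 'http://127.0.0.1:5984/mydb/_design/search/_list/merge/by_criteria?include_docs=true' \
     -H 'Content-Type: application/json' \
     -d '{"keys": [["tag", "red"], ["tag", "sale"]]}'
```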

the one assumption I make is that I want the docs back on the queries; it
would be easy enough to change the list function to handle the case where the
docs aren't included and return only a unique set of keys.
This is a generic way to search without having to have custom views for each
field you want to search.
Since you have a single user per database, you could just create permanent
views on demand. It would take time to build the indexes, depending on how big
each database was, and it could bloat the database with unnecessary
duplication; this generic fashion avoids having all that duplicate data
when two views differ by, say, a single tag in the "criteria".

you could even make the index marginally smaller, if there were "lots" of
documents to index, by using single-character field names: t instead of tags,
c1 instead of color, c2 instead of condition.
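Applied to the map function above, that abbreviation would look like this. A sketch only; emit is stubbed here so the snippet runs outside CouchDB:

```javascript
// Stub of CouchDB's emit() so the map function can run standalone.
var rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// Same map function as before, with 't' in place of 'tag'.
function map(doc) {
    for (var t in doc.tags)
        emit(['t', doc.tags[t]], null);
}

map({ tags: ['red', 'sale'] });
console.log(rows.length); // 2
```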
