couchdb-user mailing list archives

From Søren Hilmer ...@widetrail.dk>
Subject Re: Clarifications or bugs
Date Fri, 11 Apr 2008 06:31:25 GMT
Hi Nils,

I am currently rewriting the indexing, basing it on the prior
discussion here. In the new design, indexing is done on views, not on
documents (your second suggestion). The views to index are listed in a
special design document (_design/fulltextsearch).

My problem is that CouchDB only tells me which database has changed. With
that information I can get the changed documents, but I then need to run
the views to index on these documents (and preferably on these documents
only) in order to get what I need to index. I believed I could filter on
the document ID (startkey_docid), but that is apparently only supported in
conjunction with startkey (which I do not know, as I haven't run the view
on the document yet).
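
To make the problem concrete, here is a minimal Python sketch. It is an in-memory stand-in for a view, not CouchDB's API: map_words, build_view, query and rows_for_changed are hypothetical names, and plain string ordering stands in for CouchDB's real key collation. It assumes view rows are ordered by key and then by document ID, so startkey_docid can only break ties among rows that share the given startkey, and it shows the fallback of running the map function on the changed documents only.

```python
# Sketch only: an in-memory stand-in for a view, not CouchDB's API.
from bisect import bisect_left


def map_words(doc):
    """Hypothetical map function: emit one row per word in the title."""
    for word in doc.get("title", "").split():
        yield word.lower(), None


def build_view(docs, map_fn):
    """Return rows as (key, doc_id, value), ordered by key, then doc id."""
    rows = [(key, doc["_id"], value)
            for doc in docs
            for key, value in map_fn(doc)]
    return sorted(rows, key=lambda r: (r[0], r[1]))


def query(rows, startkey=None, startkey_docid=None):
    """The start position comes from startkey; startkey_docid only
    breaks ties among rows that have exactly that key."""
    if startkey is None:
        return rows  # assumption from this thread: doc id ignored here
    positions = [(r[0], r[1]) for r in rows]
    start = bisect_left(positions, (startkey, startkey_docid or ""))
    return rows[start:]


def rows_for_changed(changed_docs, map_fn):
    """Fallback: run the map function on the changed documents only."""
    return build_view(changed_docs, map_fn)


docs = [
    {"_id": "a", "title": "Zebra crossing"},
    {"_id": "b", "title": "Apple zebra"},
]
view = build_view(docs, map_words)

# Document "a" contributes two rows at different positions in the view,
# so its ID alone does not identify a place to start reading.
without_key = query(view, startkey_docid="a")
with_key = query(view, startkey="zebra", startkey_docid="b")
changed_rows = rows_for_changed([docs[0]], map_words)
```

Here without_key begins with a row from document "b", which matches the behaviour I saw, while changed_rows contains exactly the rows for document "a". This fallback only covers map-only views; it says nothing about results that depend on several documents.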

If you haven't followed the discussion on the fulltext design document, you
can find a summary on the wiki:
http://wiki.apache.org/couchdb/FullTextSearch

Have fun
  Søren

On Fri, April 11, 2008 01:13, Nils Adermann wrote:
> Hi Søren,
>
> I'm not entirely sure I understand your problem but to me it looks like
> you assume that every view result is tied to exactly one document. This
> is not the case. Right now there can be multiple results from a single
> document and once we have reduce there can be view results that depend
> on any number of documents. That's why Jan's basic fulltext support on a
> database level cannot be used to retrieve a subset of a view without
> recomputing the entire view ad-hoc. There are only two ways I see right now:
>
> 1. Recompute the views based on the found documents
>  - Works for small result sets, otherwise it's probably too slow
>  - Requires the fulltext indexer to be able to index documents with any
> number of differing arbitrary structures containing any amount of text
> values
>
> 2. Index view results
>  - If not limited this would probably create too many and too big
> indexes, therefore we would need a view setting _fulltext_index that
> indicates that a view should be indexed. You would use this setting if
> your application plans to search results of that view.
>  - In order to allow fulltext indexers which require a fixed structure
> for all documents to index a CouchDB view, you could go even further:
> Define a structure without data in the view specification that informs
> the indexer which format it can expect all view results will follow.
> This could also be used to indicate which resulting values really
> contain text that needs to be indexed at all.
>
> Cheers
> Nils
>
> Søren Hilmer wrote:
>>>> 2. startkey_docid does not seem to work; the first document in the
>>>> view is always returned.
>>>>
>>> startkey_docid needs to be combined with startkey to work correctly. I
>>> don't think it's even applied when there's no startkey.
>>>
>>
>> Ahh, this is very unfortunate. Say you know the document_id of a changed
>> document, but not necessarily the view-key; then you have no way of
>> getting what the view will return for that specific document.
>>
>> This is the situation for the indexer: CouchDB will notify it which
>> DB has changed, the indexer knows the previous update-sequence and gets
>> all newer documents, but it needs to index the views specified for
>> indexing, and thus run the view for the changed documents only. As it
>> has not got the view-key in this situation, it is out of luck.
>>
>> The wiki for HttpViewApi says "For efficient paging use startkey and/or
>> startkey_docid."
>>
>> Are you sure this does not classify as a bug? Is there something I am
>> missing?
>>
>> Have fun
>>   Søren
>>
>>
>>> Cheers,
>>> --
>>> Christopher Lenz
>>>    cmlenz at gmx.de
>>>    http://www.cmlenz.net/
>>>
>>>
>>>
>>
>>
>
>


-- 
Søren Hilmer, M.Sc., M.Crypt.
wideTrail            Phone: +45 25481225
Pilevænget 41        Email: sh@widetrail.dk
DK-8961  Allingåbro  Web: www.widetrail.dk



