incubator-couchdb-dev mailing list archives

From "Paul Davis" <paul.joseph.da...@gmail.com>
Subject Re: Integrated Full Text Indexing and Reporting Re: CouchDB 0.9 and 1.0
Date Sat, 12 Jul 2008 21:35:17 GMT
The patch for Issue74 only affects the line protocol between the
external processes. I think that the biggest show stopper to getting
full text searching right now is the fluidity of how CouchDB is going
to start interfacing with external software. Whether things move
towards having some sort of plugin interface etc should probably be
settled before doing too much work on this. (Assuming that most of the
FTI work will be involved in the integration step.)

Also the note on intersecting views with FTI search results is
interesting, but I'm not certain how that would work implementation
wise. I could see some pretty harsh run time characteristics come into
play when attempting to merge between indices that are in and out of
couchdb.

Not to say it wouldn't be a kick ass feature, but it almost seems like
something that wouldn't be feasible without an erlang FTI engine. In
other news, implementing intersections for arbitrary views might be an
entirely separate feature to implement.
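
To make the merge step concrete, here is a rough sketch (in Python, purely
for illustration) of the intersection Damien describes in the quoted message
below: take the doc ids and scores returned by the full-text engine, look up
the view rows emitted by those docs, and return them sorted by relevance.
The function name, the tuple shapes, and the in-memory dict standing in for
the "view keys by doc id" index are all assumptions made for this example,
not anything that exists in CouchDB.

```python
from collections import defaultdict


def intersect_view_with_fti(view_rows, fti_hits):
    """Hypothetical helper: join view rows with full-text hits by doc id.

    view_rows: iterable of (doc_id, key, value) tuples emitted by a view.
    fti_hits:  iterable of (doc_id, score) pairs from the full-text engine.
    """
    # Stand-in for the "view keys by doc id" index (assumed in-memory here).
    rows_by_doc = defaultdict(list)
    for doc_id, key, value in view_rows:
        rows_by_doc[doc_id].append((key, value))

    # Keep only rows whose doc appears in the full-text results.
    results = []
    for doc_id, score in fti_hits:
        for key, value in rows_by_doc.get(doc_id, []):
            results.append(
                {"id": doc_id, "key": key, "value": value, "score": score}
            )

    # Highest relevance score first; ties keep their existing order.
    results.sort(key=lambda row: row["score"], reverse=True)
    return results
```

With both inputs in one process this is a cheap lookup per hit. The run time
concern above is about the case where the hit list has to cross a process
boundary first and is large.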

Paul

On Sat, Jul 12, 2008 at 5:24 PM, Jan Lehnardt <jan@apache.org> wrote:
>
> On Jul 11, 2008, at 22:29 , Damien Katz wrote:
>
>> CouchDB needs integrated full-text indexing support. We should be able to
>> support multiple full text engines, but our reference implementation will be
>> Apache Lucene.
>>
>> Initially (I'm hoping for 0.9.0)  we should be able to index all documents
>> and their attachments (for types that lucene can index anyway) and return
>> queries against that index via. Jan has begun this work and I think someone
>> has this mostly working now somewhere, but it's not in trunk?
>
> we have a patch that improves the API here:
> https://issues.apache.org/jira/browse/COUCHDB-74
> and there is the
> http://svn.apache.org/repos/asf/incubator/couchdb/branches/lucene-search/
> branch that this patch should be applied to. Further work should be
> continued there. At this point the only difference between trunk and the
> branch is the addition of the /db/_search API call. The branch also might
> need to be brought up to trunk. It has no current maintainer, although
> Paul Davis voiced interest in pushing this forward. Also, there were
> attempts at adding other search engines but they never surfaced. If I
> remember correctly, the problem that views cannot be searched without
> expanding the view server stopped most work.
>
>
>> By 1.0, we should also do view intersections with full text results. At
>> query time, CouchDB gets back a list of matching documents and then finds
>> the emitted view rows from those documents, and returns them sorted by
>> relevance score. This will require some enhancements to the internal view
>> API, but the data and required index (view keys by doc id) already exist to
>> make this efficient.
>
> I opened a bug report for this.
>
>
> --
>
> Since I started the work on Lucene I am by open source work definition
> somewhat responsible for the life of this. But I'd rather not, at least for
> the Java side of things. If somebody (heya Paul, still in?) wants to take
> this over, that'd be mighty cool.
>
>
> Cheers
> Jan
> --
>
>> Perhaps not initially, but eventually the integration of the fulltext
>> engine will be as proper couchdb HTTP and daemon plug-ins (once those apis
>> are established).
>>
>> On Jul 2, 2008, at 3:08 AM, Jan Lehnardt wrote:
>>
>>> Hello everybody,
>>> this thread is meant to collect missing work items (features and
>>> bugs) for our 1.0 release and a discussion about how to split
>>> them up between 0.9 and 1.0.
>>>
>>> Take it away: Damien.
>>>
>>> Cheers
>>> Jan
>>> --
>>
>>
>
>
