lucene-java-user mailing list archives

From "Thomas Scheffler" <thomas.scheff...@uni-jena.de>
Subject Re: merged search of document
Date Wed, 14 Jan 2004 16:59:09 GMT

Thomas Scheffler wrote:
> Hi,
>
> I need a tip for implementation. I have several documents, all of them with
> a field named DocID. A DocID identifies not a single Lucene Document but a
> collection of them. When I start a search, it should handle the
> search as if these Lucene documents were one.
>
> example:
>
> Document 1: DocID:XYZ
>
> containing: foo
>
> Document 2: DocID:XYZ
>
> containing: bar
>
> Document 3: DocID:ABC
>
> containing: foo bar
>
> Document 4: DocID:GHJ
>
> containing: foo
>
> As you may already guess, when I'm searching for "+foo +bar" I want the
> hits to contain Document 1, Document 2 and Document 3, but not Document 4.
> Is it clear what I want? How do I implement such a monster? Is that
> possible with Lucene? The content is not stored within Lucene; it's just
> tokenized and indexed.
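To make the requested semantics concrete, here is a minimal sketch that models them with plain Java collections rather than with Lucene (the Doc class and the method names are hypothetical): the tokens of all documents sharing a DocID are merged, and a DocID matches "+foo +bar" if its merged term set contains both terms.

```java
import java.util.*;

// Hypothetical in-memory model of the index; this is NOT Lucene's API.
class MergedView {
    // One indexed "document": the DocID it belongs to plus its tokens.
    static final class Doc {
        final String docId;
        final List<String> tokens;

        Doc(String docId, String... tokens) {
            this.docId = docId;
            this.tokens = Arrays.asList(tokens);
        }
    }

    // Union the tokens of all documents sharing a DocID, so each
    // collection can be searched as if it were a single document.
    static Map<String, Set<String>> mergeByDocId(List<Doc> docs) {
        Map<String, Set<String>> merged = new TreeMap<>();
        for (Doc d : docs) {
            merged.computeIfAbsent(d.docId, k -> new HashSet<>()).addAll(d.tokens);
        }
        return merged;
    }

    // Semantics of "+t1 +t2 ...": a DocID matches if its merged
    // term set contains every required term.
    static Set<String> matchAll(List<Doc> docs, List<String> required) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Set<String>> e : mergeByDocId(docs).entrySet()) {
            if (e.getValue().containsAll(required)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

With the four example documents, matchAll(docs, List.of("foo", "bar")) returns the DocIDs ABC and XYZ (Documents 1, 2 and 3) and leaves out GHJ (Document 4).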

OK

In case this ever needs to be answered again, I'm posting my solution here.
It's not optimal yet, but it does work. In my concrete
implementation the "DocID" is called "DerivateID".

First, I get all DerivateIDs out of the index and perform the search once
for every DerivateID (that's the non-optimal part). For a search like (foo
bar), the queries for one DerivateID are:

(+DerivateID:MyCoReDemoDC_derivate_0014 +foo)
(+DerivateID:MyCoReDemoDC_derivate_0014 +bar)
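Generating these per-DerivateID subqueries is plain string building. A minimal sketch, assuming the terms have already been split out of the user's query and that unqualified terms go to the default field; the class and method names are hypothetical, and the resulting strings would still have to be parsed (for example with Lucene's QueryParser) and run one by one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene or MyCoRe.
class SubqueryBuilder {
    // For one DerivateID, build one query string per OR'ed term,
    // combining the required DerivateID clause with the required term.
    static List<String> subqueries(String derivateId, List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add("(+DerivateID:" + derivateId + " +" + term + ")");
        }
        return out;
    }
}
```

Called with "MyCoReDemoDC_derivate_0014" and the terms foo and bar, it yields exactly the two query strings listed above.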

In this case the query is split into two subqueries, one for each OR'ed
term, and every subquery must return hits.length() greater than 0. If so,
MyCoReDemoDC_derivate_0014 is a hit for the search (foo bar). For (foo
-bar) it was a bit more difficult, as you have to make sure that none of
the documents with the same DerivateID contains "bar". So another,
rechecking query is run, and the query delivers a result only if the
original query and the rechecking query return the same number of hits.
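The decision logic can be sketched without Lucene by counting matching documents, where a hypothetical hitCount method stands in for hits.length(). This is a model, not the MyCoRe code; in particular, it assumes the rechecking query for (foo -bar) is "+DerivateID:id -bar" and that it is compared with "+DerivateID:id" alone, which is only one reading of the description above. Phrase queries such as ("foo bar") are not modeled.

```java
import java.util.*;

// Hypothetical model of the per-DerivateID checks; NOT the MyCoRe code or Lucene's API.
class PerIdCheck {
    static final class Doc {
        final String id;
        final Set<String> terms;

        Doc(String id, String... terms) {
            this.id = id;
            this.terms = new HashSet<>(Arrays.asList(terms));
        }
    }

    // Stand-in for hits.length() of "+DerivateID:id [+required] [-excluded]";
    // pass null to leave out the required or the excluded clause.
    static int hitCount(List<Doc> docs, String id, String required, String excluded) {
        int n = 0;
        for (Doc d : docs) {
            if (!d.id.equals(id)) continue;
            if (required != null && !d.terms.contains(required)) continue;
            if (excluded != null && d.terms.contains(excluded)) continue;
            n++;
        }
        return n;
    }

    // A DerivateID matches (t1 t2 ... -excluded) if every positive subquery has hits
    // and (assumed recheck) excluding the term removes no document of the collection.
    static boolean matches(List<Doc> docs, String id, List<String> positive, String excluded) {
        for (String t : positive) {
            if (hitCount(docs, id, t, null) == 0) return false;
        }
        if (excluded == null) return true;
        return hitCount(docs, id, null, excluded) == hitCount(docs, id, null, null);
    }
}
```

With the four example documents from the original question, XYZ and ABC match (foo bar) while GHJ does not; for (foo -bar) only GHJ matches, because XYZ and ABC each contain a document with "bar".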

As I mentioned before, it's not optimal yet, but it allows you to search
for things like (foo bar), ("foo bar") and (foo -bar), and that was all I
needed.

If you want to look at the code, see our CVS repository:

http://www.mycore.de/repository/cgi-bin/viewcvs.cgi/mycore/sources/org/mycore/backend/lucene/

Thanks to all of you for your great help.

Thomas Scheffler
