lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Need some Advice on Searching
Date Fri, 19 May 2006 21:12:32 GMT

i assume when you say this...

: 1. I need to temporarilly index sets of documents on the Fly say 100 at a
: Time.

you mean that you'll have lots of temporary indexes of a few hundred
documents each, and then you'll do a bunch of queries and throw the index
away.  Even if i'm wrong most of the rest of my advice will still be useful,
but it's good to clarify.

: My problem is that for these queries I need to know which Documents hit. I
: also need to know which terms hit and if possible
: the location of the hits for each term in the hit Document.

knowing which docs match your query is easy.  knowing where in a document a
particular term matches can be done using the TermPositions APIs ... but
it gives you that info as a position counted in "terms" (not characters),
which for HTML content may be confusing depending on how your analyzer
deals with that HTML.

if you have complex boolean queries and you need to know which individual
part of the query matched, that's not really trivial.  you didn't mention
anything about "score" or "relevancy" in your email, so i'm guessing all
you care about is boolean "did it match or not" logic .. in that case
using Filters directly (without ever searching) is your friend.  You can
build a Filter for each individual clause, intersect/union the bitsets to
get the final set of matching documents for your whole query, and then
inspect the individual bitsets to know the specifics about which clauses
match which documents.
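
a rough sketch of the bitset bookkeeping, in case it helps.  it uses only
java.util.BitSet and assumes you've already got one BitSet per clause (for
example from something like Filter.bits(reader) on a filter wrapping that
clause).  the class name ClauseBitsets, the helper names, the clause labels
and the doc ids are all made up for illustration; this is not a Lucene API.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: combines per-clause bitsets and reports which
// clauses matched a given document. Not part of Lucene.
class ClauseBitsets {

    // AND: a doc survives only if every clause's bitset has it set.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone(); // clone so the inputs stay intact
        result.and(b);
        return result;
    }

    // OR: a doc survives if either clause's bitset has it set.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Labels of the individual clauses whose bitset contains docId.
    static List<String> matchingClauses(Map<String, BitSet> clauses, int docId) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : clauses.entrySet()) {
            if (e.getValue().get(docId)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Convenience for the example: a bitset with the given doc ids set.
    static BitSet bitsOf(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    public static void main(String[] args) {
        // Made-up per-clause results for a 5-document index (doc ids 0..4).
        Map<String, BitSet> clauses = new LinkedHashMap<>();
        clauses.put("title:lucene", bitsOf(0, 2, 3));
        clauses.put("body:filter", bitsOf(2, 3, 4));
        clauses.put("body:bitset", bitsOf(1));

        // Whole query: title:lucene AND (body:filter OR body:bitset)
        BitSet either = union(clauses.get("body:filter"), clauses.get("body:bitset"));
        BitSet matches = intersect(clauses.get("title:lucene"), either);

        // Walk the final matches and report which clauses hit each doc.
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            System.out.println("doc " + doc + " matched " + matchingClauses(clauses, doc));
        }
    }
}
```

the per-clause bitsets are kept around after combining them, which is the
whole point: the combined bitset tells you which docs match, and the
originals tell you why.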

some people don't like Filters because of how much space they take up for
really large indexes, but if you've only got 100 docs ... there's no
reason not to use them.


-Hoss



