lucene-java-user mailing list archives

From "David Ahlschläger" <>
Subject Re: Need some Advice on Searching
Date Mon, 22 May 2006 06:58:12 GMT
On 19/05/06, Chris Hostetter <> wrote:
> I assume when you say this...
> : 1. I need to temporarily index sets of documents on the Fly say 100 at a
> : Time.
> you mean that you'll have lots of temporary indexes of a few hundred
> documents, and then you'll do a bunch of queries and throw the index away.
> Even if I'm wrong, most of the rest of my advice will still be useful, but
> it's good to clarify.

Correct, I will throw them away!

> : My problem is that for these queries I need to know which Documents hit. I
> : also need to know which terms hit and, if possible,
> : the location of the hits for each term in the hit Document.
> Knowing which docs match your query is easy. Knowing where in a document a
> particular term matches can be done using the TermPositions APIs ... but
> it gives you that info as a number of "terms", which for HTML content may be
> confusing depending on how your analyzer deals with that HTML.

Okay, based on your answer and a little testing just to see what it gives
me, I assume Lucene only stores the Term Offset (which is Analyzer
dependent) and not the actual offset of the Term as retrieved from the
plain text stream.
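
To make the distinction concrete, here is a minimal sketch in plain Java
(no Lucene classes) of the difference between a term's position, counted in
tokens, and its character offset in the original text. The TokenOffsets
class and its whitespace tokenizer are hypothetical and only stand in for
whatever a real Analyzer would do; an Analyzer that strips HTML or drops
stop words would produce different positions for the same text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not a Lucene API: a whitespace tokenizer that
// records both the token position and the character offsets of each term.
class TokenOffsets {

    static final class Token {
        final String text;   // lower-cased term text
        final int position;  // 0-based index of the term in the token stream
        final int start;     // character offset of the first char in the source
        final int end;       // character offset just past the last char

        Token(String text, int position, int start, int end) {
            this.text = text;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int position = 0;
        int i = 0;
        while (i < source.length()) {
            // Skip any run of whitespace between terms.
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the term itself.
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = source.substring(start, i).toLowerCase();
                tokens.add(new Token(term, position++, start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Lucene  stores term positions")) {
            System.out.println(t.text + " position=" + t.position
                    + " chars=" + t.start + "-" + t.end);
        }
    }
}
```

The position only says "this is the Nth term", so mapping it back to a place
in the raw text needs the character offsets to be recorded separately.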

> If you have complex boolean queries and you need to know which individual
> part of the query matched, that's not really trivial.  You didn't mention
> anything about "score" or "relevancy" in your email, so I'm guessing all
> you care about is boolean "did it match or not" logic .. in that case
> using Filters directly (without ever searching) is your friend.  You can
> build a Filter for each individual clause, intersect/union the bitsets to
> get the final set of matching documents for your whole query, but
> inspect the individual bitsets to know the specifics about which ones match
> which documents.

Score/Relevance is not important. I need the Yes/No logic along with the
info on what caused the match. Could you maybe explain how to
intersect/union the bitsets and how to interrogate them to know
what matched?
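
For reference, the intersect/union idea can be sketched with plain
java.util.BitSet, which as far as I know is what Filter.bits(IndexReader)
returns in the Lucene releases current at the time of this thread. This is
only a sketch under that assumption: the ClauseBitsets class, its helper
methods, the clause names, and the document ids are all made up for
illustration, and the bitsets are filled by hand instead of coming from
real Filters.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: one BitSet per query clause, where bit N is set
// when document N matches that clause.
class ClauseBitsets {

    // Build a bitset by hand; with Lucene this would come from a Filter.
    static BitSet bitsOf(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // AND of two clauses. Works on a copy so the per-clause bitsets stay
    // intact and can still be inspected afterwards.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // OR of two clauses, again on a copy.
    static BitSet union(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.or(b);
        return result;
    }

    // Names of all clauses whose bitset has the bit for docId set.
    static List<String> clausesMatching(Map<String, BitSet> clauses, int docId) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, BitSet> e : clauses.entrySet()) {
            if (e.getValue().get(docId)) {
                names.add(e.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, BitSet> clauses = new LinkedHashMap<>();
        clauses.put("title:lucene", bitsOf(0, 2, 5));
        clauses.put("body:filter", bitsOf(2, 3, 5));
        clauses.put("body:bitset", bitsOf(1, 5));

        // Whole query: (title:lucene AND body:filter) OR body:bitset
        BitSet hits = union(
                intersect(clauses.get("title:lucene"), clauses.get("body:filter")),
                clauses.get("body:bitset"));

        // Walk the final hits and ask each clause bitset whether it matched.
        for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
            System.out.println("doc " + doc + " matched via "
                    + clausesMatching(clauses, doc));
        }
    }
}
```

With the sample data the final hits are documents 1, 2 and 5, and the
per-clause lookup shows that document 1 matched only body:bitset while
document 5 matched all three clauses. Copying before and()/or() matters
because both methods modify the BitSet they are called on.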

> some people don't like Filters because of how much space they take up for
> really large indexes, but if you've only got 100 docs ... there's no
> reason not to use them

Nope, we will never have any really large indexes here: 100 to 200 docs at the

Thanks for the reply, much appreciated.
