lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Counting hits in a document
Date Thu, 18 Jan 2007 22:00:04 GMT

The Spans interface has a skipTo for jumping to a specific documentId (or
the first matching document with a higher documentId)
once you've done that, then the doc(), start(), and end() calls will tell
you info about the match (which doc it's in, where that match starts, nd
where it ends) ... use next() to advance tothe next match -- if doc()
returns the same number as before, then you've got two matches in one
document, if doc returns a new number, you have finished with the matches
in that document, and moved on to the next matching document.

the key to all of this is that if you want to find docs matching "+FOO
+BAR" and then find where exactly the instances of FOO and BAR are you can
do one query to get the matching docIds in whatever order you want, then
use seperate SpamTermQUeries to find where exactly each term is.


: Date: Thu, 18 Jan 2007 16:06:57 -0500
: From: Erick Erickson <erickerickson@gmail.com>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Counting hits in a document
:
: Hi again.
:
: I've been struggling for the last couple of days and getting nowhere, so
: it's time to swallow my pride and say "Help"....
:
: OK, let's say I have a document indexed and I do NOT have access to the raw
: text. I need to find the offset of all the hits for a query on a single
: document. Advice was offered a while ago to use getSpans from a spanquery,
: but for the life of me I don't see how to make this work. As I remember,
: Erik was talking about rewriting the original query as a set of spans.
:
: The trouble I'm having is that I sure don't see how to rewrite the standard
: query as a span query, then feed that back into my index for a particular
: document (that I have a unique ID for). It seems that the getSpans looks
: through my entire index, which is *probably* prohibitive.
:
: I can make each part of the query into a SpanTermQuery. I can assemble these
: together into a bunch of nested span queries. At the end of this, I have a
: single Span query that I can call getSpans on. But what now? I don't see how
: the spans relate to the document I need to focus on. From what I see of the
: Spans interface, it's intended to look at the entire index rather than be
: confined to a subset of the documents (in this case, exactly one.
: Guaranteed).
:
: I've thought about putting the documentID in a MUST clause of a
: BooleanQuery, and adding my span query to that, but it doesn't look like
: getSpans does me any good there.
:
: I looked at the SrndQuery family and don't see anything there that lets me
: get the offsets of my matches.
:
: I don't have the text, so I can't highlight all the hits and count.
:
: The code I've been writing feels like the wrong solution to the wrong
: problem at the wrong time. Given that I know the document ID on the way in,
: is my best bet to roll my own? That is, enumerate the relevant terms in my
: document and measure the distance between the terms and aggregate the
: results myself? I'd rather not do that, of course, but can if necessary.
:
: I *want* someone to say "just call <fill in magic method here>"....
:
: Any help greatly appreciated...
:
: Thanks
: Erick
:



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message