lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Counting hits in a document
Date Thu, 18 Jan 2007 21:06:57 GMT
Hi again.

I've been struggling for the last couple of days and getting nowhere, so
it's time to swallow my pride and say "Help"....

OK, let's say I have a document indexed and I do NOT have access to the raw
text. I need to find the offset of all the hits for a query on a single
document. Advice was offered a while ago to use getSpans from a spanquery,
but for the life of me I don't see how to make this work. As I remember,
Erik was talking about rewriting the original query as a set of spans.

The trouble I'm having is that I sure don't see how to rewrite the standard
query as a span query, then feed that back into my index for a particular
document (that I have a unique ID for). It seems that the getSpans looks
through my entire index, which is *probably* prohibitive.

I can make each part of the query into a SpanTermQuery. I can assemble these
together into a bunch of nested span queries. At the end of this, I have a
single Span query that I can call getSpans on. But what now? I don't see how
the spans relate to the document I need to focus on. From what I see of the
Spans interface, it's intended to look at the entire index rather than be
confined to a subset of the documents (in this case, exactly one.

I've thought about putting the documentID in a MUST clause of a
BooleanQuery, and adding my span query to that, but it doesn't look like
getSpans does me any good there.

I looked at the SrndQuery family and don't see anything there that lets me
get the offsets of my matches.

I don't have the text, so I can't highlight all the hits and count.

The code I've been writing feels like the wrong solution to the wrong
problem at the wrong time. Given that I know the document ID on the way in,
is my best bet to roll my own? That is, enumerate the relevant terms in my
document and measure the distance between the terms and aggregate the
results myself? I'd rather not do that, of course, but can if necessary.

I *want* someone to say "just call <fill in magic method here>"....

Any help greatly appreciated...


  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message