lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <>
Subject Re: Counting hits in a document
Date Thu, 18 Jan 2007 23:15:19 GMT
Just threw together a highlighter that can handle spans (combining a 
rewrite with dumspans from LIA) and used this:

Nice spans extractor from Mark (not me <G>). Give it a query it will 
give you the spans.

- Mark

Erick Erickson wrote:
> Hi again.
> I've been struggling for the last couple of days and getting nowhere, so
> it's time to swallow my pride and say "Help"....
> OK, let's say I have a document indexed and I do NOT have access to 
> the raw
> text. I need to find the offset of all the hits for a query on a single
> document. Advice was offered a while ago to use getSpans from a 
> spanquery,
> but for the life of me I don't see how to make this work. As I remember,
> Erik was talking about rewriting the original query as a set of spans.
> The trouble I'm having is that I sure don't see how to rewrite the 
> standard
> query as a span query, then feed that back into my index for a particular
> document (that I have a unique ID for). It seems that the getSpans looks
> through my entire index, which is *probably* prohibitive.
> I can make each part of the query into a SpanTermQuery. I can assemble 
> these
> together into a bunch of nested span queries. At the end of this, I 
> have a
> single Span query that I can call getSpans on. But what now? I don't 
> see how
> the spans relate to the document I need to focus on. From what I see 
> of the
> Spans interface, it's intended to look at the entire index rather than be
> confined to a subset of the documents (in this case, exactly one.
> Guaranteed).
> I've thought about putting the documentID in a MUST clause of a
> BooleanQuery, and adding my span query to that, but it doesn't look like
> getSpans does me any good there.
> I looked at the SrndQuery family and don't see anything there that 
> lets me
> get the offsets of my matches.
> I don't have the text, so I can't highlight all the hits and count.
> The code I've been writing feels like the wrong solution to the wrong
> problem at the wrong time. Given that I know the document ID on the 
> way in,
> is my best bet to roll my own? That is, enumerate the relevant terms 
> in my
> document and measure the distance between the terms and aggregate the
> results myself? I'd rather not do that, of course, but can if necessary.
> I *want* someone to say "just call <fill in magic method here>"....
> Any help greatly appreciated...
> Thanks
> Erick

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message