lucene-dev mailing list archives

From Mike Klaas
Subject Re: Getting tokens from search results. Simple concept
Date Sat, 07 Mar 2009 02:45:15 GMT
On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote:

> : What I would LOVE is if I could do it in a standard Lucene search like I
> : mentioned earlier.
> : Hit.doc[0].getHitTokenList() :confused:
> : Something like this...
>
> The Query/Scorer APIs don't provide any mechanism for information like
> that to be conveyed back up the call chain -- mainly because it's more
> heavyweight than most people need.
>
> If you have custom Query/Scorer implementations, you can keep track of
> whatever state you want when executing a Query -- in fact the SpanQuery
> family of queries do keep track of exactly the type of info you seem to
> want, and after executing a query, you can ask it for the "Spans" of any
> matching document -- the downside is a loss in query-execution
> performance (because it takes time/memory to keep track of all the
> matches).

Even then, if I'm not mistaken, spans track token _positions_, not  
_offsets_ in the original string.
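
To make the distinction concrete, here is a toy sketch in plain Java. It does
not use Lucene at all: the Tok class and the tokenize() helper are made up for
illustration, and the tokenizer (split on anything that is not a letter or
digit, then lowercase) is only an assumption about what an analyzer might do.
The point is just that a position is the token's ordinal in the token stream,
while offsets are character indices into the original string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only; this is NOT Lucene's analyzer or Token API.
class PositionVsOffsetSketch {

    // One analyzed token: its text, its ordinal position in the token
    // stream, and its [start, end) character offsets in the original string.
    static final class Tok {
        final String text;
        final int position;
        final int start;
        final int end;

        Tok(String text, int position, int start, int end) {
            this.text = text;
            this.position = position;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical tokenizer: splits on non-letter/digit characters and
    // lowercases each token.
    static List<Tok> tokenize(String s) {
        List<Tok> out = new ArrayList<>();
        int pos = 0;
        int i = 0;
        while (i < s.length()) {
            // Skip separators.
            while (i < s.length() && !Character.isLetterOrDigit(s.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one token.
            while (i < s.length() && Character.isLetterOrDigit(s.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = s.substring(start, i).toLowerCase(Locale.ROOT);
                out.add(new Tok(term, pos++, start, i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick,  brown fox";
        for (Tok t : tokenize(text)) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.start + "," + t.end + ")");
        }
    }
}
```

Positions increase by one per token regardless of how much punctuation or
whitespace sits between tokens, so a position alone cannot be mapped back to a
substring of the original text.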

An inverted index like Lucene's is fast precisely because it doesn't
have to keep track of this information. I think the best alternative
might be to use term vectors, which are essentially a cache of the
analyzed tokens for a document.
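
As a rough sketch of that idea, again in plain Java with no Lucene calls: the
buildTermVector() and hitTokens() helpers below are hypothetical names, and
the tokenizer is the same kind of toy assumption as any stand-in analyzer
would be. The per-document map plays the role of a term vector with offsets:
it is built once per document, and after a search it can be consulted for the
hit documents only, to recover which original substrings matched the query
terms. In Lucene itself, term vectors have to be enabled per field at index
time, and offsets are stored only if requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for a per-document term vector; NOT the Lucene API.
class TermVectorSketch {

    // Maps each lowercased term to the [start, end) character offsets of
    // every occurrence in the document text.
    static Map<String, List<int[]>> buildTermVector(String text) {
        Map<String, List<int[]>> tv = new TreeMap<>();
        int i = 0;
        while (i < text.length()) {
            // Skip separators.
            while (i < text.length() && !Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one token.
            while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                String term = text.substring(start, i).toLowerCase(Locale.ROOT);
                tv.computeIfAbsent(term, k -> new ArrayList<>())
                  .add(new int[] {start, i});
            }
        }
        return tv;
    }

    // Returns the original-text substrings that match any of the query terms,
    // grouped by query term in the order the terms are given.
    static List<String> hitTokens(String text,
                                  Map<String, List<int[]>> tv,
                                  Collection<String> queryTerms) {
        List<String> hits = new ArrayList<>();
        for (String q : queryTerms) {
            String key = q.toLowerCase(Locale.ROOT);
            for (int[] span : tv.getOrDefault(key, Collections.<int[]>emptyList())) {
                hits.add(text.substring(span[0], span[1]));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "Lucene is fast. LUCENE stores term vectors.";
        Map<String, List<int[]>> tv = buildTermVector(doc);
        System.out.println(hitTokens(doc, tv, Arrays.asList("lucene", "vectors")));
    }
}
```

The cost is paid in index size and in the per-hit lookup, not in the query
execution path, which is why this tends to be a better fit than tracking
matches inside the Scorer.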

