lucene-dev mailing list archives

From "Grant Ingersoll" <gsing...@syr.edu>
Subject Re: Dmitry's Term Vector stuff, plus some
Date Tue, 24 Feb 2004 23:06:55 GMT
This is provided by the Token.startOffset() and Token.endOffset() at indexing time, I think.
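
To make the distinction in this thread concrete, here is a minimal sketch of the difference between a term's position (its ordinal among the tokens of a document) and its start/end character offsets. It does not use Lucene: the class `PositionVsOffset`, the `Tok` holder, and the whitespace-only `tokenize` method are hypothetical names invented for illustration, and the end offset is assumed to be exclusive (one past the last character of the token).

```java
import java.util.ArrayList;
import java.util.List;

public class PositionVsOffset {

    /** Hypothetical token holder for illustration; this is not Lucene's Token class. */
    public static final class Tok {
        public final String text;
        public final int position;     // ordinal of the token within the document
        public final int startOffset;  // index of the token's first character
        public final int endOffset;    // assumed exclusive: one past the last character

        public Tok(String text, int position, int startOffset, int endOffset) {
            this.text = text;
            this.position = position;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace only, recording both the position and the character offsets. */
    public static List<Tok> tokenize(String doc) {
        List<Tok> out = new ArrayList<>();
        int position = 0;
        int i = 0;
        int n = doc.length();
        while (i < n) {
            // Skip the whitespace between tokens.
            while (i < n && Character.isWhitespace(doc.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            int start = i;
            // Consume the token's characters.
            while (i < n && !Character.isWhitespace(doc.charAt(i))) {
                i++;
            }
            out.add(new Tok(doc.substring(start, i), position++, start, i));
        }
        return out;
    }

    public static void main(String[] args) {
        String doc = "term vectors store term positions";
        for (Tok t : tokenize(doc)) {
            System.out.println(t.text + " position=" + t.position
                    + " offsets=[" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

A highlighter needs the offsets, because `doc.substring(startOffset, endOffset)` recovers the exact span to mark up. A position alone only says that a term is the n-th token, which is why the text would have to be re-tokenized to find the characters.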

I don't know if this is accessible at run time.  A good place to see what is stored in the
files is the File Formats section at http://jakarta.apache.org/lucene/docs/fileformats.html
(get the latest from HEAD to see the new Term Vector stuff).  For what you can access, I
usually start at IndexReader and dig in from there.

Of course, the Position info and how we did it is available in the "first" patch I submitted
(and the "original one" from Dmitry), so if you are willing to always write position information,
you could update your code with that information.  Or, better yet :-), take it and add the
necessary touches to make it truly optional and donate it back to Lucene.

-Grant


>>> bruce@jivesoftware.com 02/24/04 05:39PM >>>
Grant Ingersoll wrote:

> It is the location of the token in the document (see IndexReader.termPositions()).
> This information is already being stored in other parts of the index, it just isn't very
> efficient to get at it.

Ok, that wasn't the answer I was hoping for :) I was hoping that the positions referred to
were the start/end offsets in the originating Token(s). I'll just have to find another way
to optimize the highlighting code to make it more efficient.


Regards,

Bruce Ritchie
http://www.jivesoftware.com/



