lucene-dev mailing list archives

From "Grant Ingersoll" <GSIng...@syr.edu>
Subject Term highlighting and Term vector patch
Date Wed, 15 Sep 2004 22:20:36 GMT
Hi,

I was browsing the term highlighting code in the sandbox and I noticed
the following comment for the getBestFragment method in the
Highlighter.java code:

	/**
...
	 * @param tokenStream   a stream of tokens identified in the text parameter, including offset information.
	 * This is typically produced by an analyzer re-parsing a document's
	 * text. Some work may be done on retrieving TokenStreams more efficently
	 * by adding support for storing original text position data in the Lucene
	 * index but this support is not currently available (as of Lucene 1.4 rc2).
...
	 */

This made me think I could contribute some time to make it happen,
since I recently submitted a patch that adds exactly this kind of offset
information to the term vector.

I would like to implement this, but I don't really want to submit a
patch against another patch (it's hard enough managing all the changes
that come down).  So I was wondering whether anyone (i.e. a committer)
has had a chance to look at the Term Vector offset patch, and what their
thoughts on it are.  I can see the highlighter getting faster by not
having to re-analyze the text, plus you could highlight the whole field
if you wanted to.
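
To make the idea concrete, here is a rough sketch of highlighting from
stored offsets without re-running an analyzer. It is not taken from
either patch and does not use the Lucene API: the class
OffsetHighlightSketch, the Span type, and the term -> offsets map are
hypothetical stand-ins for whatever the term vector would expose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OffsetHighlightSketch {

    /** Hypothetical stand-in for one stored occurrence of a term: [start, end) character offsets. */
    static final class Span {
        final String term;
        final int start;
        final int end;

        Span(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Flattens a term -> offsets map (assumed shape of stored offset data) into spans ordered by start offset. */
    public static List<Span> toSpans(Map<String, int[][]> storedOffsets) {
        List<Span> spans = new ArrayList<>();
        for (Map.Entry<String, int[][]> entry : storedOffsets.entrySet()) {
            for (int[] offset : entry.getValue()) {
                spans.add(new Span(entry.getKey(), offset[0], offset[1]));
            }
        }
        spans.sort(Comparator.comparingInt((Span s) -> s.start));
        return spans;
    }

    /** Wraps every occurrence of a query term in <B>..</B> using only the stored offsets. */
    public static String highlight(String text, Map<String, int[][]> storedOffsets, Set<String> queryTerms) {
        StringBuilder out = new StringBuilder();
        int cursor = 0;
        for (Span span : toSpans(storedOffsets)) {
            // Skip terms that are not in the query and spans overlapping text already emitted.
            if (!queryTerms.contains(span.term) || span.start < cursor) {
                continue;
            }
            out.append(text, cursor, span.start);
            out.append("<B>").append(text, span.start, span.end).append("</B>");
            cursor = span.end;
        }
        out.append(text, cursor, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "term vectors store term offsets";
        Map<String, int[][]> stored = Map.of(
                "term", new int[][] {{0, 4}, {19, 23}},
                "vectors", new int[][] {{5, 12}},
                "offsets", new int[][] {{24, 31}});
        System.out.println(highlight(text, stored, Set.of("term")));
    }
}
```

Because the offsets cover the whole field, the same loop highlights the
entire text rather than a single fragment.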

Also, if I make this change, would the committers suggest that I keep
the current analyzer-based path and offer stored offsets as an
alternative, or would it be safe to assume the highlighter is only used
when offset info is stored?
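
One way the "keep both" option could look is sketched below, under
stated assumptions: the names OffsetSourceSketch, offsetsFor and
reanalyze are hypothetical, a null map stands for "no offsets stored
for this field", and the regex tokenizer is only a crude stand-in for
running a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OffsetSourceSketch {

    /** Fallback path: re-tokenize the text and collect [start, end) offsets of the given term. */
    public static List<int[]> reanalyze(String text, String term) {
        List<int[]> found = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(text);
        while (matcher.find()) {
            // Case-insensitive match, roughly what a lowercasing analyzer would give.
            if (matcher.group().equalsIgnoreCase(term)) {
                found.add(new int[] {matcher.start(), matcher.end()});
            }
        }
        return found;
    }

    /** Prefers stored offsets when the field has them (non-null map); otherwise re-analyzes the text. */
    public static List<int[]> offsetsFor(String text, String term, Map<String, List<int[]>> storedOffsets) {
        if (storedOffsets != null) {
            List<int[]> stored = storedOffsets.get(term);
            // Offsets are stored but the term is absent: it does not occur in this field.
            return stored != null ? stored : new ArrayList<>();
        }
        return reanalyze(text, term);
    }
}
```

With this shape, callers that index without offsets keep working
unchanged, and fields that do store offsets skip the analysis step.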

Thanks,
Grant

