lucene-dev mailing list archives

From "none none" <kor...@lycos.com>
Subject Re: Lucene Highlighter
Date Mon, 03 Mar 2003 07:19:32 GMT
Hi,
after digging through the code a little, I came up with some questions about making the highlighter
work with the upcoming release of Lucene (1.3).
The questions are:

- Why does PhraseQuery use a Vector and PhrasePrefixQuery an ArrayList? Just curious.

- Is it possible to add a method "public Term[] getTermsArray()" that returns the "termArrays"
from the PhrasePrefixQuery? Is it still populated after we run the search?

- Is it possible to have a PhrasePrefixQuery of 2+ terms, e.g. "Microsoft Soft* Windo*"? Why
are there two methods, one to add a single term and another to add more than one term? Is the
termsArray an array of term arrays?
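
To make the "array of term arrays" question concrete, here is a minimal sketch of the structure being asked about: one slot per phrase position, each slot listing the terms accepted at that position. It is only an illustration under stated assumptions. The class PhraseSlots and its methods are hypothetical, plain strings stand in for Lucene's Term class, and nothing here is claimed to be how PhrasePrefixQuery is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a phrase query with alternatives per position.
// Plain strings replace Lucene's Term class so the sketch compiles on its own.
class PhraseSlots {
    // One entry per phrase position; each entry lists the terms accepted there.
    private final List<String[]> termArrays = new ArrayList<>();

    // Add a position that accepts exactly one term.
    void add(String term) {
        add(new String[] { term });
    }

    // Add a position that accepts any one of the given terms.
    void add(String[] terms) {
        termArrays.add(terms.clone());
    }

    // Return a defensive copy of the slots, e.g. for a highlighter to inspect.
    List<String[]> getTermArrays() {
        List<String[]> copy = new ArrayList<>();
        for (String[] slot : termArrays) {
            copy.add(slot.clone());
        }
        return copy;
    }
}
```

With this shape, "Microsoft Soft* Windo*" would be three slots: a single-term slot for "microsoft" and two multi-term slots holding whatever index terms the prefixes expand to.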

- Is it correct that PrefixQuery.rewrite(...) is called by the searcher (reader?) at search
time to produce a BooleanQuery with an "OR" condition between the clauses? Does each clause hold a TermQuery?

- PrefixQuery > What do you think of this scenario: the user calls "populateTermArray()" before
running the search; this sets a static variable inside the Query class, so the setting is reflected
in all the XxxQuery classes. In the 'rewrite' method we check this value and, if true (default
false), we store each term in an array 'termsArray', one for each implementation (wildcard,
etc.). When we need to highlight, we call getTermsArray() on each query based on the instance
type (again: wildcard, etc.), and then either set the array to null or wait for the garbage collector
to release it. Sounds good?

- PrefixQuery and the other query classes that have this 'rewrite' method >> Can the method
be called more than once at search time? If so, we should keep the previous array of terms
and add the new terms to it without duplicates.

- RangeQuery >> Can we apply the same criteria as for PrefixQuery?

- All the classes that extend MultiTermQuery >> Can we apply the same criteria as for
PrefixQuery? (As above: just add a vector that holds the terms, if the user wants it, and
get this array when highlighting; we may call a method to release the resource after we are done
with the highlighting.)
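
The opt-in term collection proposed above for PrefixQuery, RangeQuery and the MultiTermQuery subclasses could look roughly like the sketch below. Everything in it is an assumption made for illustration: CollectingPrefixQuery is a hypothetical class rather than Lucene's PrefixQuery, the index is modelled as a plain list of strings, and rewrite(...) returns the matching terms instead of building a BooleanQuery. A LinkedHashSet is used so that repeated rewrite calls do not store duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a prefix-style query; not Lucene's PrefixQuery.
class CollectingPrefixQuery {
    // Global opt-in switch as proposed in the message (default: off).
    static boolean populateTermArray = false;

    private final String prefix;
    // LinkedHashSet keeps insertion order and silently drops duplicates.
    private final Set<String> collected = new LinkedHashSet<>();

    CollectingPrefixQuery(String prefix) {
        this.prefix = prefix;
    }

    // Mimics the expansion step: returns the index terms matching the prefix
    // and, if the switch is on, remembers them for later highlighting.
    List<String> rewrite(List<String> indexTerms) {
        List<String> matches = new ArrayList<>();
        for (String term : indexTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
                if (populateTermArray) {
                    collected.add(term);
                }
            }
        }
        return matches;
    }

    // Terms seen so far, in first-seen order, for the highlighter.
    String[] getTermsArray() {
        return collected.toArray(new String[0]);
    }

    // Explicit release, instead of waiting for the garbage collector.
    void releaseTerms() {
        collected.clear();
    }
}
```

One design note: a static switch affects every query in the JVM, including searches running concurrently in other threads, so a per-query flag may be the safer variant of the same idea.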

- How is it possible to get the position of a particular term in a particular document in
the index? This would greatly improve the process of getting the start and end offsets of a term in a
document. I assume that a text version of the field to highlight is available, e.g. the content
of an HTML page is a field and is stored in a single text file. It would also keep the highlighter compatible
with the tokenizer, since we would use the same one we used at indexing time, and avoid writing a pattern
for each criterion in the RegExp (actually it would not be necessary anymore!).

- Would all these changes slow down the search process? As a guess, by how much?

- Would the term position call be slow?
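
The position-to-offset idea from the questions above can be sketched as follows: re-tokenize the stored text with the same rules used at indexing time, record the start and end offset of each token, and wrap the tokens at the wanted positions. This is a rough illustration under several assumptions. OffsetFinder is a hypothetical helper, its tokenizer (runs of letters or digits) merely stands in for the real analyzer, positions are taken to be 0-based token ordinals with no position gaps, and the positions themselves are passed in by hand rather than read from the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: maps token positions to character offsets by
// re-tokenizing the stored text, then wraps the selected tokens in <b> tags.
class OffsetFinder {
    // Returns {start, end} (end exclusive) for each token, in position order.
    // Stand-in tokenizer: a token is a run of letters or digits.
    static List<int[]> tokenOffsets(String text) {
        List<int[]> offsets = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                offsets.add(new int[] { start, i });
            } else {
                i++;
            }
        }
        return offsets;
    }

    // Wraps the tokens at the given 0-based positions in <b>...</b>.
    static String highlight(String text, Set<Integer> positions) {
        List<int[]> offsets = tokenOffsets(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (int pos = 0; pos < offsets.size(); pos++) {
            if (!positions.contains(pos)) {
                continue;
            }
            int[] span = offsets.get(pos);
            out.append(text, last, span[0]); // untouched text before the token
            out.append("<b>").append(text, span[0], span[1]).append("</b>");
            last = span[1];
        }
        out.append(text, last, text.length()); // remaining tail
        return out.toString();
    }
}
```

For example, highlighting positions 1 and 2 of "Microsoft Software, Windows 95" yields "Microsoft <b>Software</b>, <b>Windows</b> 95". The cost is one extra tokenization pass over each document that gets highlighted, which bears on the speed questions above.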
 
Thank you guys.

