lucene-dev mailing list archives

From "none none" <kor...@lycos.com>
Subject Re: Query Term Collector (was: Re: New highlighter package available)
Date Fri, 03 Oct 2003 23:18:57 GMT
Hi Otis,
I'll try to explain my code.
My idea is to have a common method in every implementation of the Query class:
-BooleanQuery
-PhraseQuery
-every other class,

called "getTerms()".
It returns an array of Term,
e.g.: Term[] queryTerms = query.getTerms();

Please note that these are the terms actually FOUND AFTER RUNNING a search.
It is up to the highlighter implementation to *SKIP* any term that is part of a NOT clause;
after all, what would you highlight if you want those terms NOT to be in your document?
But this is a special case, and it is outside the scope of the Term Collector.

How are these Terms populated?
That is what I am asking to be made part of the core.
It depends; it may change from one type of Query to another.
For example, for TermQuery I added this method:

public Term[] getTerms()
{
    Term[] terms = new Term[1];
    terms[0] = getTerm();
    return terms;
}
This is a simple way to make it compatible with all the other classes, which can return more
than one Term.
Another change is to add an abstract method to Query.java, so that every Query class
that extends it must implement this method.
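To make the shape of this concrete, here is a minimal sketch. It uses small stand-in classes under assumed names (GetTermsSketch and its nested Term, Query, TermQuery, BooleanQuery), not the real Lucene classes, and the way the BooleanQuery stand-in concatenates the terms of its clauses is my assumption, not something taken from the patch. The field name "contents" is made up too. It uses the "Laurel AND Hardy" query from the question as input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the real Lucene classes.
class GetTermsSketch {

    /** Minimal stand-in for a term: a field name plus its text. */
    static final class Term {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    /** Every query type must be able to report its terms. */
    static abstract class Query {
        abstract Term[] getTerms();
    }

    /** A single-term query returns a one-element array. */
    static final class TermQuery extends Query {
        private final Term term;
        TermQuery(Term term) { this.term = term; }
        Term getTerm() { return term; }
        @Override Term[] getTerms() { return new Term[] { getTerm() }; }
    }

    /** Assumption: a compound query concatenates the terms of its clauses. */
    static final class BooleanQuery extends Query {
        private final List<Query> clauses = new ArrayList<>();
        void add(Query q) { clauses.add(q); }
        @Override Term[] getTerms() {
            List<Term> all = new ArrayList<>();
            for (Query q : clauses) {
                all.addAll(Arrays.asList(q.getTerms()));
            }
            return all.toArray(new Term[0]);
        }
    }

    public static void main(String[] args) {
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("contents", "laurel")));
        query.add(new TermQuery(new Term("contents", "hardy")));
        // prints [contents:laurel, contents:hardy]
        System.out.println(Arrays.toString(query.getTerms()));
    }
}
```

The caller only ever sees Term[], whatever the query type, which is the point of putting the method on Query itself.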
For MultiTermQuery, for example:

do {
    Term t = enum.term();
    if (t != null) {
        TermQuery tq = new TermQuery(t); // found a match
        tq.setBoost(getBoost() * enum.difference());
        query.add(tq, false, false); // add to query
        if (collectTerm()) {
            addTerm(t); // NEW: add term in the term array
        }
    }
} while (enum.next());

Of course the class declares:
protected abstract void addTerm(Term t);
We implement this in the concrete class, e.g. FuzzyQuery:
...
private ArrayList terms = new ArrayList();
...
protected void addTerm(Term t)
{
   terms.add(t);
}

and of course
...getTerms(){...} 
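
Putting the pieces together, the hook could look roughly like the sketch below. All names here are stand-ins (AddTermHookSketch and its nested MultiTermQuery and FuzzyQuery are not the Lucene classes), plain strings replace Term to keep it short, an Iterator replaces the term enumeration, and the sample terms are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the real Lucene classes.
class AddTermHookSketch {

    static abstract class MultiTermQuery {
        /** Hook called once for every matching term found during expansion. */
        protected abstract void addTerm(String term);

        /** Returns the terms collected so far. */
        public abstract String[] getTerms();

        /** Stand-in for the do/while loop over the term enumeration. */
        void expand(Iterator<String> matches) {
            while (matches.hasNext()) {
                String t = matches.next();
                if (t != null) {
                    addTerm(t); // the base class decides when, the subclass decides where
                }
            }
        }
    }

    /** The concrete class owns the storage for the collected terms. */
    static final class FuzzyQuery extends MultiTermQuery {
        private final List<String> terms = new ArrayList<>();

        @Override protected void addTerm(String t) { terms.add(t); }

        @Override public String[] getTerms() { return terms.toArray(new String[0]); }
    }

    public static void main(String[] args) {
        FuzzyQuery q = new FuzzyQuery();
        // Invented sample matches, as if returned by the index.
        q.expand(Arrays.asList("hardy", "hardly").iterator());
        // prints [hardy, hardly]
        System.out.println(Arrays.toString(q.getTerms()));
    }
}
```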

The last thing is to avoid wasting resources for users who do not want to collect terms, so
a QueryParser.COLLECT_TERM setting can be enabled after you construct your query, as happens
for DEFAULT_OPERATOR; collectTerm() returns true when the user wants to collect terms.
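
As a rough sketch of that switch: the class below is hypothetical (CollectFlagSketch, its collectTerms field and onMatch method are names I made up for illustration), and where the flag really lives is open. It only shows that with the flag off, nothing is stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; names are illustrative and not part of Lucene.
class CollectFlagSketch {

    /** Assumed global switch, off by default so other users pay nothing. */
    static boolean collectTerms = false;

    private final List<String> terms = new ArrayList<>();

    /** True only when the user asked for terms to be collected. */
    boolean collectTerm() { return collectTerms; }

    /** Called for each matching term while the query is expanded. */
    void onMatch(String term) {
        // The normal rewrite work would happen here in either case.
        if (collectTerm()) {
            terms.add(term); // extra bookkeeping only when requested
        }
    }

    List<String> getTerms() { return terms; }

    public static void main(String[] args) {
        collectTerms = false;
        CollectFlagSketch off = new CollectFlagSketch();
        off.onMatch("hardy");
        System.out.println(off.getTerms().size()); // prints 0

        collectTerms = true;
        CollectFlagSketch on = new CollectFlagSketch();
        on.onMatch("hardy");
        System.out.println(on.getTerms().size()); // prints 1
    }
}
```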

Do you have the main picture now, Otis?
ciao,
Korfut


--

--------- Original Message ---------

DATE: Thu, 2 Oct 2003 14:15:05 
From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
To: Lucene Developers List <lucene-dev@jakarta.apache.org>, korfut@lycos.com
Cc: 

>Korfut, Tatu (if you're watching),
>
>I'm trying to understand what this term collector idea is all about, so
>I looked online for some of your previous discussions on this topic
>from March 2003.  So the patches that both of you sent to lucene-dev
>at some point both implement a term collector.
>What terms do your term collectors collect, could you explain that in
>simple terms, and with an example, please? (I almost broke one of the
>walls in my apartment, when I accidentally smacked it with my head 10
>minutes ago)
>
>If I make a BooleanQuery: Laurel AND Hardy
>What if I make a WildcardQuery: Comed*
>What terms would your collector collect and return in each case?
>
>I only saw Tatu's diff to the existing classes, and noticed that his
>solution includes 5-6 new classes.
>(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02815/querySrc.zip)
>
>Thanks,
>Otis
>
>
>
>
>--- none none <korfut@lycos.com> wrote:
>> Hi Otis,
>> as Tatu' explained (sorry i am pretty busy at work,  thank you
>> Tatu'!)
>> we only ask for "Support of Term Collector" and this needs some
>> changes in the core, changes are in a previous email i sent to the
>> list (can do it again), it is like a patch, doing that it will be
>> easier *for us* to provide highlight when a new version of Lucene
>> comes out.
>> As for Mark's work on the highlighter, it is not working with release
>> 1.3, due to big changes in the core: query rewrite, termenum, etc.
>> As Tatu said, there can be a waste of resources for users who do not
>> need the term collector, so a boolean value will avoid that; by default
>> we can set it to TERM_COLLECTOR_OFF. 
>> I had to go through all the lucene code (almost) to make it work in
>> 1.3.
>> that's all.
>> thanks.
>> 
>> Korfut.
>
>
>
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
>


