lucene-dev mailing list archives

From Ype Kingma <ykin...@xs4all.nl>
Subject Re: About PrefixQuery
Date Sun, 09 Mar 2003 09:46:10 GMT
Gentlefolk,


>Tatu, this message may be off your topic, but I just want to tell you that I successfully
made all the changes to get Lucene working again for highlighting purposes. I can collect all
the terms from all the queries: Range, Fuzzy, MultiTerm, etc. As you pointed out, PhraseQuery
and RangeQuery don't extend MultiTermQuery, and I agree they should. Also, I collect the actual
terms found after the search, not the "raw" terms, but I think the Query has a private field that
holds this value. I added a method to Query: public abstract Term[] getTerms(); to make the
code nicer, so I just call this method to get the array without testing what the current
query is an instanceof.
>Collecting the terms for a single document would be useful, I think, but not as much as
getting the term positions in a document.
>What do you think of this proposal? Can I help?
>Bye.
>--
>
>On Sat, 8 Mar 2003 20:16:08  
> Tatu Saloranta wrote:
>>I started looking into implementing a term collector (an object that can get all
>>the terms for a given query), and have a couple of questions for someone more
>>familiar with the code base:
>>
>>(1) PrefixQuery does not extend MultiTermQuery (unlike FuzzyQuery and
>>  WildcardQuery). Is there some specific reason for this, or was it just
>> implemented before the other two (perhaps MultiTermQuery was added after
>> PrefixQuery?).
>>(2) RangeQuery does not extend MultiTermQuery either, should it?
>>(3) Like Doug pointed out, there are two distinct use cases; caller may want
>>  to get either 'raw' terms (unprocessed terms, like "foo?ar") or terms  
>>  actually contained in any documents indexed ("foo?ar" might expand to
>>  "foobar", "foogar" etc). However, there's also a third interesting use case;
>>  getting terms in a specific document. Is it likely implementing this might
>>  be (relatively) easy? I'm not familiar enough with IndexReader to know for
>>  sure.

You'll need a termDocs() from an IndexReader to do that, and that means you'll
be accessing not only the terms, but also the part of the index that
gives the term frequencies in each document. You'll then have to check
whether your document shows up in these TermDocs (I hope I recall the API names
correctly).
Since this extends term expansion into what is effectively a query search, I
think you might consider leaving it out of term expansion.
The implementation might become an order of magnitude slower
because it needs to access other parts of the index.
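
To make the cost difference concrete, here is a rough sketch in plain Java. It
is a toy in-memory model, not Lucene code: the class TermInDocSketch, its
methods and the postingsRead counter are all made up for illustration. It only
shows that expanding a prefix touches the term dictionary alone, while
restricting the expansion to one document also has to read each term's
document list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy inverted index. Hypothetical model only; this is NOT Lucene's API. */
final class TermInDocSketch {
    // Sorted term dictionary: term -> ascending ids of documents containing it.
    private final TreeMap<String, int[]> postings = new TreeMap<>();

    // Counts how many document-list entries were read, to make the extra work visible.
    long postingsRead = 0;

    void add(String term, int... sortedDocIds) {
        postings.put(term, sortedDocIds);
    }

    /** Plain term expansion: walks the sorted term dictionary only. */
    List<String> expandPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : postings.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break; // terms sharing a prefix are contiguous in sorted order
            }
            out.add(term);
        }
        return out;
    }

    /** Expansion limited to one document: must also scan each term's document list. */
    List<String> expandPrefixInDoc(String prefix, int docId) {
        List<String> out = new ArrayList<>();
        for (String term : expandPrefix(prefix)) {
            for (int d : postings.get(term)) {
                postingsRead++;
                if (d == docId) {
                    out.add(term);
                    break;
                }
                if (d > docId) {
                    break; // ids are ascending, so the document cannot appear later
                }
            }
        }
        return out;
    }
}
```

With terms "foobar" (docs 1, 3), "foogar" (doc 2) and "fox" (doc 3) added,
expandPrefix("foo") never looks at a document list, whereas
expandPrefixInDoc("foo", 3) reads entries from the lists of both expanded
terms before it can answer. In a real index those lists live in a separate
part of the index, which is where the slowdown would come from.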

Kind regards,
Ype Kingma


