> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.
But I need to compare against non-indexed words as well. By the way,
Google does this.
> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score. However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be). You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.
But a PhraseQuery means that all words must be in the result. This is
not always the case in my application. If I am searching for
"quick brown fox", "quick fox" is an acceptable result.
I just need to know whether I have to re-sort the search results
according to my own criteria, or whether there are methods I can
override so that the results come back already sorted.
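
If re-sorting turns out to be necessary, it could look roughly like the
sketch below. This is plain Java, not Lucene API: the class name
ProximityResort, the score() formula (coverage times proximity, halved
when the words appear out of query order) and the 0.5 penalty are all
assumptions made up for illustration. It scores the raw text, so stop
words still count.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: re-sorts hit texts after the search.
public class ProximityResort {

    /** Splits raw (un-analyzed) text into lowercase words; stop words are kept. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    /**
     * Assumed scoring formula: coverage * proximity * order.
     * coverage  = distinct query words found / distinct query words
     * proximity = words found / size of the smallest window holding them all
     * order     = 1.0 if that window has the words in query order, else 0.5
     */
    static double score(String query, String text) {
        List<String> q = new ArrayList<>(new LinkedHashSet<>(words(query)));
        List<String> doc = words(text);
        List<String> present = new ArrayList<>();
        for (String w : q) {
            if (doc.contains(w)) present.add(w);
        }
        if (present.isEmpty()) return 0.0;

        int[] last = new int[present.size()]; // last seen position of each word
        Arrays.fill(last, -1);
        int bestWindow = Integer.MAX_VALUE;
        boolean bestInOrder = false;
        for (int i = 0; i < doc.size(); i++) {
            int k = present.indexOf(doc.get(i));
            if (k < 0) continue;
            last[k] = i;
            int min = i;
            boolean all = true;
            boolean inOrder = true;
            for (int j = 0; j < last.length; j++) {
                if (last[j] < 0) { all = false; break; }
                min = Math.min(min, last[j]);
                if (j > 0 && last[j] < last[j - 1]) inOrder = false;
            }
            if (!all) continue;
            int window = i - min + 1;
            // Prefer a smaller window; on a tie prefer one in query order.
            if (window < bestWindow || (window == bestWindow && inOrder && !bestInOrder)) {
                bestWindow = window;
                bestInOrder = inOrder;
            }
        }
        double coverage = (double) present.size() / q.size();
        double proximity = (double) present.size() / bestWindow;
        return coverage * proximity * (bestInOrder ? 1.0 : 0.5);
    }

    /** Returns a copy of the hits, best score first (stable for equal scores). */
    static List<String> resort(String query, List<String> hits) {
        List<String> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble((String t) -> score(query, t)).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("fox quick", "quick brown fox", "quick fox");
        System.out.println(resort("quick fox", hits));
    }
}
```

With this formula a search for "quick fox" puts "quick fox" ahead of
"quick brown fox" and "fox quick", and a search for "quick brown fox"
still gives "quick fox" a non-zero score instead of dropping it.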
On 7/22/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
> On Jul 22, 2005, at 9:59 AM, Ahmed El-dawy wrote:
>
> > Hello,
> > I am using lucene to search plain text, but the order of the search
> > results is not satisfying to my needs. First, I want to know how the
> > similarity works. Then, I need to extend it.
>
> Use IndexSearcher.explain() to see how each individual hit is scored
> against a Query - this will be the clearest way to see why things
> score the way they do.
>
> > First, does the similarity class work on analyzed text or original
> > search text? To be precise, does it count the stop words as found
> > terms or not?
>
> Only terms returned from the Analyzer are considered, so if a stop
> word is removed it does not count for tf or idf.
>
> > Second, I want to add a factor of how relative are the terms of the
> > query found in text. For example, when I search for "quick fox", "fox
> > quick" and "quick brown fox" will be less ranked than "quick fox".
>
> This will happen automatically with PhraseQuery with a slop factor.
> The closer the words, the better the score. However, with a pure
> boolean query, proximity is not considered at all (nor should it
> be). You can use a large slop factor for phrases such as "quick
> fox"~100 and see how the scores work then.
>
> Erik
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
--
Regards,
Ahmed Saad