lucene-java-user mailing list archives

From Phil Whelan <phil...@gmail.com>
Subject Re: Weird behaviour
Date Sun, 02 Aug 2009 17:52:14 GMT
Hi Prashant,

I agree with Shai that using Luke, and printing out what the Document
looks like before it goes into the index, is going to be your best
bet for debugging this problem.

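The problem you're having is that StandardAnalyzer does not break up
the hostname into separate terms, as it has a special case for
hostnames and acronyms. So en.wikipedia.org is indexed as a single
term, and url:wikipedia has nothing to match. url:wiki presumably
matches only because "wiki" also appears in the path of those URLs.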

Searching on the full hostname should work:
+title:"rahul dravid" +url:"en.wikipedia.org"
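
To make the effect concrete, here is a rough sketch in plain Java. It does
not use Lucene at all: the class name, the tokenize() helper and the regex
are made up for illustration, and the regex is only a simplified stand-in
for the hostname special case, not StandardAnalyzer's real grammar. How it
splits on "_" is likewise an assumption. It just shows how keeping a dotted
hostname as one term leaves "wiki" searchable but not "wikipedia".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostnameTokenSketch {
    // Simplified model, NOT Lucene's actual tokenizer grammar:
    // a dotted run such as "en.wikipedia.org" is kept as one token,
    // otherwise a token is a plain run of letters and digits.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+|[A-Za-z0-9]+");

    // Hypothetical helper: returns the lowercased terms for a field value.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            terms.add(m.group().toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms =
                tokenize("http://en.wikipedia.org/wiki/Rahul_Dravid");
        // Prints: [http, en.wikipedia.org, wiki, rahul, dravid]
        System.out.println(terms);
        // A term query only matches a whole indexed term.
        System.out.println("wiki             -> " + terms.contains("wiki"));
        System.out.println("wikipedia        -> " + terms.contains("wikipedia"));
        System.out.println("en.wikipedia.org -> " + terms.contains("en.wikipedia.org"));
    }
}
```

To see what your analyzer really produces, check the url field's terms in
Luke rather than relying on this model.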

Thanks,
Phil

On Sun, Aug 2, 2009 at 10:14 AM, prashant ullegaddi <prashullegaddi@gmail.com> wrote:
> Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there is
> a document relevant to this query as well.
> The following query and its results prove it:
>
> Enter query:
> Searching for: +title:"rahul dravid" +url:wiki
> 4 total matching documents
>   trec-id: clueweb09-enwp02-13-14368, URL:
> http://en.wikipedia.org/wiki/Rahul_Dravid
>   trec-id: clueweb09-enwp01-83-11378, URL:
> http://en.wikipedia.org/wiki/Rahul_S_Dravid
>   trec-id: clueweb09-en0011-08-22737, URL:
> http://www.reference.com/browse/wiki/Rahul_Dravid
>   trec-id: clueweb09-enwp01-69-13556, URL:
> http://en.wikipedia.org/wiki/Rahul_Sharad_Dravid
> Press (q)uit or enter number to jump to a page.
>
> But see the following query:
>
> Enter query:
> +title:"rahul dravid" +url:"wikipedia"
> Searching for: +title:"rahul dravid" +url:wikipedia
> 0 total matching documents
> Press (q)uit or enter number to jump to a page.
>
> Isn't it weird?
>
> -- Prashant.
