lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From prashant ullegaddi <prashullega...@gmail.com>
Subject Re: Weird behaviour
Date Sun, 02 Aug 2009 17:58:33 GMT
Hi Phil,

The query you gave did work. Well, that proves StandardAnalyzer has a
different way
of tokenizing URLs.

Thanks,
Prashant.

On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan <phil123@gmail.com> wrote:

> Hi Prashant,
>
> I agree with Shai, that using Luke and printing out what the Document
> looks like before it goes into the index, are going to be your best
> bet for debugging this problem.
>
> The problem you're having is that StandardAnalyzer does not break-up
> the hostname into separate terms, as it has a special case for
> hostnames and acronyms.
>
> This should work...
> +title:"rahul dravid" +url:"en.wikipedia.org"
>
> Thanks,
> Phil
>
> On Sun, Aug 2, 2009 at 10:14 AM, prashant
> ullegaddi<prashullegaddi@gmail.com> wrote:
> > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there
> is
> > a document relevant to this query as well.
> > The following query and its results proves it:
> >
> > Enter query:
> > Searching for: +title:"rahul dravid" +url:wiki
> > 4 total matching documents
> >   trec-id: clueweb09-enwp02-13-14368, URL:
> > http://en.wikipedia.org/wiki/Rahul_Dravid
> >   trec-id: clueweb09-enwp01-83-11378, URL:
> > http://en.wikipedia.org/wiki/Rahul_S_Dravid
> >   trec-id: clueweb09-en0011-08-22737, URL:
> > http://www.reference.com/browse/wiki/Rahul_Dravid
> >   trec-id: clueweb09-enwp01-69-13556, URL:
> > http://en.wikipedia.org/wiki/Rahul_Sharad_Dravid
> > Press (q)uit or enter number to jump to a page.
> >
> > But see following query:
> >
> > Enter query:
> > +title:"rahul dravid" +url:"wikipedia"
> > Searching for: +title:"rahul dravid" +url:wikipedia
> > 0 total matching documents
> > Press (q)uit or enter number to jump to a page.
> >
> > Isn't it weird?
> >
> > -- Prashant.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message