lucene-java-user mailing list archives

From Shai Erera
Subject Re: Weird behaviour
Date Sun, 02 Aug 2009 10:37:37 GMT
You write that you index the string under the "url" field. Do you also index
it under "title"? If not, that can explain why title:"Rahul Dravid" does not
work for you.

Also, did you try looking at the index w/ Luke? It will show you which terms
are in the index.

Another thing that is always good for debugging such things is to create a
StandardAnalyzer, then request a tokenStream() from it, passing a
StringReader w/ the text you want to parse. Then just print the tokens.
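
For anyone who wants to try this, here is a minimal sketch of that debugging
step. It assumes a recent Lucene release (5.x or later), where tokens are read
through CharTermAttribute and the no-argument StandardAnalyzer constructor
exists; the API in 2.4 and on trunk at the time of this thread looks different.
The class name AnalyzerDebug, the tokens() helper and the example.org URL are
hypothetical, chosen only for illustration. Newer StandardAnalyzer versions may
also split text differently than the 2.4 behavior discussed here, so treat the
output as a hint about your own setup, not as a reproduction of my results.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDebug {

    // Runs `text` through `analyzer` as if it were indexed under `field`
    // and returns the terms the analyzer emits, in order.
    static List<String> tokens(Analyzer analyzer, String field, String text)
            throws IOException {
        List<String> out = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, new StringReader(text))) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                     // required before the first incrementToken()
            while (ts.incrementToken()) {
                out.add(term.toString());
            }
            ts.end();                       // finish the stream before it is closed
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input; pass the exact string you indexed as the first argument.
        String text = args.length > 0 ? args[0] : "http://example.org/wiki/Rahul_Dravid";
        try (Analyzer analyzer = new StandardAnalyzer()) {
            int i = 1;
            for (String t : tokens(analyzer, "url", text)) {
                System.out.println(i++ + ") " + t);
            }
        }
    }
}
```

Run it with the exact string you index and compare the printed terms with the
terms in your query. A phrase query can only match terms that appear
consecutively in this list, under the same field.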

I've done that, using the version from trunk, w/ Version.LUCENE_24, and the
tokens that are extracted are:

1) You don't get results for title:"Rahul Dravid" since you index it under
"url" and not "title".
2) url:"wiki/Rahul_Dravid" works, since it looks for a phrase that exists in
the index (look at the last 3 tokens produced by the Analyzer, in the output).
3) url:"<entire string>" also works, since you index all of it under the "url"
field.

Does this explain the behavior you see?


On Sun, Aug 2, 2009 at 1:27 PM, prashant ullegaddi wrote:

> Hi,
> I've indexed some 50 million documents. I've indexed the target URL of each
> document as the "url" field, using
> StandardAnalyzer with Field.Index.ANALYZED. Suppose there is a Wikipedia page
> with title:"Rahul Dravid" and
> url:
> But when I search for +title:"Rahul Dravid" +url:"Wikipedia", I'm getting no
> results. I get the document(s) when
> I search for url: or url:"
>". I get
> results even when I search for url:"wiki/Rahul_Dravid".
> It'd be helpful if somebody can throw some light on this.
> -- Prashant.
